|
--- |
|
license: apache-2.0 |
|
metrics: |
|
- accuracy |
|
base_model: |
|
- mistralai/Mixtral-8x7B-Instruct-v0.1 |
|
--- |
|
# Quark Team FP8 Mixtral-8x7B Model Overview |
|
|
|
## Model Information For MLPerf |
|
- **Model Name**: Mixtral-8x7B
|
- **Version**: MLPerf v5.1 |
|
- **Commit**: Closed Division Commit
|
- **Supported Hardware Microarchitecture**: AMD MI300/MI325 |
|
- **ROCm**: 6.4.1 |
|
- **Operating System(s)**: Linux |
|
- **Transformers**: 4.46.3 |
|
- **Quark**: [0.9](https://quark.docs.amd.com/latest/install.html)
|
|
|
## Calibration Dataset |
|
This model was built from the mistralai Mixtral-8x7B-Instruct-v0.1 model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for FP8 quantization.
|
The calibration dataset consists of **1024 samples** drawn from the mixed calibration set provided by [mlcommons/inference](https://github.com/mlcommons/inference/tree/master/language/mixtral-8x7b#get-dataset); an inspection sketch follows the list. The samples break down as:
|
- **325 GSM8k samples** |
|
- **325 MBXP samples** |
|
- **374 OpenOrca samples**
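
As a quick way to verify these counts, here is a minimal inspection sketch, assuming the downloaded `.pkl` deserializes to a pandas DataFrame (the per-row source-label column name is an assumption; check the actual schema):

```python
# Sketch: inspect the MLPerf calibration pickle and verify the sample counts.
# Assumes the file deserializes to a pandas DataFrame; the column carrying the
# source-dataset label is hypothetical here -- print the schema to find it.
import pandas as pd

df = pd.read_pickle(
    "./mlperf_data/mixtral_8x7b%2F2024.06.06_mixtral_15k_calibration_v4.pkl"
)
print(len(df))               # expected: 1024 calibration samples
print(df.columns.tolist())   # discover the actual schema
# If a per-row source label exists (e.g. a "dataset" column), the split
# should match the list above:
# print(df["dataset"].value_counts())
```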
|
|
|
## Quantized Tensors |
|
The following tensors are FP8-quantized in each decoder layer (see the per-tensor FP8 sketch after this list):
|
- **Expert MLP Inputs and Weights** (excluding the router) |
|
- **Linear QKV Inputs and Weights**
|
- **KV Cache Entries** |
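
For illustration, here is a minimal per-tensor FP8 (E4M3) quantize/dequantize sketch in PyTorch. The per-tensor scaling scheme shown is an assumption for illustration only and may differ from Quark's actual configuration:

```python
# Minimal per-tensor FP8 (E4M3) quantize/dequantize sketch in PyTorch.
# The scaling scheme here is illustrative; Quark's actual per-tensor or
# per-channel configuration may differ.
import torch

FP8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def quantize_fp8(x: torch.Tensor):
    scale = x.abs().max().clamp(min=1e-12) / FP8_E4M3_MAX
    q = (x / scale).to(torch.float8_e4m3fn)
    return q, scale

def dequantize_fp8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float16) * scale

w = torch.randn(4096, 4096, dtype=torch.float16)
q, s = quantize_fp8(w.float())
w_hat = dequantize_fp8(q, s)
print((w.float() - w_hat.float()).abs().max())  # per-element quantization error
```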
|
|
|
## Ignored Layers |
|
The following layers are ignored during quantization; the glob patterns match module names, as sketched after this list:
|
- `*.gate` |
|
- `*.o_proj` |
|
- `lm_head` |
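
A minimal sketch of how these glob patterns resolve against Hugging Face Mixtral module names, using Python's `fnmatch`:

```python
# Sketch: matching exclusion patterns against typical Mixtral module names.
from fnmatch import fnmatch

patterns = ["lm_head", "*.gate", "*.o_proj"]
modules = [
    "lm_head",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.block_sparse_moe.gate",          # MoE router
    "model.layers.0.block_sparse_moe.experts.0.w1",  # expert MLP
]

for name in modules:
    excluded = any(fnmatch(name, p) for p in patterns)
    print(f"{name}: {'excluded' if excluded else 'quantized'}")
```

Note that `*.gate` matches the MoE router (`block_sparse_moe.gate`), consistent with the router exclusion noted in the quantized-tensors list above.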
|
|
|
## Algorithms |
|
The AutoSmoothQuant algorithm is applied during weight-activation quantization to improve the accuracy of the quantized model.
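
For intuition, AutoSmoothQuant builds on the SmoothQuant transformation: a linear layer Y = XW is rewritten as Y = (X · diag(s)^-1)(diag(s) · W), which is mathematically equivalent but scales activation outliers down before quantization; the "Auto" variant selects the scales automatically. A minimal NumPy sketch of that equivalence, using the standard SmoothQuant smoothing-factor heuristic for illustration:

```python
# Sketch of the SmoothQuant-style transformation underlying AutoSmoothQuant:
# Y = X @ W == (X / s) @ (s[:, None] * W), with s chosen per input channel
# to migrate quantization difficulty from activations to weights.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))           # activations (tokens x channels)
X[:, 3] *= 50.0                        # inject an outlier channel
W = rng.normal(size=(16, 32))          # weights (in_channels x out_channels)

alpha = 0.5                            # migration-strength hyperparameter
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)

Y_ref = X @ W
Y_smooth = (X / s) @ (s[:, None] * W)
print(np.allclose(Y_ref, Y_smooth))    # True: mathematically equivalent
print(np.abs(X / s).max(axis=0).max()) # outlier channel is tamed
```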
|
|
|
## Quantization Scripts |
|
```bash
# Quantize Mixtral-8x7B to FP8 with AutoSmoothQuant and export in HF format.
cd examples/torch/language_modeling/llm_ptq/

MODEL_DIR="mistralai/Mixtral-8x7B-Instruct-v0.1"
DATASET="./mlperf_data/mixtral_8x7b%2F2024.06.06_mixtral_15k_calibration_v4.pkl"
OUTPUT_DIR="amd/Mixtral-8x7B-Instruct-v0.1_FP8_MLPerf_V3"

python3 quantize_quark.py --model_dir "${MODEL_DIR}" \
                          --output_dir "${OUTPUT_DIR}" \
                          --dataset "${DATASET}" \
                          --data_type float16 \
                          --multi_gpu \
                          --quant_scheme w_fp8_a_fp8 \
                          --kv_cache_dtype fp8 \
                          --num_calib_data 1024 \
                          --seq_len 1024 \
                          --min_kv_scale 1.0 \
                          --model_export hf_format \
                          --custom_mode fp8 \
                          --quant_algo autosmoothquant \
                          --exclude_layers "lm_head" "*.gate" "*.o_proj"
```
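
After the export completes, a quick sanity check is to inspect the emitted Hugging Face config. The sketch below assumes the `hf_format` export records its settings under a `quantization_config` key in `config.json`; the exact key layout may vary across Quark versions:

```python
# Sketch: verify the exported checkpoint records the quantization settings.
# Assumes hf_format export writes a "quantization_config" entry into
# config.json; field names may differ across Quark versions.
import json
from pathlib import Path

output_dir = Path("amd/Mixtral-8x7B-Instruct-v0.1_FP8_MLPerf_V3")
config = json.loads((output_dir / "config.json").read_text())
print(json.dumps(config.get("quantization_config", {}), indent=2))
```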
|
|
|
# Model Performance Comparison |
|
|
|
| Metric                | Baseline Accuracy Target | FP8 Quant Accuracy (% of baseline) |
|-----------------------|--------------------------|------------------------------------|
| **GSM8K (Math)**      | 73.66                    | 73.18 (99.34%)                     |
| **Open Orca (Chat)**  |                          |                                    |
| - Rouge1              | 45.5989                  | 45.4362 (99.64%)                   |
| - Rouge2              | 23.3526                  | 23.168 (99.21%)                    |
| - RougeL              | 30.4608                  | 30.2922 (99.45%)                   |
| **MBXP (Code)**       | 60.16                    | 60.08 (99.87%)                     |
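
The recovery percentages in parentheses are simply the quantized score divided by the baseline target; a short sketch to reproduce them (values match the table up to rounding):

```python
# Reproduce the recovery percentages: quantized / baseline * 100.
scores = {
    "GSM8K":  (73.66, 73.18),
    "Rouge1": (45.5989, 45.4362),
    "Rouge2": (23.3526, 23.168),
    "RougeL": (30.4608, 30.2922),
    "MBXP":   (60.16, 60.08),
}
for name, (baseline, fp8) in scores.items():
    print(f"{name}: {100 * fp8 / baseline:.2f}%")
```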
|
|
|
# License |
|
Modifications Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
|
|
|
|