|
--- |
|
license: apache-2.0 |
|
pipeline_tag: text-generation |
|
library_name: mlx |
|
tags: |
|
- vllm |
|
- mlx |
|
base_model: openai/gpt-oss-20b |
|
--- |
|
**See gpt-oss-20b 6.5-bit MLX in action: [demonstration video](https://youtu.be/mlpFG8e_fLw)**
|
|
|
*The q6.5 quant typically achieves a perplexity of 1.128 in our testing, equivalent to q8.*
|
| Quantization | Perplexity |
|:------------:|:----------:|
| **q2** | 41.293 |
| **q3** | 1.900 |
| **q4** | 1.168 |
| **q6** | 1.128 |
| **q8** | 1.128 |
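
For context, per-token perplexity can be measured along the following lines with [mlx-lm](https://github.com/ml-explore/mlx-lm). This is a generic sketch, not the exact harness behind the table above: the card does not specify the evaluation corpus, and the repo ID below is a placeholder.

```python
import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

# Placeholder repo ID -- substitute this card's actual model path.
model, tokenizer = load("gpt-oss-20b-6.5bit-mlx")

# Any held-out text works; the evaluation corpus is not specified by the card.
text = "MLX is an array framework for machine learning on Apple silicon. " * 40
tokens = mx.array(tokenizer.encode(text))[None]  # shape: (1, seq_len)

# Next-token cross-entropy: predict tokens[1:] from tokens[:-1].
logits = model(tokens[:, :-1])
losses = nn.losses.cross_entropy(logits, tokens[:, 1:])
print(f"perplexity: {mx.exp(losses.mean()).item():.3f}")
```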
|
|
|
## Usage Notes |
|
|
|
* Tested to run with the [Inferencer app](https://inferencer.com); for a scripted route, see the `mlx-lm` sketch at the end of this list
|
* Memory usage: ~17 GB (down from the ~46 GB required by the native MXFP4 format)
|
* Expect ~100 tokens/s, depending on hardware
|
* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26 |
|
* For more details, see the [demonstration video](https://youtu.be/mlpFG8e_fLw) or visit [OpenAI gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b).
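
Beyond the Inferencer app, the weights should also load with stock [mlx-lm](https://github.com/ml-explore/mlx-lm). Since the quant was produced with a modified MLX build, this is an assumption rather than a tested path, and the repo ID below is a placeholder:

```python
from mlx_lm import load, generate

# Placeholder repo ID -- substitute this card's actual model path.
model, tokenizer = load("gpt-oss-20b-6.5bit-mlx")

prompt = "Summarize the trade-offs between q4 and q8 quantization."

# Use the bundled chat template when the tokenizer provides one.
if tokenizer.chat_template is not None:
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
    )

text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```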