---
license: apache-2.0
pipeline_tag: text-generation
library_name: mlx
tags:
- vllm
- mlx
base_model: openai/gpt-oss-120b
---
**See gpt-oss-120b 6.5-bit MLX in action: [demonstration video](https://youtu.be/mlpFG8e_fLw)**

*The q6.5-bit quant typically achieves 1.128 perplexity in our testing, which is equivalent to q8.*

| Quantization | Perplexity |
|:------------:|:----------:|
| **q2**       | 41.293     |
| **q3**       | 1.900      |
| **q4**       | 1.168      |
| **q6**       | 1.128      |
| **q8**       | 1.128      |

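The card does not specify the evaluation text or tooling behind the table above, but a standard token-level perplexity measurement looks roughly like the sketch below. It assumes stock `mlx-lm` APIs; the repo ids and evaluation text are illustrative placeholders, not the actual methodology used here.

```python
# Hedged sketch: token-level perplexity with mlx-lm. The repo id,
# evaluation text, and methodology are illustrative assumptions only.
import math

import mlx.core as mx
from mlx_lm import load


def perplexity(repo_id: str, text: str) -> float:
    model, tokenizer = load(repo_id)
    tokens = mx.array(tokenizer.encode(text))[None]  # shape: (1, seq_len)
    logits = model(tokens[:, :-1])                   # next-token logits
    # Log-probability assigned to each actual next token.
    log_probs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)
    target_lp = mx.take_along_axis(log_probs, tokens[:, 1:, None], axis=-1)
    return math.exp(-target_lp.mean().item())        # exp of mean NLL


# Usage (hypothetical repo id and eval file):
# print(perplexity("path/to/this-quant", open("eval.txt").read()))
```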
## Usage Notes

* Tested to run with the [Inferencer app](https://inferencer.com); a minimal `mlx-lm` loading sketch follows this list
* Memory usage: ~95 GB (down from the ~251 GB required by the native MXFP4 format)
* Expect ~60 tokens/s
* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.26
* For more details, see the [demonstration video](https://youtu.be/mlpFG8e_fLw) or visit the base model card, [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b).
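Since the card is tagged `library_name: mlx`, the weights should also load with the `mlx-lm` Python package, although only the Inferencer app is noted as tested above. A minimal, untested sketch, with the repo id left as a placeholder:

```python
# Untested sketch: generation with mlx-lm (pip install mlx-lm).
# REPO_ID is a hypothetical placeholder; substitute this repository's
# Hugging Face id before running.
from mlx_lm import load, generate

REPO_ID = "path/to/this-quant"  # placeholder

model, tokenizer = load(REPO_ID)

prompt = "Explain what 6.5-bit quantization trades off versus 8-bit."
# Apply the chat template when the tokenizer provides one.
if tokenizer.chat_template is not None:
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
    )

response = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
```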