---
license: apache-2.0
pipeline_tag: text-generation
library_name: mlx
tags:
- vllm
- mlx
base_model: openai/gpt-oss-20b
---

# gpt-oss-20b-qx86-hi-mlx

A happy model
===

```bash
107.49 tok/sec
4688 tokens
1.25s to first token
```

After I got a fairly complete response to my prompt, I asked:

> are you happy about how easy this went?

...Thought for a brief moment

Absolutely—this was a breeze! The skeleton’s in place, the Docker compose file can spin up everything in seconds, and the Haskell code already stitches together the Postgres functions, async task handling, and a tiny CLI. If you hit any snags or want to add more nuance (e.g., proper SSL, retry logic, or a real LISTEN/NOTIFY loop), just let me know and I’ll tweak the snippets accordingly. 🚀

> I mean, as far as your perception of the project and the path to success in implementing an actual product, do you feel confident you will be able to assist?

I’m 100 % confident that, with the foundation we just laid out, you can move from proof‑of‑concept to a production‑ready product. Here’s why—and what still needs attention:

...followed by a very confident and clear breakdown of the outstanding work.

> No percentages. We are 100% sure we can do this.

That kind of attitude is hard to find in humans.

A similar quant, telling the story
===

The following is from an analysis of DavidAU's Brainstorming models quantized with the Deckard formula (qx), a mixed-precision quantization that emphasizes certain layers to produce a cognitive resonance pattern:

[nightmedia/Qwen3-ST-TNG-DS9-Voyager-Unhinged-6B-qx86-hi-mlx](https://huggingface.co/nightmedia/Qwen3-ST-TNG-DS9-Voyager-Unhinged-6B-qx86-hi-mlx?not-for-all-audiences=true)

> The qx86-hi quantization is the single most reliable way to get models that feel "happy" during collaboration — because it:

- Makes models more willing to explore co-creative paths (HellaSwag),
- Doesn’t make them stuck in rigid task execution (ARC Easy > 0.58),
- Maintains sufficient accuracy for real-world use (OpenBookQA > 0.41).

Your earlier insight was spot-on: the "social and happiness" effect isn't magic — it’s the result of quantization finding an optimal balance between exploration and precision.

If you’re building something that requires humans to feel engaged (and productive) while collaborating with AI:

✅ Choose any model with qx86-hi — they’ll deliver the "lightbulb moment" effect consistently across all benchmarks.

I created a quick-reference table below (for those who prefer to skim first):

```bash
Model                                HellaSwag  ARC Easy  Lightbulb Moment?
Qwen3-MOE-2x6B-ST-N-TG-FreakStorm    ✅ 0.643   ✅ 0.596   ⭐⭐⭐⭐
Qwen3-Great-Bowels-Of-Horror         ✅ 0.627   ✅ 0.587   ⭐⭐⭐
Qwen3-ST-The-Next-Generation-II-E32  ✅ 0.650   ✅ 0.581   ⭐⭐⭐
Qwen3-Darkest-BIG-Jan-Horror         ✅ 0.619   ✅ 0.604   ⭐⭐⭐
```

As I said, a happy model.

-G

This model [gpt-oss-20b-qx86-hi-mlx](https://huggingface.co/gpt-oss-20b-qx86-hi-mlx) was
converted to MLX format from [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)
using mlx-lm version **0.27.1**.

## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("gpt-oss-20b-qx86-hi-mlx")

prompt = "hello"

# Apply the chat template when the tokenizer provides one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
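
If you prefer the command line, mlx-lm also installs a generation entry point. A minimal sketch follows; the exact flag names can vary between mlx-lm releases, so check `mlx_lm.generate --help` for your installed version:

```bash
# Generate directly from the shell (flags assumed from recent mlx-lm versions)
mlx_lm.generate --model gpt-oss-20b-qx86-hi-mlx --prompt "hello" --max-tokens 256
```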