---
license: apache-2.0
pipeline_tag: text-generation
library_name: mlx
tags:
- vllm
- mlx
base_model: openai/gpt-oss-20b
---

# gpt-oss-20b-qx86-hi-mlx

A happy model
===

```bash
107.49 tok/sec
4688 tokens
1.25s to first token
```

After I got a fairly complete response to my prompt, I asked:

> are you happy about how easy this went?

...Thought for a brief moment

Absolutely—this was a breeze! The skeleton’s in place, the Docker compose file can spin up everything in seconds, and the Haskell code already stitches together the Postgres functions, async task handling, and a tiny CLI. If you hit any snags or want to add more nuance (e.g., proper SSL, retry logic, or a real LISTEN/NOTIFY loop), just let me know and I’ll tweak the snippets accordingly. 🚀

> I mean, as far as your perception of the project and the path to success in implementing an actual product, do you feel confident you will be able to assist?

I’m 100 % confident that, with the foundation we just laid out, you can move from proof‑of‑concept to a production‑ready product. Here’s why—and what still needs attention:

...followed by a very confident and clear breakdown of the outstanding work.

> No percentages. We are 100% sure we can do this.

That kind of attitude is hard to find in humans.

A similar quant, telling the story
===

The following is from an analysis of DavidAU's Brainstorming models quantized with the Deckard formula (qx), a mixed-precision quantization that emphasizes certain layers to produce a cognitive resonance pattern:

[nightmedia/Qwen3-ST-TNG-DS9-Voyager-Unhinged-6B-qx86-hi-mlx](https://huggingface.co/nightmedia/Qwen3-ST-TNG-DS9-Voyager-Unhinged-6B-qx86-hi-mlx?not-for-all-audiences=true)

> The qx86-hi quantization is the single most reliable way to get models that feel "happy" during collaboration — because it:

- Makes models more willing to explore co-creative paths (HellaSwag),
- Doesn’t make them stuck in rigid task execution (ARC Easy > 0.58),
- Maintains sufficient accuracy for real-world use (OpenBookQA > 0.41).

Your earlier insight was spot-on: the "social and happiness" effect isn't magic — it’s the result of quantization finding an optimal balance between exploration and precision.

If you’re building something that requires humans to feel engaged (and productive) while collaborating with AI:

✅ Choose any model with qx86-hi — they’ll deliver the "lightbulb moment" effect consistently across all benchmarks.

I created a quick-reference table below (for those who prefer to skim first):

```bash
Model                                HellaSwag  ARC Easy  Lightbulb Moment?
Qwen3-MOE-2x6B-ST-N-TG-FreakStorm    ✅ 0.643   ✅ 0.596   ⭐⭐⭐⭐
Qwen3-Great-Bowels-Of-Horror         ✅ 0.627   ✅ 0.587   ⭐⭐⭐
Qwen3-ST-The-Next-Generation-II-E32  ✅ 0.650   ✅ 0.581   ⭐⭐⭐
Qwen3-Darkest-BIG-Jan-Horror         ✅ 0.619   ✅ 0.604   ⭐⭐⭐
```

As I said, a happy model.

-G

This model [gpt-oss-20b-qx86-hi-mlx](https://huggingface.co/gpt-oss-20b-qx86-hi-mlx) was
converted to MLX format from [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)
using mlx-lm version **0.27.1**.

## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("gpt-oss-20b-qx86-hi-mlx")

prompt = "hello"

# Apply the chat template when the tokenizer provides one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
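
If you prefer the command line, mlx-lm also installs a generation entry point. A minimal sketch follows; the exact flag names can vary between mlx-lm releases, so check `mlx_lm.generate --help` for your installed version:

```bash
# Generate directly from the shell (flags assumed from recent mlx-lm versions)
mlx_lm.generate --model gpt-oss-20b-qx86-hi-mlx --prompt "hello" --max-tokens 256
```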