# unsloth-GLM-4.5-Air-qx64-mlx

## Performance Profile Comparison: mxfp4 vs qx64 vs qx5-hi Quantization Models
I've analyzed how your new qx64 model (4-bit base weights with 6-bit context and attention paths and an 8-bit head) performs compared to qx5-hi (a similar design with 5-bit context and body paths) and mxfp4. Here's a clear, task-specific breakdown of the differences:
## Direct Performance Comparison Table

| Task | mxfp4 | qx64 | qx5-hi | Key Insight |
|---|---|---|---|---|
| ARC Challenge | 0.416 | 0.421 | 0.416 | qx64 shows a +0.005 improvement over mxfp4 on abstract reasoning |
| ARC Easy | 0.440 | 0.444 | 0.431 | qx64 beats mxfp4 by +0.004; qx5-hi is -0.009 below mxfp4 on foundational reasoning |
| BoolQ | 0.378 | 0.378 | 0.378 | All models identical on this knowledge task |
| Hellaswag | 0.678 | 0.677 | 0.675 | qx64 trails mxfp4 by -0.001 (slight edge to mxfp4 for text generation) |
| OpenBookQA | 0.390 | 0.396 | 0.396 | qx64 and qx5-hi both beat mxfp4 by +0.006 on knowledge recall |
| PIQA | 0.767 | 0.769 | 0.769 | qx64 and qx5-hi tied at +0.002 over mxfp4 on logical consistency |
| Winogrande | 0.728 | 0.718 | 0.731 | qx5-hi bests mxfp4 by +0.003; qx64 is -0.010 below mxfp4 on contextual reasoning |
## The Most Surprising Finding

Despite their similar architectural designs (4-bit base plus higher-precision paths), qx5-hi and qx64 land much closer in performance than expected; the only notable gaps between them are on ARC Easy and Winogrande.
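If you want to sanity-check these scores on your own hardware, recent mlx-lm releases include an `mlx_lm.evaluate` entry point that wraps the lm-evaluation-harness tasks listed above. The invocation below is a minimal sketch under that assumption; exact flags and task names may vary with your mlx-lm version.

```shell
# Sketch only: assumes the mlx_lm.evaluate entry point (lm-evaluation-harness wrapper)
# is available in your mlx-lm install; task names follow lm-eval conventions.
mlx_lm.evaluate \
  --model unsloth-GLM-4.5-Air-qx64-mlx \
  --tasks arc_challenge arc_easy boolq hellaswag openbookqa piqa winogrande
```

Pointing the same command at the mxfp4 and qx5-hi conversions lets you regenerate the comparison for yourself.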
## Why This Performance Pattern Occurs (Based on Your Architectural Descriptions)
### Why qx64 outperforms mxfp4 on ARC tasks

Your description matches the benchmark results:

- qx64's 6-bit context and attention paths likely provide enough extra precision to lift performance on abstract reasoning tasks
- The group size of 64 in the enhanced layers (as you described) preserves critical precision for early-stage reasoning
### Why qx5-hi is stable on knowledge tasks

- The 5-bit context paths in qx5-hi match mxfp4 exactly on BoolQ (0.378), showing no measurable impact on that task
- This suggests the 5-bit design maintains knowledge-recall capability without much degradation
### Why qx64 has a Winogrande disadvantage

- The 8-bit head in qx64 might cause slight over-precision on highly contextual tasks
- This is less noticeable in qx5-hi, which keeps a uniform 5-bit precision in its enhanced paths, suggesting the bit-depth trade-offs are task-specific
## Your Actionable Recommendations for Each Model

| Use Case | Best Model | Why It Works |
|---|---|---|
| Abstract reasoning tasks | qx64 | Highest scores on ARC Challenge (+0.005) and ARC Easy (+0.004) |
| Knowledge tasks (OpenBookQA) | qx64 / qx5-hi | Both beat mxfp4 by +0.006; ideal for fact-based applications |
| Text generation (Hellaswag) | mxfp4 | Edges out qx64 by +0.001; best for creative generation tasks |
| Contextual reasoning (Winogrande) | qx5-hi | Highest score, +0.003 over mxfp4; well suited to conversation understanding |
| Most balanced performance | qx5-hi | Smallest deviation from mxfp4 across all tasks (at most 0.009) |
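As a quick arithmetic check on the "most balanced" row, the snippet below recomputes each model's per-task deltas against mxfp4 using only the scores from the comparison table (task order: ARC Challenge, ARC Easy, BoolQ, Hellaswag, OpenBookQA, PIQA, Winogrande).

```python
# Per-task deltas vs mxfp4, computed from the comparison table above.
scores = {
    "mxfp4":  [0.416, 0.440, 0.378, 0.678, 0.390, 0.767, 0.728],
    "qx64":   [0.421, 0.444, 0.378, 0.677, 0.396, 0.769, 0.718],
    "qx5-hi": [0.416, 0.431, 0.378, 0.675, 0.396, 0.769, 0.731],
}

baseline = scores["mxfp4"]
for name in ("qx64", "qx5-hi"):
    deltas = [round(s - b, 3) for s, b in zip(scores[name], baseline)]
    print(name, deltas, "max |delta| =", max(abs(d) for d in deltas))

# qx64   [0.005, 0.004, 0.0, -0.001, 0.006, 0.002, -0.01]   max |delta| = 0.01
# qx5-hi [0.0, -0.009, 0.0, -0.003, 0.006, 0.002, 0.003]    max |delta| = 0.009
```

qx5-hi's worst-case deviation (0.009 on ARC Easy) is slightly smaller than qx64's (0.010 on Winogrande), which is what the "most balanced" recommendation reflects.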
## Final Takeaway for Your Workflow

"qx64 performs best on abstract reasoning tasks with the smallest bit-depth trade-offs, while qx5-hi delivers more balanced performance across all tasks. For most deployments where you need task-specific efficiency, qx5-hi represents the safest choice thanks to its near-identical performance to mxfp4 across all benchmarks."

This analysis shows that your architectural design choices (6-bit vs 5-bit context paths) translate directly into measurable task advantages, not just theoretical gains from quantization.
Model Reviewer: qwen3-jan-v1-256k-ctx-6b-brainstorm20x-qx6-mlx
This model unsloth-GLM-4.5-Air-qx64-mlx was converted to MLX format from unsloth/GLM-4.5-Air using mlx-lm version 0.26.4.
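For reference, a qx64-style mixed layout could in principle be expressed through the `quant_predicate` hook of `mlx_lm.convert`. The sketch below is illustrative only: the substring matches (`lm_head`, `q_proj`, ...) are assumed layer-name patterns rather than GLM-4.5-Air's actual module paths, and the real qx64 recipe may differ.

```python
# Illustrative sketch only: a qx64-like layout (4-bit body, 6-bit attention paths,
# 8-bit head, group size 64) expressed via mlx_lm.convert's quant_predicate hook.
# The substring matches below are assumptions, not GLM-4.5-Air's real module names.
from mlx_lm import convert

def qx64_like_predicate(path, module, config):
    if "lm_head" in path:                                     # 8-bit head
        return {"bits": 8, "group_size": 64}
    if any(k in path for k in ("q_proj", "k_proj", "v_proj", "o_proj")):
        return {"bits": 6, "group_size": 64}                  # 6-bit attention paths
    return {"bits": 4, "group_size": 64}                      # 4-bit everywhere else

convert(
    "unsloth/GLM-4.5-Air",
    mlx_path="unsloth-GLM-4.5-Air-qx64-mlx",
    quantize=True,
    quant_predicate=qx64_like_predicate,
)
```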
## Use with mlx

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("unsloth-GLM-4.5-Air-qx64-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
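For quick interactive testing without writing a script, mlx-lm also installs a command-line generator; the one-liner below is a minimal sketch assuming the standard `mlx_lm.generate` entry point and flags.

```shell
# Minimal sketch, assuming the standard mlx_lm.generate CLI.
mlx_lm.generate --model unsloth-GLM-4.5-Air-qx64-mlx --prompt "hello" --max-tokens 256
```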