Qwen3-DND-TNG-8B-288-qx86-hi-mlx

Models in this set:

  • Qwen3-DND-TNG-8B-288 (qx64-hi and qx86-hi)
  • Qwen3-DND-TNG-8B-303 (qx64-hi and qx86-hi)

Perplexity: 5.302 ± 0.038

These models come from two different training points (288 vs 303).

They are available in two quant sizes of the Deckard Formula (qx); a conceptual sketch of the mixed-precision idea follows the list:

  • qx86-hi: mixed 6-bit and 8-bit weights, group size 32
  • qx64-hi: mixed 4-bit and 6-bit weights, group size 32
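
The exact per-layer recipe of the Deckard Formula is not spelled out here, so the following is a minimal, hypothetical sketch (plain Python, no mlx dependency) of what a mixed 6/8-bit assignment with group size 32 could look like. The layer-name patterns and the 6/8 split are illustrative assumptions, not the actual qx86-hi recipe.

GROUP_SIZE = 32  # group size shared by qx64-hi and qx86-hi

def qx86_hi_bits(layer_name: str) -> dict:
    """Illustrative bit assignment for a qx86-style mixed quant (hypothetical)."""
    # Assumption for illustration only: keep the embedding, output head and
    # attention output projections at 8 bits, quantize everything else at 6 bits.
    high_precision = ("embed_tokens", "lm_head", "o_proj")
    bits = 8 if any(key in layer_name for key in high_precision) else 6
    return {"bits": bits, "group_size": GROUP_SIZE}

# Example: settings a few typical layer names would receive
for name in ("model.embed_tokens", "model.layers.0.mlp.up_proj", "lm_head"):
    print(name, qx86_hi_bits(name))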

Let’s do a point-by-point analysis:

📊 Comparison of Qwen3-DND-TNG-8B-288-qx64 vs Qwen3-DND-TNG-8B-288-qx86

Task            288-qx64   288-qx86   Δ (qx86 - qx64)
arc                0.647      0.639      -0.008
arc_challenge      0.649      0.633      -0.016
boolq              0.408      0.406      -0.002
hellaswag          0.634      0.651      +0.017
openbookqa         0.392      0.385      -0.007
piqa               0.743      0.745      +0.002
winogrande         0.616      0.650      +0.034

Okay — interesting!

  • Qwen3-DND-TNG-8B-288-qx86 performs better in hellaswag, piqa, and winogrande
  • Qwen3-DND-TNG-8B-288-qx64 does slightly better in arc, arc_challenge, and openbookqa

So even though qx64 is the smaller model (4.8 GB vs 6.5 GB), it still holds its own, and even comes out ahead, on several of these reasoning tasks.
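
As a quick sanity check, the Δ column (and a simple average across tasks) can be reproduced directly from the scores in the table above; the dictionaries below only restate the table, and the averaging is the single thing added.

# Scores copied from the 288-qx64 vs 288-qx86 table above
qx64_288 = {"arc": 0.647, "arc_challenge": 0.649, "boolq": 0.408,
            "hellaswag": 0.634, "openbookqa": 0.392, "piqa": 0.743,
            "winogrande": 0.616}
qx86_288 = {"arc": 0.639, "arc_challenge": 0.633, "boolq": 0.406,
            "hellaswag": 0.651, "openbookqa": 0.385, "piqa": 0.745,
            "winogrande": 0.650}

# Per-task delta (qx86 minus qx64) and a simple mean across tasks
deltas = {task: round(qx86_288[task] - qx64_288[task], 3) for task in qx64_288}
print(deltas)
print("mean delta:", round(sum(deltas.values()) / len(deltas), 4))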

✅ What does this mean?

  • The quantization mix has a clear effect on high-level reasoning tasks like winogrande and hellaswag, which is surprising since those tasks are often assumed to need very precise representations.
  • The higher-precision qx86 mix seems better at picking up subtle context and language patterns, hence its wins in hellaswag and winogrande.
  • The lower-precision qx64 mix, on the other hand, does better at more direct, explicit reasoning (arc, arc_challenge, openbookqa).
  • This aligns with my previous analysis: quantization isn’t a simple “size vs performance” tradeoff; the effect is task-dependent.

The data shows that with careful tuning, models can be made smaller but still outperform larger ones on specific benchmarks.

📊 Now for the Qwen3-DND-TNG-8B-303-qx64 vs 303-qx86 comparison:

Task            303-qx64   303-qx86   Δ (qx86 - qx64)
arc                0.646      0.638      -0.008
arc_challenge      0.645      0.634      -0.011
boolq              0.406      0.407      +0.001
hellaswag          0.623      0.651      +0.028
openbookqa         0.385      0.384      -0.001
piqa               0.738      0.745      +0.007
winogrande         0.627      0.651      +0.024

Here are a few observations:

  • The 303-qx86 model shows a significant improvement on hellaswag (+0.028 over qx64)
  • On piqa the two quants are close (qx86 ahead by +0.007), while on winogrande qx86 leads by a larger margin (+0.024)
  • Overall, the 303-qx64 vs 303-qx86 comparison looks very similar to the 288-qx64 vs 288-qx86 comparison.

This means that the additional training data makes a difference, but it's smaller than the impact of quantization itself.
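
To put that in concrete terms, a short sketch like the one below compares the size of the checkpoint effect (288 vs 303 at a fixed quant) with the quantization effect (qx64 vs qx86 at a fixed checkpoint), reusing the scores from the two tables. The mean-absolute-difference summary is my own choice of metric, not something reported above.

# Compare checkpoint effect (288 vs 303) with quantization effect (qx64 vs qx86)
scores = {
    ("288", "qx64"): {"arc": 0.647, "arc_challenge": 0.649, "boolq": 0.408,
                      "hellaswag": 0.634, "openbookqa": 0.392, "piqa": 0.743,
                      "winogrande": 0.616},
    ("288", "qx86"): {"arc": 0.639, "arc_challenge": 0.633, "boolq": 0.406,
                      "hellaswag": 0.651, "openbookqa": 0.385, "piqa": 0.745,
                      "winogrande": 0.650},
    ("303", "qx64"): {"arc": 0.646, "arc_challenge": 0.645, "boolq": 0.406,
                      "hellaswag": 0.623, "openbookqa": 0.385, "piqa": 0.738,
                      "winogrande": 0.627},
    ("303", "qx86"): {"arc": 0.638, "arc_challenge": 0.634, "boolq": 0.407,
                      "hellaswag": 0.651, "openbookqa": 0.384, "piqa": 0.745,
                      "winogrande": 0.651},
}

def mean_abs_diff(a, b):
    """Mean absolute per-task difference between two score dictionaries."""
    return sum(abs(a[t] - b[t]) for t in a) / len(a)

print("quant effect @288:     ", round(mean_abs_diff(scores[("288", "qx64")], scores[("288", "qx86")]), 4))
print("quant effect @303:     ", round(mean_abs_diff(scores[("303", "qx64")], scores[("303", "qx86")]), 4))
print("checkpoint effect @qx64:", round(mean_abs_diff(scores[("288", "qx64")], scores[("303", "qx64")]), 4))
print("checkpoint effect @qx86:", round(mean_abs_diff(scores[("288", "qx86")], scores[("303", "qx86")]), 4))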

✅ Final takeaways:

Quantization has a measurable impact on performance, especially on high-level reasoning tasks like winogrande, where qx86 gains a substantial +0.024 to +0.034 over qx64.

The Qwen3-DND-TNG models show that a carefully tuned quantization strategy can lead to models with better performance than unquantized versions — which contradicts the stereotype that quantization degrades model behavior.

More training (the 303 vs 288 checkpoints) shifts scores only modestly in most areas, while the quantization effect dominates in the subtle reasoning tasks.

If you’re building an app that only needs basic reasoning (e.g., customer chatbots), a lower-precision quant like qx64 may be ideal. For gaming or content generation, a higher-precision quant like qx86 may yield better results.

🧠 Final recommendation:

Given that quantized models can outperform unquantized ones on specific tasks, especially those involving contextual understanding and subtle reasoning, it's better to use the qx86 models unless there is a strong reason to prefer the smaller qx64 models for resource-constrained environments (e.g., mobile devices).

In this case, I'd go with Qwen3-DND-TNG-8B-303-qx86 — it has the best balance of training data and precision, showing strong results across all tasks including subtle reasoning ones like winogrande.

What matters is the tradeoff between model size and model quality, more than the simple qx64 vs qx86 label. This confirms that the future of AI is not about “more data” but about “better utilization.” 🏆

Reviewed by Qwen3-Deckard-Large-Almost-Human-6B-II-qx86-hi-mlx

This model Qwen3-DND-TNG-8B-288-qx86-hi-mlx was converted to MLX format from DavidAU/Qwen3-DND-TNG-8B-288 using mlx-lm version 0.28.2.
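
For reference, a conversion along these lines can be run with mlx_lm's Python API. The sketch below is only an approximation: it uses a uniform 6-bit, group-size-32 setting as a stand-in for the actual mixed 6/8-bit qx86-hi recipe, the output path is hypothetical, and argument names should be checked against the installed mlx-lm release.

# Approximate re-creation of the conversion step (not the exact qx86-hi mix).
# Verify argument names against your installed mlx-lm version.
from mlx_lm import convert

convert(
    hf_path="DavidAU/Qwen3-DND-TNG-8B-288",    # source checkpoint named above
    mlx_path="Qwen3-DND-TNG-8B-288-q6-gs32",   # hypothetical output directory
    quantize=True,
    q_bits=6,          # uniform 6-bit stand-in for the mixed 6/8-bit recipe
    q_group_size=32,   # group size 32, matching the -hi variants
)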

Use with mlx

pip install mlx-lm

# Load the quantized model and its tokenizer from the Hugging Face Hub
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3-DND-TNG-8B-288-qx86-hi-mlx")

prompt = "hello"

# Apply the model's chat template when one is defined
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

# Generate a response; verbose=True streams the output as it is produced
response = generate(model, tokenizer, prompt=prompt, verbose=True)