# Qwen3-30B-A3B-YOYO-V3-mxfp4-mlx
This card looks at where Qwen3-30B-A3B-YOYO-V3-mxfp4 sits in the performance spectrum compared to:

- the base Thinking model (Qwen3-30B-A3B-Thinking-2507-bf16)
- the base Coder model (unsloth-Qwen3-Coder-30B-A3B-Instruct-qx6)
- the best V2 model (Qwen3-30B-A3B-YOYO-V2-qx6-hi)
## Key Metrics
| Model | ARC Challenge | ARC Easy | BoolQ | HellaSwag | OpenBookQA | PIQA | Winogrande |
|---|---|---|---|---|---|---|---|
| V3-mxfp4 | 0.464 | 0.541 | 0.875 | 0.692 | 0.422 | 0.779 | 0.639 |
| Base Thinking (bf16) | 0.421 | 0.448 | 0.682 | 0.635 | 0.402 | 0.771 | 0.669 |
| Base Coder (qx6) | 0.422 | 0.532 | 0.881 | 0.546 | 0.432 | 0.724 | 0.576 |
| Best V2 (qx6-hi) | 0.531 | 0.690 | 0.885 | 0.685 | 0.448 | 0.785 | 0.646 |
## V3-mxfp4 Compared to the Three Reference Models
We calculate the average improvement (in percentage points) across all 7 metrics for three comparisons; a short script reproducing these numbers follows the summary table below.

- A: V3-mxfp4 vs. Thinking (bf16)
- B: V3-mxfp4 vs. Coder (qx6)
- C: V3-mxfp4 vs. V2 (qx6-hi)
| Metric | A (Thinking) | B (Coder) | C (V2) |
|---|---|---|---|
| ARC Challenge | +0.043 | +0.042 | -0.067 |
| ARC Easy | +0.093 | +0.009 | -0.149 |
| BoolQ | +0.193 | -0.006 | -0.010 |
| HellaSwag | +0.057 | +0.146 | +0.007 |
| OpenBookQA | +0.020 | -0.010 | -0.026 |
| PIQA | +0.008 | +0.055 | -0.006 |
| Winogrande | -0.030 | +0.063 | -0.007 |
## Average Performance Position
| Comparison | Avg. Improvement |
|---|---|
| V3-mxfp4 vs. Thinking (bf16) | +5.5 pp |
| V3-mxfp4 vs. Coder (qx6) | +4.3 pp |
| V3-mxfp4 vs. V2 (qx6-hi) | -3.7 pp |
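For transparency, the averages above follow directly from the Key Metrics table. The snippet below is a minimal sketch (plain Python, no external dependencies) that recomputes the per-metric deltas and their averages; the short model labels are shorthand for the full checkpoint names used in this card.

```python
# Benchmark scores copied from the Key Metrics table, in this column order:
# ARC Challenge, ARC Easy, BoolQ, HellaSwag, OpenBookQA, PIQA, Winogrande
scores = {
    "V3-mxfp4":      [0.464, 0.541, 0.875, 0.692, 0.422, 0.779, 0.639],
    "Thinking-bf16": [0.421, 0.448, 0.682, 0.635, 0.402, 0.771, 0.669],
    "Coder-qx6":     [0.422, 0.532, 0.881, 0.546, 0.432, 0.724, 0.576],
    "V2-qx6-hi":     [0.531, 0.690, 0.885, 0.685, 0.448, 0.785, 0.646],
}

v3 = scores["V3-mxfp4"]
for reference in ("Thinking-bf16", "Coder-qx6", "V2-qx6-hi"):
    # Per-metric difference, V3-mxfp4 minus the reference model
    deltas = [a - b for a, b in zip(v3, scores[reference])]
    avg_pp = 100 * sum(deltas) / len(deltas)  # average gap in percentage points
    print(f"V3-mxfp4 vs. {reference}: {avg_pp:+.1f} pp on average")
```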
This means:

- V3-mxfp4 is ~5.5 pp better than the base Thinking model (on average).
- V3-mxfp4 is ~4.3 pp better than the base Coder model (on average).
- V3-mxfp4 is ~3.7 pp worse than the V2 model (on average).
## Interpretation of Position
| Model Type | V3-mxfp4 Performance vs. Reference |
|---|---|
| Base Thinking Model | Significantly better (avg. +5.5 pp) |
| Base Coder Model | Slightly better (avg. +4.3 pp) |
| V2 Model | Slightly worse (avg. -3.7 pp) |
## Summary
The V3-mxfp4 model:

- Is better than both base models, confirming it is a meaningful upgrade.
- Is slightly worse than the V2 model, but this is expected since V2 was optimized for high performance.
**Average position as a hybrid model:**

- ~5.5 pp better than Thinking
- ~4.3 pp better than Coder
- ~3.7 pp worse than V2
## Qwen3-30B-A3B-YOYO-V3-mxfp4 Compared with Qwen3-30B-A3B-Thinking-2507-bf16
### Performance Results
| Metric | Change | Significance |
|---|---|---|
| ARC Challenge | +0.043 (+10.2%) | Significant improvement |
| ARC Easy | +0.093 (+20.8%) | Major improvement, especially on reasoning tasks |
| BoolQ | +0.193 (+28.3%) | Very significant improvement, likely due to better reasoning |
| HellaSwag | +0.057 (+8.9%) | Noticeable improvement in common-sense reasoning |
| OpenBookQA | +0.020 (+4.9%) | Improvement in knowledge-based QA |
| PIQA | +0.008 (+1.0%) | Slight improvement, no major change |
| Winogrande | -0.030 (-4.5%) | Slight decline, but not meaningful |
### Comparison Summary
| Metric | V3-mxfp4 | Thinking-bf16 | Difference |
|---|---|---|---|
| ARC Challenge | 46.4% | 42.1% | +4.3 pp |
| ARC Easy | 54.1% | 44.8% | +9.3 pp |
| BoolQ | 87.5% | 68.2% | +19.3 pp |
| HellaSwag | 69.2% | 63.5% | +5.7 pp |
| OpenBookQA | 42.2% | 40.2% | +2.0 pp |
| PIQA | 77.9% | 77.1% | +0.8 pp |
| Winogrande | 63.9% | 66.9% | -3.0 pp |
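As a side note, the Performance Results table mixes absolute changes (score fractions) with relative changes (percent of the Thinking baseline), while the Comparison Summary reports absolute gaps in percentage points. The sketch below, with scores copied from the tables above, shows how the two kinds of figures are computed.

```python
# Scores from the Comparison Summary table, expressed as fractions.
v3_mxfp4      = {"ARC Challenge": 0.464, "ARC Easy": 0.541, "BoolQ": 0.875,
                 "HellaSwag": 0.692, "OpenBookQA": 0.422, "PIQA": 0.779,
                 "Winogrande": 0.639}
thinking_bf16 = {"ARC Challenge": 0.421, "ARC Easy": 0.448, "BoolQ": 0.682,
                 "HellaSwag": 0.635, "OpenBookQA": 0.402, "PIQA": 0.771,
                 "Winogrande": 0.669}

for metric, base in thinking_bf16.items():
    new = v3_mxfp4[metric]
    pp  = 100 * (new - base)          # absolute gap in percentage points
    rel = 100 * (new - base) / base   # relative change vs. the Thinking baseline
    print(f"{metric:>13}: {pp:+.1f} pp ({rel:+.1f}%)")
```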
### Conclusion

The V3-mxfp4 model is significantly better than the base Thinking-2507-bf16 model across nearly all key reasoning tasks:

- ARC Challenge is up by 4.3 percentage points.
- ARC Easy is up by 9.3 pp, a major improvement.
- BoolQ shows the largest gain (+19.3 pp), indicating a major boost in logical reasoning.
- The only metric that declines is Winogrande (-3.0 pp), and the drop is small.
### Key Takeaway

The V3-mxfp4 model is a clear upgrade over the base Thinking model, confirming that:
- The V3 series (including its mxfp4 variant) is better than the base Thinking model.
- This supports the idea that V3 was designed to improve upon the base Thinking model with better reasoning and performance.
This model, Qwen3-30B-A3B-YOYO-V3-mxfp4-mlx, was converted to MLX format from YOYO-AI/Qwen3-30B-A3B-YOYO-V3 using mlx-lm version 0.27.1.
## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3-30B-A3B-YOYO-V3-mxfp4-mlx")

prompt = "hello"

# Apply the model's chat template when one is available.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
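The same flow works for longer, chat-style generations. The snippet below is a minimal sketch of that: the system/user messages and the 512-token budget are illustrative choices, and the `max_tokens` keyword follows recent mlx-lm releases (check `help(generate)` if your version differs).

```python
from mlx_lm import load, generate

model, tokenizer = load("nightmedia/Qwen3-30B-A3B-YOYO-V3-mxfp4-mlx")

# Illustrative multi-turn prompt; the chat template formats it for the model.
messages = [
    {"role": "system", "content": "You are a concise coding assistant."},
    {"role": "user", "content": "Write a Python one-liner that squares the numbers 1 through 10."},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# max_tokens caps the completion length; verbose streams tokens as they are generated.
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
```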