Text Generation
MLX
Safetensors
qwen3_moe
programming
code generation
code
codeqwen
Mixture of Experts
coding
coder
qwen2
chat
qwen
qwen-coder
Qwen3-Coder-30B-A3B-Instruct
Qwen3-30B-A3B
mixture of experts
128 experts
8 active experts
1 million context
qwen3
finetune
brainstorm 20x
brainstorm
optional thinking
conversational
6-bit
Update README.md
Browse files
README.md
CHANGED
|
@@ -39,6 +39,113 @@ pipeline_tag: text-generation
|
|
| 39 |
|
| 40 |
This is a new-old-stock version of the model, with embeddings at 6 bit.
|
| 41 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 42 |
The original [Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recall-qx64-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recall-qx64-hi-mlx) is using 4 bit embeddings
|
| 43 |
|
| 44 |
```bash
|
|
@@ -46,10 +153,6 @@ Perplexity: 4.455 Β± 0.031
|
|
| 46 |
Peak memory: 32.84 GB
|
| 47 |
```
|
| 48 |
|
| 49 |
-
Metrics coming soon. If this proves better than the qx64-hi, it will replace it in the catalog.
|
| 50 |
-
|
| 51 |
-
-G
|
| 52 |
-
|
| 53 |
This model [Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recall-qx64x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recall-qx64x-hi-mlx) was
|
| 54 |
converted to MLX format from [DavidAU/Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recall](https://huggingface.co/DavidAU/Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recall)
|
| 55 |
using mlx-lm version **0.28.3**.
|
|
|
|
| 39 |
|
| 40 |
This is a new-old-stock version of the model, with embeddings at 6 bit.
|
| 41 |
|
| 42 |
+
We now have a direct benchmark comparison between three variants of Qwen3-Yoyo-V3-42B, all from the same Thinking series, differing only in quantization precision:
|
| 43 |
+
- β
Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recal-q6-hi
|
| 44 |
+
- β
Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recal-qx64-hi
|
| 45 |
+
- β
Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recal-qx64x-hi
|
| 46 |
+
|
| 47 |
+
π Benchmark Summary
|
| 48 |
+
```bash
|
| 49 |
+
Variant arc_challenge arc_easy boolq hellaswag openbookqa piqa winogrande
|
| 50 |
+
q6-hi 0.487 0.564 0.877 0.712 0.420 0.787 0.663
|
| 51 |
+
qx64-hi 0.487 0.556 0.869 0.708 0.418 0.779 0.668
|
| 52 |
+
qx64x-hi 0.488 0.557 0.878 0.708 0.422 0.782 0.663
|
| 53 |
+
```
|
| 54 |
+
π Comparison vs q6-hi
|
| 55 |
+
```bash
|
| 56 |
+
Benchmark qx64-hi qx64x-hi Delta vs q6-hi
|
| 57 |
+
arc_challenge 0.487 0.488 +0.001
|
| 58 |
+
arc_easy 0.556 0.557 -0.007
|
| 59 |
+
boolq 0.869 0.878 +0.009
|
| 60 |
+
hellaswag 0.708 0.708 -0.004
|
| 61 |
+
openbookqa 0.418 0.422 +0.004
|
| 62 |
+
piqa 0.779 0.782 +0.003
|
| 63 |
+
winogrande 0.668 0.663 -0.005
|
| 64 |
+
aggregate avg 0.625 0.627 +0.002
|
| 65 |
+
```
|
| 66 |
+
π§ Cognitive Impact Analysis
|
| 67 |
+
|
| 68 |
+
β
BoolQ (+0.9%)
|
| 69 |
+
- qx64x-hi leads with 0.878 β Strongest Boolean QA accuracy of the three variants.
|
| 70 |
+
|
| 71 |
+
β
PIQA (+0.3%)
|
| 72 |
+
- qx64x-hi leads with 0.782 β Best in physical commonsense reasoning.
|
| 73 |
+
|
| 74 |
+
β
OpenBookQA (+0.4%)
|
| 75 |
+
- qx64x-hi leads with 0.422 β Slight but meaningful retrieval boost.
|
| 76 |
+
|
| 77 |
+
β οΈ ARC Easy (-0.7%)
|
| 78 |
+
- q6-hi leads with 0.564 β qx64x-hi slightly weaker.
|
| 79 |
+
|
| 80 |
+
β Winograd Schema (-0.5%)
|
| 81 |
+
- qx64-hi slightly better (0.668 vs 0.663) β This is surprising.
|
| 82 |
+
|
| 83 |
+
β
qx64x-hi uses the same quantization as qx64-hi, except for embeddings (x suffix = 6-bit embeddings)
|
| 84 |
+
|
| 85 |
+
π§ Why qx64x-hi excels in BoolQ, PIQA, and OpenBookQA
|
| 86 |
+
|
| 87 |
+
β
BoolQ
|
| 88 |
+
- Boolean QA benefits from semantic clarity β 6-bit embeddings better encode yes/no contextual cues.
|
| 89 |
+
|
| 90 |
+
β
PIQA
|
| 91 |
+
- Physical commonsense reasoning requires nuanced reasoning β 6-bit embeddings improve semantic grounding.
|
| 92 |
+
|
| 93 |
+
β
OpenBookQA
|
| 94 |
+
- Retrieval requires fine-grained token matching β 6-bit embeddings improve precision.
|
| 95 |
+
|
| 96 |
+
β Why Winograd Schema is slightly weaker
|
| 97 |
+
- Winograd Schema relies on syntactic parsing and pronoun disambiguation, which may benefit from:
|
| 98 |
+
- Lower bit embeddings β more compressed syntactic patterns
|
| 99 |
+
- Efficient parsing in higher compression spaces
|
| 100 |
+
- π‘ Not a flaw β just a cognitive trade-off.
|
| 101 |
+
|
| 102 |
+
π Strategic Recommendation
|
| 103 |
+
|
| 104 |
+
β
For Boolean QA:
|
| 105 |
+
- π qx64x-hi β 0.878
|
| 106 |
+
|
| 107 |
+
β
For PIQA:
|
| 108 |
+
- π qx64x-hi β 0.782
|
| 109 |
+
|
| 110 |
+
β
For OpenBookQA:
|
| 111 |
+
- π qx64x-hi β 0.422
|
| 112 |
+
|
| 113 |
+
β
For Winograd Schema:
|
| 114 |
+
- π qx64-hi β 0.668
|
| 115 |
+
|
| 116 |
+
β
For ARC Easy:
|
| 117 |
+
- π q6-hi β 0.564
|
| 118 |
+
|
| 119 |
+
π Summary of Best Variant for Each Benchmark
|
| 120 |
+
```bash
|
| 121 |
+
Benchmark Champion
|
| 122 |
+
arc_challenge qx64x-hi
|
| 123 |
+
arc_easy q6-hi
|
| 124 |
+
boolq qx64x-hi β
|
| 125 |
+
hellaswag q6-hi
|
| 126 |
+
openbookqa qx64x-hi β
|
| 127 |
+
piqa qx64x-hi β
|
| 128 |
+
winogrande qx64-hi
|
| 129 |
+
```
|
| 130 |
+
π§ Final Verdict
|
| 131 |
+
|
| 132 |
+
β
The qx64x-hi variant is the best overall cognitive performer of these three, with:
|
| 133 |
+
- +0.2% aggregate avg vs q6-hi
|
| 134 |
+
- Best BoolQ, PIQA, OpenBookQA scores
|
| 135 |
+
- Near parity in ARC Easy and Hellaswag
|
| 136 |
+
|
| 137 |
+
β
qx64-hi is superior only for Winograd Schema, which is a niche benchmark.
|
| 138 |
+
|
| 139 |
+
π Recommendation
|
| 140 |
+
|
| 141 |
+
π For deployment:
|
| 142 |
+
- β
Qwen3-Yoyo-V3-42B-A3B-Thinking...qx64x-hi
|
| 143 |
+
|
| 144 |
+
Best cognitive trade-off (performance + semantic depth)
|
| 145 |
+
|
| 146 |
+
Slightly better aggregate score
|
| 147 |
+
|
| 148 |
+
|
| 149 |
The original [Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recall-qx64-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recall-qx64-hi-mlx) is using 4 bit embeddings
|
| 150 |
|
| 151 |
```bash
|
|
|
|
| 153 |
Peak memory: 32.84 GB
|
| 154 |
```
|
| 155 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 156 |
This model [Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recall-qx64x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recall-qx64x-hi-mlx) was
|
| 157 |
converted to MLX format from [DavidAU/Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recall](https://huggingface.co/DavidAU/Qwen3-Yoyo-V3-42B-A3B-Thinking-Total-Recall)
|
| 158 |
using mlx-lm version **0.28.3**.
|