Text Generation
MLX
Safetensors
qwen3_moe
programming
code generation
code
codeqwen
Mixture of Experts
coding
coder
qwen2
chat
qwen
qwen-coder
Qwen3-Coder-30B-A3B-Instruct
Qwen3-30B-A3B
mixture of experts
128 experts
8 active experts
1 million context
qwen3
finetune
brainstorm 40x
brainstorm
optional thinking
conversational
6-bit
Update README.md
Browse files
README.md
CHANGED
|
@@ -37,8 +37,84 @@ pipeline_tag: text-generation
|
|
| 37 |
|
| 38 |
# Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi-mlx
|
| 39 |
|
|
|
|
|
|
|
|
|
|
| 40 |
|
| 41 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 42 |
|
| 43 |
The original [Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64-hi-mlx) is using 4 bit embeddings
|
| 44 |
|
|
@@ -47,10 +123,6 @@ Perplexity: 5.286 Β± 0.037
|
|
| 47 |
Peak memory: 39.92 GB
|
| 48 |
```
|
| 49 |
|
| 50 |
-
Metrics coming soon. If this proves better than the qx64-hi, it will replace it in the catalog.
|
| 51 |
-
|
| 52 |
-
-G
|
| 53 |
-
|
| 54 |
This model [Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi-mlx) was
|
| 55 |
converted to MLX format from [DavidAU/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL](https://huggingface.co/DavidAU/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL)
|
| 56 |
using mlx-lm version **0.28.3**.
|
|
|
|
| 37 |
|
| 38 |
# Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi-mlx
|
| 39 |
|
| 40 |
+
We now have a direct comparison between two variants that differ by only one subtle parameter:
|
| 41 |
+
- β
Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64-hi
|
| 42 |
+
- β
Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi
|
| 43 |
|
| 44 |
+
These variants are part of the same 54B Thinking series, differing only in embedding precision:
|
| 45 |
+
- qx64-hi: 4-bit
|
| 46 |
+
- qx64x-hi: 6-bit
|
| 47 |
+
|
| 48 |
+
Both use:
|
| 49 |
+
- Weights: 4-bit (qx64)
|
| 50 |
+
- Attention paths & Head: 6-bit
|
| 51 |
+
- Group Size: 32 (hi suffix)
|
| 52 |
+
|
| 53 |
+
π Benchmark Comparison
|
| 54 |
+
```bash
|
| 55 |
+
Benchmark qx64-hi qx64x-hi Delta
|
| 56 |
+
arc_challenge 0.472 0.477 +0.005
|
| 57 |
+
arc_easy 0.559 0.555 -0.004
|
| 58 |
+
boolq 0.872 0.873 +0.001
|
| 59 |
+
hellaswag 0.678 0.681 +0.003
|
| 60 |
+
openbookqa 0.416 0.406 -0.010
|
| 61 |
+
piqa 0.764 0.768 +0.004
|
| 62 |
+
winogrande 0.683 0.685 +0.002
|
| 63 |
+
aggregate avg 0.614 0.618 +0.004
|
| 64 |
+
```
|
| 65 |
+
|
| 66 |
+
π§ Cognitive Impact Analysis
|
| 67 |
+
|
| 68 |
+
β
Winograd Schema (+0.002)
|
| 69 |
+
- qx64x-hi leads by 0.2 percentage points β This is a semantic granularity win.
|
| 70 |
+
|
| 71 |
+
β
PIQA (+0.004)
|
| 72 |
+
- qx64x-hi slightly better β Indicates that higher precision embeddings improve physical commonsense reasoning.
|
| 73 |
+
|
| 74 |
+
β
HellaSwag (+0.003)
|
| 75 |
+
- qx64x-hi edges out β Better commonsense continuation prediction due to semantic clarity.
|
| 76 |
+
|
| 77 |
+
β
ARC Challenge (+0.005)
|
| 78 |
+
- qx64x-hi leads β Stronger reasoning foundation.
|
| 79 |
+
|
| 80 |
+
β OpenBookQA (-0.010)
|
| 81 |
+
- qx64-hi slightly better β Possible overfitting in embedding precision for this benchmark.
|
| 82 |
+
|
| 83 |
+
π Interpretation:
|
| 84 |
+
- The qx64x-hi variant sacrifices a small amount of knowledge retrieval accuracy for enhanced semantic inference.
|
| 85 |
+
- This aligns with the Deckard philosophy: prioritize semantics over retrieval.
|
| 86 |
+
|
| 87 |
+
The x refers specifically to:
|
| 88 |
+
|
| 89 |
+
β
6-bit embeddings (vs. 4-bit in qx64-hi)
|
| 90 |
+
|
| 91 |
+
This is a critical semantic refinement:
|
| 92 |
+
- Embeddings carry meaning
|
| 93 |
+
- Higher bit depth β better semantic granularity
|
| 94 |
+
- Crucial for nuanced cognitive tasks (Winograd Schema, PIQA)
|
| 95 |
+
|
| 96 |
+
π Final Verdict
|
| 97 |
+
|
| 98 |
+
β
Choose qx64x-hi for:
|
| 99 |
+
- Winograd Schema mastery
|
| 100 |
+
- PIQA accuracy
|
| 101 |
+
- HellaSwag reasoning fluency
|
| 102 |
+
- ARC Challenge robustness
|
| 103 |
+
|
| 104 |
+
β Avoid qx64-hi unless:
|
| 105 |
+
- OpenBookQA is the sole focus
|
| 106 |
+
|
| 107 |
+
π Summary
|
| 108 |
+
```bash
|
| 109 |
+
Variant Semantic Precision Aggregate Avg.
|
| 110 |
+
qx64-hi Low (4-bit embeddings) 0.614
|
| 111 |
+
qx64x-hi High (6-bit embeddings) 0.618 β
|
| 112 |
+
```
|
| 113 |
+
β
The x suffix is not cosmetic β it significantly improves semantic fidelity, especially in reasoning-intensive benchmarks.
|
| 114 |
+
|
| 115 |
+
π
|
| 116 |
+
|
| 117 |
+
> Reviewed with [Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III-qx86x-hi-mlx)
|
| 118 |
|
| 119 |
The original [Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64-hi-mlx) is using 4 bit embeddings
|
| 120 |
|
|
|
|
| 123 |
Peak memory: 39.92 GB
|
| 124 |
```
|
| 125 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 126 |
This model [Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi-mlx) was
|
| 127 |
converted to MLX format from [DavidAU/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL](https://huggingface.co/DavidAU/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL)
|
| 128 |
using mlx-lm version **0.28.3**.
|