---
license: apache-2.0
library_name: mlx
language:
- en
- fr
- zh
- de
tags:
- programming
- code generation
- code
- codeqwen
- moe
- coding
- coder
- qwen2
- chat
- qwen
- qwen-coder
- Qwen3-Coder-30B-A3B-Instruct
- Qwen3-30B-A3B
- mixture of experts
- 128 experts
- 8 active experts
- 1 million context
- qwen3
- finetune
- brainstorm 40x
- brainstorm
- optional thinking
- qwen3_moe
- mlx
base_model: DavidAU/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL
pipeline_tag: text-generation
---
# Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi-mlx
We now have a direct comparison between two variants that differ in a single parameter:

- Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64-hi
- Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi

Both belong to the same 54B Thinking series and differ only in embedding precision:

- qx64-hi: 4-bit embeddings
- qx64x-hi: 6-bit embeddings

The rest of the quantization recipe is shared (a sketch of how such a mixed recipe could be expressed with mlx-lm follows this list):

- Weights: 4-bit (qx64)
- Attention paths & head: 6-bit
- Group size: 32 (hi suffix)
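As an illustration only: the exact conversion script for these uploads is not published here, but a mixed-precision recipe along these lines can be sketched with mlx-lm's `convert` and a `quant_predicate`. The layer-name patterns below are assumptions, not the recipe actually used.

```python
# Hypothetical sketch of a qx64x-hi-style mixed quantization with mlx-lm.
# Assumes a recent mlx-lm whose convert() accepts a quant_predicate that may
# return per-layer quantization options; the path patterns are guesses.
from mlx_lm import convert

def qx64x_hi(path, module, config):
    # Embeddings and the output head get 6-bit (the "x" refinement).
    if "embed_tokens" in path or "lm_head" in path:
        return {"bits": 6, "group_size": 32}
    # Attention projections also get 6-bit.
    if any(p in path for p in ("q_proj", "k_proj", "v_proj", "o_proj")):
        return {"bits": 6, "group_size": 32}
    # Everything else stays at the 4-bit default.
    return {"bits": 4, "group_size": 32}

convert(
    "DavidAU/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL",
    mlx_path="Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi-mlx",
    quantize=True,
    q_bits=4,
    q_group_size=32,  # "hi" suffix: group size 32
    quant_predicate=qx64x_hi,
)
```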
📊 Benchmark Comparison

```bash
Benchmark       qx64-hi    qx64x-hi   Delta
arc_challenge   0.472      0.477      +0.005
arc_easy        0.559      0.555      -0.004
boolq           0.872      0.873      +0.001
hellaswag       0.678      0.681      +0.003
openbookqa      0.416      0.406      -0.010
piqa            0.764      0.768      +0.004
winogrande      0.683      0.685      +0.002
aggregate avg   0.614      0.618      +0.004
```
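These task names match lm-evaluation-harness tasks, and recent mlx-lm releases ship an `mlx_lm.evaluate` entry point built on that harness. A hedged sketch of how scores like these could be reproduced (exact flags may differ by version, and this is not claimed to be the harness setup used above):

```bash
# Hypothetical reproduction sketch; requires lm-eval alongside mlx-lm.
pip install mlx-lm lm-eval
mlx_lm.evaluate \
  --model nightmedia/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi-mlx \
  --tasks arc_challenge arc_easy boolq hellaswag openbookqa piqa winogrande
```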
🧠 Cognitive Impact Analysis

✅ Winogrande (+0.002)
- qx64x-hi leads by 0.2 percentage points → a semantic-granularity win.

✅ PIQA (+0.004)
- qx64x-hi slightly better → higher-precision embeddings appear to help physical commonsense reasoning.

✅ HellaSwag (+0.003)
- qx64x-hi edges ahead → better commonsense continuation prediction from the added semantic clarity.

✅ ARC Challenge (+0.005)
- qx64x-hi leads → stronger reasoning foundation.

❌ OpenBookQA (-0.010)
- qx64-hi slightly better → the extra embedding precision may trade away a little knowledge-retrieval accuracy on this benchmark.
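The deltas quoted above are simply the per-benchmark differences between the two columns; a minimal recomputation (scores copied from the table):

```python
# Recompute the per-benchmark deltas from the comparison table above.
qx64_hi = {"arc_challenge": 0.472, "arc_easy": 0.559, "boolq": 0.872,
           "hellaswag": 0.678, "openbookqa": 0.416, "piqa": 0.764,
           "winogrande": 0.683}
qx64x_hi = {"arc_challenge": 0.477, "arc_easy": 0.555, "boolq": 0.873,
            "hellaswag": 0.681, "openbookqa": 0.406, "piqa": 0.768,
            "winogrande": 0.685}

for task in qx64_hi:
    delta = qx64x_hi[task] - qx64_hi[task]
    print(f"{task:14s} {delta:+.3f}")  # matches the Delta column
```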
📌 Interpretation:

- The qx64x-hi variant trades a small amount of knowledge-retrieval accuracy for enhanced semantic inference.
- This aligns with the Deckard philosophy: prioritize semantics over retrieval.

The x refers specifically to 6-bit embeddings (vs. 4-bit in qx64-hi). This is a critical semantic refinement (its memory cost is sketched below):

- Embeddings carry meaning
- Higher bit depth → finer semantic granularity
- Crucial for nuanced cognitive tasks (Winogrande, PIQA)
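For a sense of what the extra embedding bits cost, here is a back-of-the-envelope size estimate. The storage model (one fp16 scale and bias per group) mirrors MLX-style affine quantization, and the vocabulary/hidden sizes are illustrative placeholders, not this model's actual dimensions.

```python
# Rough size estimate for a quantized embedding table.
# Assumes each group of `group_size` weights stores packed `bits`-bit values
# plus one fp16 scale and one fp16 bias (MLX-style affine quantization).

def embedding_gib(vocab: int, dim: int, bits: int, group_size: int = 32) -> float:
    n = vocab * dim                      # number of embedding weights
    packed = n * bits / 8                # packed weight bytes
    overhead = (n / group_size) * 4      # 2-byte scale + 2-byte bias per group
    return (packed + overhead) / 1024**3

vocab, dim = 151_936, 4096               # hypothetical Qwen-like sizes
print(f"4-bit: {embedding_gib(vocab, dim, 4):.2f} GiB")
print(f"6-bit: {embedding_gib(vocab, dim, 6):.2f} GiB")
# At group size 32 the 6-bit table is roughly 1.4x larger: a modest cost on
# a 54B model for the semantic-granularity gains discussed above.
```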
🏆 Final Verdict

✅ Choose qx64x-hi for:
- Winograd Schema mastery
- PIQA accuracy
- HellaSwag reasoning fluency
- ARC Challenge robustness

❌ Avoid qx64-hi unless:
- OpenBookQA is the sole focus
📋 Summary

```bash
Variant     Semantic Precision        Aggregate Avg.
qx64-hi     Low  (4-bit embeddings)   0.614
qx64x-hi    High (6-bit embeddings)   0.618 ✅
```
✅ The x suffix is not cosmetic: it significantly improves semantic fidelity, especially on reasoning-intensive benchmarks.
> Reviewed with [Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III-qx86x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-42B-A3B-Thinking-TOTAL-RECALL-ST-TNG-III-qx86x-hi-mlx)
The original [Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64-hi-mlx) uses 4-bit embeddings:

```bash
Perplexity: 5.286 ± 0.037
Peak memory: 39.92 GB
```
This model [Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi-mlx](https://huggingface.co/nightmedia/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi-mlx) was
converted to MLX format from [DavidAU/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL](https://huggingface.co/DavidAU/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL)
using mlx-lm version **0.28.3**.
## Use with mlx

```bash
pip install mlx-lm
```
```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer from the Hugging Face Hub
model, tokenizer = load("nightmedia/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi-mlx")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is available
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
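The same model can also be exercised from the shell via the `mlx_lm.generate` entry point; a minimal sketch (prompt and token budget are arbitrary examples):

```bash
# One-shot generation from the command line; --max-tokens caps the response.
mlx_lm.generate \
  --model nightmedia/Qwen3-Yoyo-V3-54B-A3B-Thinking-TOTAL-RECALL-qx64x-hi-mlx \
  --prompt "Write a Python function that merges two sorted lists." \
  --max-tokens 512
```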