---
license: apache-2.0
language:
- en
- zh
base_model:
- Qwen/Qwen3-30B-A3B-Thinking-2507
- Qwen/Qwen3-30B-A3B-Instruct-2507
pipeline_tag: text-generation
tags:
- merge
---
This is an auto-thinking-switching model built with model merging and expert-substitution techniques: it answers simple questions directly, thinks briefly about moderate ones, and reasons deeply about difficult ones.
## Model Highlights

- Merge method: `arcee_fusion`
- Highest precision: `dtype: float32` + `out_dtype: bfloat16`
- Context length: 262,144 & 1,010,000
## Parameter Settings

### Auto-Thinking Mode

`Temperature=0.6`, `TopP=0.95`, `TopK=20`, `MinP=0`.
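As a sketch, these recommended defaults map onto the standard `transformers` generation arguments roughly as follows (the repo id in the commented usage is a placeholder, not a confirmed model path):

```python
# Recommended Auto-Thinking sampling settings from this card, keyed by
# the transformers generation-argument names (min_p support requires a
# reasonably recent transformers release).
AUTO_THINKING_SAMPLING = {
    "do_sample": True,
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
}

# Hypothetical usage (not executed here; "<this-model>" is a placeholder):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("<this-model>")
# model = AutoModelForCausalLM.from_pretrained("<this-model>")
# inputs = tok("Hello", return_tensors="pt")
# out = model.generate(**inputs, **AUTO_THINKING_SAMPLING)
```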
## Step 1: Hybridize the Instruct Model and Thinking Model

Perform an initial fusion of the instruction model and the reasoning model.
```yaml
models:
  - model: Qwen/Qwen3-30B-A3B-Thinking-2507
merge_method: arcee_fusion
base_model: Qwen/Qwen3-30B-A3B-Instruct-2507
dtype: float32
out_dtype: bfloat16
tokenizer_source: base
name: Qwen3-30B-A3B-YOYO-AutoThink-preview
```
## Step 2: Expert replacement

Inspired by this paper, we use the regular expression `^model\.layers\.\d+\.mlp\.experts\.\d+\.(down_proj|gate_proj|up_proj)\.weight$` for expert replacement: every tensor in Qwen3-30B-A3B-YOYO-AutoThink-preview whose name matches the regex is replaced with the corresponding tensor from Qwen3-30B-A3B-Thinking-2507.
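The replacement step can be sketched in plain Python over a state dict. The `replace_experts` helper and the toy tensor names below are illustrative (the card does not name the tool used for this step); only expert MLP projections match the regex, while router gates and attention weights are left from the merged model:

```python
import re

# Regex from the card: selects every expert MLP projection weight.
EXPERT_RE = re.compile(
    r"^model\.layers\.\d+\.mlp\.experts\.\d+\."
    r"(down_proj|gate_proj|up_proj)\.weight$"
)

def replace_experts(merged: dict, thinking: dict) -> dict:
    """Return a copy of `merged` where every tensor whose name matches
    EXPERT_RE is taken from `thinking` instead (hypothetical helper)."""
    return {
        name: (thinking[name] if EXPERT_RE.match(name) else tensor)
        for name, tensor in merged.items()
    }

# Toy state dicts with string placeholders instead of real tensors:
merged = {
    "model.layers.0.mlp.experts.3.up_proj.weight": "merged-expert",
    "model.layers.0.mlp.gate.weight": "merged-router",  # router, not an expert
    "model.layers.0.self_attn.q_proj.weight": "merged-attn",
}
thinking = {k: "thinking-" + v.split("-")[1] for k, v in merged.items()}

out = replace_experts(merged, thinking)
# Only the expert projection is swapped; router and attention stay merged.
```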