---
license: apache-2.0
language:
  - en
  - zh
base_model:
  - Qwen/Qwen3-30B-A3B-Thinking-2507
  - Qwen/Qwen3-30B-A3B-Instruct-2507
pipeline_tag: text-generation
tags:
  - merge
---

This is an auto-thinking-switching model built with model merging and expert-substitution techniques: it answers simple questions directly, gives brief reasoning on moderate ones, and thinks deeply about difficult ones.

Model Highlights:

  • Merge method: arcee_fusion

  • Highest precision: `dtype: float32` + `out_dtype: bfloat16`

  • Context length: 262,144 (extendable to 1,010,000)

Parameter Settings:

Auto-Thinking Mode

Temperature=0.6, TopP=0.95, TopK=20, MinP=0.
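The sampler settings above can be passed to an OpenAI-compatible endpoint such as one served by vLLM; a minimal sketch, where the model name and prompt are placeholders (note that `top_k` and `min_p` are vLLM extensions, not part of the standard OpenAI schema):

```python
# Auto-thinking-mode sampler settings as a request payload for an
# OpenAI-compatible chat-completions endpoint (e.g. served by vLLM).
payload = {
    "model": "Qwen3-30B-A3B-YOYO-AutoThink",  # placeholder model name
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,   # vLLM extension
    "min_p": 0.0,  # vLLM extension
}
```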

Step 1: Hybridize the Instruct and Thinking models

Perform an initial merge of the instruction model and the reasoning model.

```yaml
models:
  - model: Qwen/Qwen3-30B-A3B-Thinking-2507
merge_method: arcee_fusion
base_model: Qwen/Qwen3-30B-A3B-Instruct-2507
dtype: float32
out_dtype: bfloat16
tokenizer_source: base
name: Qwen3-30B-A3B-YOYO-AutoThink-preview
```

Step 2: Expert replacement

Inspired by this paper, we use the regular expression `^model\.layers\.\d+\.mlp\.experts\.\d+\.(down_proj|gate_proj|up_proj)\.weight$` for expert replacement: every expert weight in Qwen3-30B-A3B-YOYO-AutoThink-preview that matches the regex is replaced with the corresponding weight from Qwen3-30B-A3B-Thinking-2507.
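The replacement step can be sketched in plain Python; this is a toy illustration, not the actual merge tooling, and strings stand in for weight tensors so the selection logic is easy to check:

```python
import re

# The expert-replacement pattern from Step 2.
EXPERT_RE = re.compile(
    r"^model\.layers\.\d+\.mlp\.experts\.\d+\."
    r"(down_proj|gate_proj|up_proj)\.weight$"
)

def replace_experts(preview_sd, thinking_sd):
    """Return a copy of preview_sd where every parameter whose name
    matches EXPERT_RE is taken from thinking_sd instead."""
    return {
        name: (thinking_sd[name] if EXPERT_RE.match(name) else tensor)
        for name, tensor in preview_sd.items()
    }

# Toy state dicts: strings stand in for weight tensors.
preview = {
    "model.layers.0.mlp.experts.0.up_proj.weight": "preview",
    "model.layers.0.mlp.gate.weight": "preview",  # router, not an expert
    "model.embed_tokens.weight": "preview",
}
thinking = {name: "thinking" for name in preview}

merged = replace_experts(preview, thinking)
# Only the expert projection is swapped; the router and embeddings
# keep the preview weights.
```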