---
license: mit
language:
- en
tags:
- LLM
library_name: transformers
base_model:
- Qwen/Qwen2.5-32B
datasets:
- MiniMaxAI/SynLogic
---
# SynLogic Zero-Mix-3: Large-Scale Multi-Domain Reasoning Model

* 🐙 **GitHub Repo:** [https://github.com/MiniMax-AI/SynLogic](https://github.com/MiniMax-AI/SynLogic)
* 📜 **Paper (arXiv):** [https://arxiv.org/abs/2505.19641](https://arxiv.org/abs/2505.19641)
* 🤗 **Dataset:** [SynLogic on Hugging Face](https://huggingface.co/datasets/MiniMaxAI/SynLogic)

## Model Overview

**Zero-Mix-3** is a multi-domain reasoning model trained with Zero-RL (reinforcement learning applied directly to the base model, with no supervised fine-tuning stage) on a mixture of logical reasoning, mathematical, and coding data. Built on Qwen2.5-32B-Base, it demonstrates the benefit of combining diverse verifiable reasoning tasks in a unified training framework.

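For convenience, a minimal inference sketch with 🤗 Transformers follows. The repo id is a placeholder assumption (substitute this repository's actual Hub id), and because Zero-RL models are trained from a base model rather than a chat model, the prompt is plain text with no chat template.

```python
# Minimal inference sketch. The repo id is a placeholder assumption; replace it
# with this repository's actual Hub id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/SynLogic-Zero-Mix-3"  # placeholder id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 32B parameters: use bf16 and shard across GPUs
    device_map="auto",
)

# Zero-RL models are RL-trained from a base model, so use a plain-text prompt
# rather than a chat template.
prompt = (
    "Question: If all bloops are razzies and all razzies are lazzies, "
    "are all bloops lazzies? Think step by step.\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
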
## Key Features

* **Multi-Domain Training:** Jointly trained on logical reasoning (SynLogic), mathematics, and coding tasks
* **Zero-RL Training:** Trained directly from the base model with pure reinforcement learning, without any instruction tuning
* **Diverse Data Mixture:** 35k mathematical samples + 9k coding samples + 17k SynLogic samples (see the mixing sketch after this list)
* **Enhanced Generalization:** Stronger cross-domain transfer than single-domain training

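As a rough illustration of how such a mixture can be assembled with 🤗 Datasets, the sketch below subsamples three sources to the 35k/9k/17k proportions above. Only `MiniMaxAI/SynLogic` is a real dataset from this card; the math and code dataset ids are hypothetical placeholders, and SynLogic's actual config/split names may differ from what is shown.

```python
# Illustrative data-mixing sketch. Only MiniMaxAI/SynLogic comes from this card;
# the math/code dataset ids are hypothetical, and SynLogic's config/split names
# may differ (check the dataset card).
from datasets import load_dataset, concatenate_datasets

synlogic = load_dataset("MiniMaxAI/SynLogic", split="train")
math_ds = load_dataset("your-org/math-rl-prompts", split="train")  # placeholder
code_ds = load_dataset("your-org/code-rl-prompts", split="train")  # placeholder

# Subsample each source to the mixture sizes reported above (35k / 9k / 17k),
# then shuffle the combined pool.
mixture = concatenate_datasets([
    math_ds.shuffle(seed=42).select(range(35_000)),
    code_ds.shuffle(seed=42).select(range(9_000)),
    synlogic.shuffle(seed=42).select(range(17_000)),
]).shuffle(seed=42)
print(mixture)
```
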
## Performance Highlights

| Model | BBEH | KOR-Bench | LiveCodeBench | AIME 2024 | GPQA Diamond |
|-------|------|-----------|---------------|-----------|--------------|
| DeepSeek-R1-Distill-Qwen-32B | 19.2 | 66.6 | 57.2 | 72.6 | 63.1 |
| DeepSeek-R1-Zero-Qwen-32B | - | - | 40.2 | **47.0** | 55.0 |
| Zero-Mix-2 (Math+Coding) | 18.5 | 58.6 | 39.5 | 34.5 | 55.2 |
| **Zero-Mix-3 (SynLogic+Math+Coding)** | **28.6** | **65.0** | **40.7** | 35.8 | **57.5** |

*Bold marks the best score among the Zero-RL-trained models.*

**Key Achievements:**
- **Surpasses** DeepSeek-R1-Distill-Qwen-32B on BBEH (+9.4 points) and nearly matches it on KOR-Bench (65.0 vs. 66.6)
- **Outperforms** DeepSeek-R1-Zero-Qwen-32B on LiveCodeBench (40.7 vs. 40.2) and GPQA-Diamond (+2.5 points)

## Training Details

* **Base Model:** Qwen2.5-32B-Base
* **Training Algorithm:** GRPO (Group Relative Policy Optimization); see the sketch below
* **Training Data:**
  - 35k mathematical reasoning samples
  - 9k coding problem samples
  - 17k SynLogic logical reasoning samples

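For reference, GRPO's central idea is a group-relative advantage: each prompt is sampled G times and every completion's reward is normalized against its own group, so no value network is needed. A minimal sketch of that computation (not the authors' training code) is below.

```python
# Group-relative advantage computation at the heart of GRPO (illustrative only).
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Normalize each completion's reward against its own group.

    rewards: tensor of shape (num_prompts, group_size), one verifier
    reward per sampled completion of each prompt.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each, binary verifier rewards.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
print(grpo_advantages(rewards))
```

These advantages then weight a PPO-style clipped policy-gradient objective; the group mean serves as the baseline that a critic network would otherwise provide.
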
## Ablation Insights

Comparison with Zero-Mix-2 (Math+Coding only) demonstrates that adding SynLogic logical reasoning data yields:
- **+10.1 points** on logical reasoning (BBEH)
- **+6.4 points** on logical reasoning (KOR-Bench)
- **+2.3 points** on out-of-domain reasoning (GPQA-Diamond)
- **+1.2 points** on coding (LiveCodeBench)

## Citation

```bibtex
@article{liu2025synlogic,
  title={SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond},
  author={Junteng Liu and Yuanxiang Fan and Zhuo Jiang and Han Ding and Yongyi Hu and Chi Zhang and Yiqi Shi and Shitong Weng and Aili Chen and Shiqi Chen and Yunan Huang and Mozhi Zhang and Pengyu Zhao and Junjie Yan and Junxian He},
  journal={arXiv preprint arXiv:2505.19641},
  year={2025}
}
```