Introducing Qwen-2.5_1.5b_MATH_GSM8K_SFT10
This model is a supervised fine-tuned variant of Qwen2.5-Math-1.5B, further trained for math reasoning with structured outputs using high-quality, self-verified chains of thought from GSM8K.
The goal was to strengthen both reasoning correctness and format stability in order to improve accuracy while staying efficient at the 1.5B scale.
Evaluation (GSM8K)
| Metric | Base Qwen2.5-Math-1.5B | This Model |
|---|---|---|
| Pass@1 accuracy | ~54% | ~67.5% |
Evaluation was run on the GSM8K test split with temperature=0.1 and top_k=1.0 for both models.
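A minimal sketch of how Pass@1 could be scored, assuming the predicted answer is read from the <answer> block and the gold answer from the text after `####` in the GSM8K reference solution. The helper names (`extract_answer`, `gold_answer`, `normalize`, `pass_at_1`) and the number normalization are illustrative assumptions, not the evaluation script behind the table above.

```python
import re
from typing import List, Optional

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(generation: str) -> Optional[str]:
    """Return the text inside the first <answer> block, or None if absent."""
    match = ANSWER_RE.search(generation)
    return match.group(1).strip() if match else None


def gold_answer(solution: str) -> str:
    """GSM8K reference solutions end with '#### <final answer>'."""
    return solution.split("####")[-1].strip()


def normalize(value: str) -> str:
    """Assumed normalization: drop commas and '$', render whole numbers as ints."""
    value = value.replace(",", "").replace("$", "").strip()
    try:
        number = float(value)
    except ValueError:
        return value  # leave non-numeric answers untouched
    return str(int(number)) if number.is_integer() else str(number)


def pass_at_1(generations: List[str], solutions: List[str]) -> float:
    """Fraction of problems whose single generation matches the gold answer."""
    if not solutions:
        return 0.0
    correct = 0
    for generation, solution in zip(generations, solutions):
        predicted = extract_answer(generation)
        if predicted is not None and normalize(predicted) == normalize(gold_answer(solution)):
            correct += 1
    return correct / len(solutions)
```

A generation with no <answer> block counts as incorrect here, so the metric reflects format stability as well as reasoning correctness.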
Data & Training Summary
- Method: Supervised Fine-Tuning (SFT)
- Dataset: A curated subset of GSM8K with self-verified reasoning traces
- Data quality process:
  - For each GSM8K item, multiple samples were generated using the base model
  - Only samples with correct final answers were retained
  - These verified CoT samples were used for SFT
- Epochs: 10
- Learning rate: 3e-6
- Batch size: 4
- Gradient accumulation: 4
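The data quality process above amounts to filtering sampled traces by final-answer correctness. Here is a rough sketch, assuming each sample is a raw generation string and that answers are compared as exact strings after trimming whitespace. `filter_verified` and the prompt/completion record layout are hypothetical, and the sampling step itself (generating from the base model) is left out.

```python
import re
from typing import Dict, List

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def filter_verified(question: str, gold: str, samples: List[str]) -> List[Dict[str, str]]:
    """Keep only sampled traces whose <answer> block matches the gold answer."""
    kept = []
    for sample in samples:
        match = ANSWER_RE.search(sample)
        # Samples without an <answer> block are dropped along with wrong ones.
        if match and match.group(1).strip() == gold.strip():
            kept.append({"prompt": question, "completion": sample})
    return kept


if __name__ == "__main__":
    samples = [
        "3 + 2 = 5. </think>\n<answer>5</answer>",
        "3 + 5 = 8. </think>\n<answer>8</answer>",
    ]
    for record in filter_verified("How many red balls?", "5", samples):
        print(record["completion"])
```

Only the first sample survives the filter in this usage example, since its answer matches the gold value.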
What This Model Improves
- Stronger consistency on math reasoning chains
- More reliable extraction of correct final answers
- Stable generation of <think> and <answer> blocks
Intended Use
This model is best suited for:
- math word problems
- chain-of-thought reasoning tasks
Example Inference
Prompt input
A conversation between User and Assistant. The User asks a question, and the Assistant solves it. The Assistant first thinks about the reasoning process in the mind and then provides the User with the answer. The reasoning process is enclosed within <think> </think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
User: A box contains 3 red balls and 5 blue balls. If John adds 2 more red balls, how many red balls are now in the box?
Assistant:<think>
Generation
We start with 3 red. John adds 2 more, so total = 3 + 2 = 5. </think>
<answer>5</answer>
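To make the example above concrete, here is a small sketch of assembling the prompt and splitting a generation into its reasoning and answer parts. The template text is copied from the prompt shown above. `build_prompt` and `parse_generation` are hypothetical helper names, the newline separators between the template, the User turn, and the Assistant turn are an assumption, and the actual call to the model is left out.

```python
import re
from typing import Optional, Tuple

SYSTEM_PROMPT = (
    "A conversation between User and Assistant. The User asks a question, "
    "and the Assistant solves it. The Assistant first thinks about the "
    "reasoning process in the mind and then provides the User with the answer. "
    "The reasoning process is enclosed within <think> </think> and answer is "
    "enclosed within <answer> </answer> tags, respectively, i.e., "
    "<think> reasoning process here </think> <answer> answer here </answer>."
)


def build_prompt(question: str) -> str:
    """Format a question so the model continues right after the opening <think>."""
    return f"{SYSTEM_PROMPT}\nUser: {question}\nAssistant:<think>"


def parse_generation(generation: str) -> Tuple[str, Optional[str]]:
    """Split a continuation into (reasoning, answer); answer is None if missing."""
    reasoning, _, rest = generation.partition("</think>")
    match = re.search(r"<answer>(.*?)</answer>", rest, re.DOTALL)
    answer = match.group(1).strip() if match else None
    return reasoning.strip(), answer


if __name__ == "__main__":
    generation = (
        "We start with 3 red. John adds 2 more, so total = 3 + 2 = 5. </think>\n"
        "<answer>5</answer>"
    )
    reasoning, answer = parse_generation(generation)
    print(answer)
```

Because the prompt already ends with the opening <think> tag, the parser expects the generation to begin with the reasoning text and only looks for the closing tag.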
Contact
If you find issues or want improvements, feel free to open an issue or discussion on the Hugging Face model page.