Introducing Qwen-2.5_1.5b_MATH_GSM8K_SFT10

This model is a supervised fine-tuned (SFT) variant of Qwen2.5-Math-1.5B, trained specifically for math reasoning with structured outputs using high-quality, self-verified chains of thought from GSM8K.

The goal was to strengthen both reasoning correctness and format stability in order to improve accuracy while staying efficient at the 1.5B scale.

Evaluation (GSM8K)

Metric           | Base Qwen2.5-Math-1.5B | This Model
Pass@1 accuracy  | ~54%                   | ~67.5%

Evaluation was run on the GSM8K test split with temperature=0.1 and top_k=1 for both models.
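
For reference, below is a minimal sketch of how such an evaluation loop might look with transformers. The dataset loading path, the PROMPT_TEMPLATE (taken from the "Example Inference" section below), the answer-extraction regex, and max_new_tokens are assumptions, not the exact harness used:

```python
import re

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "arubittu/Qwen-2.5_1.5b_MATH_GSM8K_SFT10"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16", device_map="auto")

PROMPT_TEMPLATE = (
    "A conversation between User and Assistant. The User asks a question, and the Assistant "
    "solves it. The Assistant first thinks about the reasoning process in the mind and then "
    "provides the User with the answer. The reasoning process is enclosed within <think> "
    "</think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., "
    "<think> reasoning process here </think> <answer> answer here </answer>.\n"
    "User: {question}\n"
    "Assistant:<think>"
)

def extract_answer(text):
    """Pull the final answer out of the <answer> ... </answer> block, if present."""
    m = re.search(r"<answer>\s*(.*?)\s*</answer>", text, re.DOTALL)
    return m.group(1).strip() if m else None

test_set = load_dataset("openai/gsm8k", "main", split="test")
correct = 0
for ex in test_set:
    inputs = tokenizer(PROMPT_TEMPLATE.format(question=ex["question"]), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1, top_k=1)
    completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    gold = ex["answer"].split("####")[-1].strip()  # GSM8K stores the gold answer after "####"
    if extract_answer(completion) == gold:
        correct += 1

print(f"Pass@1: {correct / len(test_set):.3f}")
```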

Data & Training Summary

  • Method: Supervised Fine-Tuning (SFT)
  • Dataset: A curated subset of GSM8K with self-verified reasoning traces
  • Data quality process (see the sketch after this list):
    • For each GSM8K item, multiple candidate solutions were generated with the base model
    • Only samples with correct final answers were retained
    • These verified CoT samples were used for SFT
  • Epochs: 10
  • Learning rate: 3e-6
  • Batch size: 4
  • Gradient accumulation steps: 4
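
A minimal sketch of the self-verification filter described above. The sample count (k=8), sampling temperature, and max_new_tokens are illustrative assumptions; PROMPT_TEMPLATE and extract_answer are the same helpers defined in the evaluation sketch:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "Qwen/Qwen2.5-Math-1.5B"
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype="bfloat16", device_map="auto")

train_set = load_dataset("openai/gsm8k", "main", split="train")
sft_examples = []
for ex in train_set:
    gold = ex["answer"].split("####")[-1].strip()
    prompt = PROMPT_TEMPLATE.format(question=ex["question"])  # template from the evaluation sketch
    inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
    # Sample several candidate traces per problem (k=8 is an assumed value).
    outs = base_model.generate(**inputs, max_new_tokens=512, do_sample=True,
                               temperature=0.7, num_return_sequences=8)
    for seq in outs:
        completion = tokenizer.decode(seq[inputs["input_ids"].shape[1]:], skip_special_tokens=True)
        # Self-verification: keep only traces whose final answer matches the gold label.
        if extract_answer(completion) == gold:
            sft_examples.append({"prompt": prompt, "completion": completion})
```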
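The hyperparameters above map directly onto an off-the-shelf SFT stack. The actual training code is not published, so the following TRL SFTTrainer setup is only one plausible configuration:

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# sft_examples: the verified {"prompt", "completion"} pairs built by the filter above.
train_ds = Dataset.from_list(sft_examples)

config = SFTConfig(
    output_dir="qwen2.5-math-1.5b-gsm8k-sft",  # hypothetical output path
    num_train_epochs=10,
    learning_rate=3e-6,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    bf16=True,
)
trainer = SFTTrainer(model="Qwen/Qwen2.5-Math-1.5B", args=config, train_dataset=train_ds)
trainer.train()
```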

What This Model Improves

  • Stronger consistency on math reasoning chains
  • More reliable extraction of correct final answers
  • Stable generation of <think> and <answer> blocks

Intended Use

This model is best suited for:

  • math word problems
  • chain-of-thought reasoning tasks

Example Inference

Prompt input

    A conversation between User and Assistant. The User asks a question, and the Assistant solves it. The Assistant first thinks about the reasoning process in the mind and then provides the User with the answer. The reasoning process is enclosed within <think> </think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
    User: A box contains 3 red balls and 5 blue balls. If John adds 2 more red balls, how many red balls are now in the box?
    Assistant:<think>

Generation

    We start with 3 red. John adds 2 more, so total = 3 + 2 = 5. </think>
    <answer>5</answer>
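
To reproduce this locally with transformers, a minimal snippet follows; the generation settings mirror the evaluation setup above and max_new_tokens is an assumption:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "arubittu/Qwen-2.5_1.5b_MATH_GSM8K_SFT10"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16", device_map="auto")

prompt = (
    "A conversation between User and Assistant. The User asks a question, and the Assistant "
    "solves it. The Assistant first thinks about the reasoning process in the mind and then "
    "provides the User with the answer. The reasoning process is enclosed within <think> "
    "</think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., "
    "<think> reasoning process here </think> <answer> answer here </answer>.\n"
    "User: A box contains 3 red balls and 5 blue balls. If John adds 2 more red balls, "
    "how many red balls are now in the box?\n"
    "Assistant:<think>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.1, top_k=1)
# Print only the newly generated tokens (the reasoning and the <answer> block).
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```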

Contact

If you find issues or want improvements, feel free to open an issue or discussion on the Hugging Face model page.
