Introducing Qwen-2.5_1.5b_MATH_GSM8K_SFT10

This model is a supervised fine-tuned (SFT) variant of Qwen2.5-Math-1.5B, trained specifically for math reasoning with structured outputs using high-quality, self-verified chains of thought from GSM8K.

The goal was to strengthen both reasoning correctness and format stability in order to improve accuracy while staying efficient at the 1.5B scale.

Evaluation (GSM8K)

Metric           | Base Qwen2.5-Math-1.5B | This Model
Pass@1 accuracy  | ~54%                   | ~67.5%

Evaluation was run on the GSM8K test split with temperature=0.1 and top_k=1 for both models.
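
For reference, below is a minimal sketch of how such an evaluation loop might look with transformers. The dataset loading path, the PROMPT_TEMPLATE (taken from the "Example Inference" section below), the answer-extraction regex, and max_new_tokens are assumptions, not the exact harness used:

```python
import re

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "arubittu/Qwen-2.5_1.5b_MATH_GSM8K_SFT10"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16", device_map="auto")

PROMPT_TEMPLATE = (
    "A conversation between User and Assistant. The User asks a question, and the Assistant "
    "solves it. The Assistant first thinks about the reasoning process in the mind and then "
    "provides the User with the answer. The reasoning process is enclosed within <think> "
    "</think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., "
    "<think> reasoning process here </think> <answer> answer here </answer>.\n"
    "User: {question}\n"
    "Assistant:<think>"
)

def extract_answer(text):
    """Pull the final answer out of the <answer> ... </answer> block, if present."""
    m = re.search(r"<answer>\s*(.*?)\s*</answer>", text, re.DOTALL)
    return m.group(1).strip() if m else None

test_set = load_dataset("openai/gsm8k", "main", split="test")
correct = 0
for ex in test_set:
    inputs = tokenizer(PROMPT_TEMPLATE.format(question=ex["question"]), return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1, top_k=1)
    completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    gold = ex["answer"].split("####")[-1].strip()  # GSM8K stores the gold answer after "####"
    if extract_answer(completion) == gold:
        correct += 1

print(f"Pass@1: {correct / len(test_set):.3f}")
```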

Data & Training Summary

  • Method: Supervised Fine-Tuning (SFT)
  • Dataset: A curated subset of GSM8K with self-verified reasoning traces
  • Data quality process (see the sketch after this list):
    • For each GSM8K item, multiple candidate solutions were generated with the base model
    • Only samples with correct final answers were retained
    • These verified CoT samples were used for SFT
  • Epochs: 10
  • Learning rate: 3e-6
  • Batch size: 4
  • Gradient accumulation steps: 4
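
A minimal sketch of the self-verification filter described above. The sample count (k=8), sampling temperature, and max_new_tokens are illustrative assumptions; PROMPT_TEMPLATE and extract_answer are the same helpers defined in the evaluation sketch:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "Qwen/Qwen2.5-Math-1.5B"
tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype="bfloat16", device_map="auto")

train_set = load_dataset("openai/gsm8k", "main", split="train")
sft_examples = []
for ex in train_set:
    gold = ex["answer"].split("####")[-1].strip()
    prompt = PROMPT_TEMPLATE.format(question=ex["question"])  # template from the evaluation sketch
    inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
    # Sample several candidate traces per problem (k=8 is an assumed value).
    outs = base_model.generate(**inputs, max_new_tokens=512, do_sample=True,
                               temperature=0.7, num_return_sequences=8)
    for seq in outs:
        completion = tokenizer.decode(seq[inputs["input_ids"].shape[1]:], skip_special_tokens=True)
        # Self-verification: keep only traces whose final answer matches the gold label.
        if extract_answer(completion) == gold:
            sft_examples.append({"prompt": prompt, "completion": completion})
```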
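The hyperparameters above map directly onto an off-the-shelf SFT stack. The actual training code is not published, so the following TRL SFTTrainer setup is only one plausible configuration:

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# sft_examples: the verified {"prompt", "completion"} pairs built by the filter above.
train_ds = Dataset.from_list(sft_examples)

config = SFTConfig(
    output_dir="qwen2.5-math-1.5b-gsm8k-sft",  # hypothetical output path
    num_train_epochs=10,
    learning_rate=3e-6,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    bf16=True,
)
trainer = SFTTrainer(model="Qwen/Qwen2.5-Math-1.5B", args=config, train_dataset=train_ds)
trainer.train()
```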

What This Model Improves

  • Stronger consistency on math reasoning chains
  • More reliable extraction of correct final answers
  • Stable generation of <think> and <answer> blocks

Intended Use

This model is best suited for:

  • math word problems
  • chain-of-thought reasoning tasks

Example Inference

Prompt input

    A conversation between User and Assistant. The User asks a question, and the Assistant solves it. The Assistant first thinks about the reasoning process in the mind and then provides the User with the answer. The reasoning process is enclosed within <think> </think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>.
    User: A box contains 3 red balls and 5 blue balls. If John adds 2 more red balls, how many red balls are now in the box?
    Assistant:<think>

Generation

    We start with 3 red. John adds 2 more, so total = 3 + 2 = 5. </think>
    <answer>5</answer>
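
To reproduce this locally with transformers, a minimal snippet follows; the generation settings mirror the evaluation setup above and max_new_tokens is an assumption:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "arubittu/Qwen-2.5_1.5b_MATH_GSM8K_SFT10"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="bfloat16", device_map="auto")

prompt = (
    "A conversation between User and Assistant. The User asks a question, and the Assistant "
    "solves it. The Assistant first thinks about the reasoning process in the mind and then "
    "provides the User with the answer. The reasoning process is enclosed within <think> "
    "</think> and answer is enclosed within <answer> </answer> tags, respectively, i.e., "
    "<think> reasoning process here </think> <answer> answer here </answer>.\n"
    "User: A box contains 3 red balls and 5 blue balls. If John adds 2 more red balls, "
    "how many red balls are now in the box?\n"
    "Assistant:<think>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.1, top_k=1)
# Print only the newly generated tokens (the reasoning and the <answer> block).
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```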

Contact

If you find issues or want improvements, feel free to open an issue or discussion on the Hugging Face model page.
