Model Card for SmolLM2Prover

SmolLM2Prover is a specialized, fine-tuned version of prithivMLmods/SmolLM2-CoT-360M. While retaining the strong conversational abilities of its base model, this version has been specifically enhanced to excel at deep thinking, logical reasoning, and higher-level mathematics, with a focus on generating step-by-step proofs and explanations (Chain-of-Thought).

The model was fine-tuned using multiple rounds of Supervised Fine-Tuning (SFT) with the TRL library on a curated dataset, enhancing its ability to follow complex instructions and reason through problems.

Model Details

Base Model: prithivMLmods/SmolLM2-CoT-360M
Fine-tuning Library: TRL (Transformer Reinforcement Learning)
Specialization: Mathematical reasoning, proof generation, Chain-of-Thought (CoT)
Training Data: Fine-tuned on AI-MO/NuminaMath-1.5 and an additional ~1 million tokens of custom-formatted reasoning data.

How to Use

This model is intended to be used for text generation tasks that require logical reasoning or advanced conversation.

Using the Pipeline

The easiest way to use the model is with the transformers pipeline.

from transformers import pipeline
import torch


model_id = "reaperdoesntknow/SMOLM2Prover"
prompt = "Prove that the derivative of f(x) = x^2 is f'(x) = 2x using the limit definition of a derivative."

generator = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16, # Or torch.float16 if bfloat16 is not available
    device_map="auto"
)

# Using a chat format for better instruction following
messages = [
    {"role": "user", "content": f"You are a helpful math assistant. Please solve the following problem step-by-step.\n\n{prompt}"}
]

output = generator(messages, max_new_tokens=512, return_full_text=False)
print(output[0]["generated_text"])

Manual Usage
For more control, you can use AutoModelForCausalLM and AutoTokenizer directly.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch


model_id = "reaperdoesntknow/SMOLM2Prover"
prompt = "Prove that the derivative of f(x) = x^2 is f'(x) = 2x using the limit definition of a derivative."

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16, # Or torch.float16
    device_map="auto"
)

# Apply the chat template for proper formatting
messages = [
    {"role": "user", "content": f"You are a helpful math assistant. Please solve the following problem step-by-step.\n\n{prompt}"}
]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model.generate(tokenized_chat, max_new_tokens=512)
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Print only the generated part
print(decoded_output.split("assistant\n")[-1])

Training

The model underwent several rounds of Supervised Fine-Tuning (SFT) using TRL's SFTTrainer.

Training Data: The primary dataset used was AI-MO/NuminaMath-1.5, augmented with approximately 1 million additional tokens. This data was formatted with a specific prompt structure designed to elicit step-by-step, chain-of-thought reasoning from the model.
Process: The iterative SFT approach allowed for progressive refinement of the model's reasoning capabilities.

Framework Versions

Transformers: 4.56.0
Pytorch: 2.8.0+cu126
TRL: 0.22.2
Datasets: 4.0.0
Tokenizers: 0.22.0

Intended Use

This model is a versatile tool suitable for a range of applications, from everyday conversation to complex problem-solving.

Primary Use Cases (Specialized Skills):
- Educational tools for higher-level mathematics and logic.
- Automated proof generation and verification.
- Step-by-step problem-solving assistants for complex topics.
- Serving as a "thinking" component for applications requiring deep reasoning.
General Use Cases:
- General-purpose conversation and advanced chatbot applications.
- Complex instruction-following tasks.
- Content generation that requires logical consistency. Limitations and Bias
Mathematical Accuracy: While highly capable, the model can still make errors or "hallucinate" incorrect steps or solutions in complex mathematical proofs. All outputs, especially for critical applications, should be verified by a human expert.
Domain Performance: The model's performance is most reliable on problems similar to its training data. While it is designed to handle higher levels of math and deep thinking, its accuracy in novel or esoteric domains should be carefully evaluated.
Inherited Bias: This model inherits any biases present in the base model (SmolLM2-CoT-360M) and the training datasets.

Acknowledgements

You're doing great!

Citations

If you use TRL in your work, please cite the library: @misc{vonwerra2022trl, title = {{TRL: Transformer Reinforcement Learning}}, author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{'e}dec}, year = 2020, journal = {GitHub repository}, publisher = {GitHub}, howpublished = {\url{https://github.com/huggingface/trl}} }

Downloads last month: 2

Safetensors

Model size

0.4B params

Tensor type

F32

Model tree for reaperdoesntknow/SMOLM2Prover

Base model

HuggingFaceTB/SmolLM2-360M

Quantized

HuggingFaceTB/SmolLM2-360M-Instruct

Finetuned

prithivMLmods/SmolLM2-CoT-360M