---
library_name: transformers
license: apache-2.0
license_link: https://github.com/eth-lre/PedagogicalRL/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen2.5-7B-Instruct
tags:
- math-tutor
- grpo
datasets:
- SynthLabsAI/Big-Math-RL-Verified
---
# TutorRL-7B-think
## Overview
**TutorRL-7B-think** is a fine-tuned variant of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct), trained to act as a math **tutor** rather than a solver. It is aligned to pedagogical principles using **reinforcement learning (GRPO)** in a synthetic multi-turn classroom setting, without requiring any human-labeled data.
This model was developed as part of the research project [*From Problem-Solving to Teaching Problem-Solving*](https://arxiv.org/abs/2505.15607), which proposes a scalable, annotation-free approach to training LLMs as **educational tutors**. Instead of directly answering questions, the model is optimized to scaffold reasoning, guide through Socratic questioning, and withhold final solutions when beneficial for learning.
Repository: [https://github.com/eth-lre/PedagogicalRL](https://github.com/eth-lre/PedagogicalRL)
## Intended Use
This model is intended for use in:
* Interactive math tutoring
* Socratic dialogue generation
* Research on educational alignment of LLMs
* Safe and indirect teaching in problem-solving contexts
## Thinking
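This variant can reason privately before it responds.
The reasoning is enclosed in `<think> ... </think>` tags and is meant to stay hidden from the student; only the text outside the tags is the tutor's visible reply.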
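To show the tutor's reply without the hidden reasoning, the tagged spans can be separated from the rest of the decoded text. The following is a minimal sketch, assuming the tags appear as plain text in the decoded output; `split_thinking` and the sample string are illustrative and not part of the released code.

```python
import re

# Matches one <think> ... </think> span; DOTALL lets the content cover several lines.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text):
    """Return (list of hidden thoughts, visible reply) for a decoded model output.

    Hypothetical helper: assumes the tags are literal text in the output.
    """
    thoughts = [t.strip() for t in THINK_PATTERN.findall(text)]
    visible = THINK_PATTERN.sub("", text).strip()
    return thoughts, visible


# Hand-written sample string, not real model output.
raw = (
    "<think>The student should isolate x first.</think>"
    "What could you do to both sides to get the 3x term by itself?"
)
thoughts, visible = split_thinking(raw)
print(visible)
```

Text without any tags passes through unchanged, so the helper is safe to apply to every reply.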
## Example Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "eth-nlped/TutorRL-7B-think"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "user", "content": "Can you help me solve 3x + 5 = 20?"}
]
# add_generation_prompt=True appends the assistant turn header so the model answers as the tutor
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decodes the full sequence: the prompt followed by the generated reply
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
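Because the model is trained in a multi-turn setting, a tutoring session usually needs to carry the conversation history from one turn to the next. The sketch below shows one way to do that; `tutoring_turn`, `generate_reply`, and `stub_tutor` are hypothetical names, and the stub stands in for a real function that would apply the chat template and call `model.generate`.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def tutoring_turn(
    history: List[Message],
    student_text: str,
    generate_reply: Callable[[List[Message]], str],
) -> str:
    """Record the student message, obtain the tutor reply, and record it too.

    `generate_reply` is a placeholder: any function that maps the message
    history to the tutor's next reply as a string.
    """
    history.append({"role": "user", "content": student_text})
    reply = generate_reply(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def stub_tutor(history: List[Message]) -> str:
    """Stand-in for the model so the sketch runs without downloading weights."""
    return f"What have you tried so far? (turn {len(history) // 2 + 1})"


history: List[Message] = []
tutoring_turn(history, "Can you help me solve 3x + 5 = 20?", stub_tutor)
tutoring_turn(history, "I subtracted 5 from both sides.", stub_tutor)
print(len(history))
```

Whether hidden reasoning should stay in the stored history is not stated here; this sketch stores each reply exactly as returned.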
## Citation
If you use this model or build upon the training framework, please cite:
```bibtex
@misc{dinucujianu2025problemsolvingteachingproblemsolvingaligning,
title={From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning},
author={David Dinucu-Jianu and Jakub Macina and Nico Daheim and Ido Hakimi and Iryna Gurevych and Mrinmaya Sachan},
year={2025},
eprint={2505.15607},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.15607}
}
```