---
datasets:
- DeepMath-103K
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- reasoning
- reinforcement-learning
- rlvr
- mcts
- math
- iclr-2026
model-index:
- name: DeepSearch-1.5B
  results:
  - task:
      type: text-generation
      name: Mathematical Reasoning
    dataset:
      name: AIME 2024
      type: text
    metrics:
    - type: avg@32
      value: 53.65
  - task:
      type: text-generation
      name: Mathematical Reasoning
    dataset:
      name: AIME 2025
      type: text
    metrics:
    - type: avg@32
      value: 35.42
  - task:
      type: text-generation
      name: Mathematical Reasoning
    dataset:
      name: AMC 2023
      type: text
    metrics:
    - type: avg@32
      value: 90.39
  - task:
      type: text-generation
      name: Mathematical Reasoning
    dataset:
      name: MATH500
      type: text
    metrics:
    - type: avg@32
      value: 92.53
  - task:
      type: text-generation
      name: Mathematical Reasoning
    dataset:
      name: Minerva
      type: text
    metrics:
    - type: avg@32
      value: 40.0
  - task:
      type: text-generation
      name: Mathematical Reasoning
    dataset:
      name: Olympiad
      type: text
    metrics:
    - type: avg@32
      value: 65.72
---
# 🚀 DeepSearch-1.5B
**DeepSearch-1.5B🌟** is a 1.5B parameter reasoning model trained with **Reinforcement Learning with Verifiable Rewards (RLVR)**, enhanced by **Monte Carlo Tree Search (MCTS)**.
Unlike prior approaches that restrict structured search to inference, DeepSearch integrates MCTS *into training*, enabling systematic exploration, fine-grained credit assignment, and efficient replay buffering.
This model achieves **state-of-the-art accuracy among 1.5B reasoning models** while being **5.7× more compute-efficient** than extended RL training baselines.
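As a toy illustration of the "verifiable reward" in RLVR (not the project's verifier; real math verifiers typically check symbolic or numeric equivalence rather than string equality): extract the final `\boxed{...}` answer from a completion and compare it with the reference.
```python
import re


def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Toy 0/1 reward: does the last \\boxed{...} in the completion match the reference?"""
    boxed = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    if not boxed:
        return 0.0
    return 1.0 if boxed[-1].strip() == reference_answer.strip() else 0.0


print(verifiable_reward("... so the answer is \\boxed{70}.", "70"))  # 1.0
```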

---
## Model Details
- **Developed by**: Fang Wu\*, Weihao Xuan\*, Heli Qi\*, Ximing Lu, Aaron Tu, Li Erran Li, Yejin Choi
- **Institutional affiliations**: Stanford University, University of Tokyo, RIKEN AIP, University of Washington, UC Berkeley, Amazon AWS, Columbia University
- **Paper**: [DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search](https://huggingface.co/papers/2509.25454)
- **Code**: [GitHub](https://github.com/smiles724/DeepSearch)
- **Base Model**: Nemotron-Research-Reasoning-Qwen-1.5B v2
- **Parameters**: 1.5B
- **Framework**: veRL
- **License**: Apache-2.0
---
## Quickstart
### Environment
```bash
pip install vllm # vllm>=v0.8.5.post1 should work
pip install transformers # transformers>=4.52.4 should work
```
### Using vLLM to generate
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer


def convert_question_to_messages(question: str):
    # Append the instruction that asks the model to reason step by step
    # and put its final answer in \boxed{}.
    messages = [
        {
            "role": "user",
            "content": question
            + " Let's think step by step and output the final answer within \\boxed{}.",
        }
    ]
    return messages


model_id = "fangwu97/DeepSearch-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

sampling_params = SamplingParams(
    temperature=0.6,
    top_p=0.95,
    max_tokens=32768,
)

model = LLM(
    model=model_id,
    tensor_parallel_size=1,
)

prompt = tokenizer.apply_chat_template(
    convert_question_to_messages(
        "Find the sum of all integer bases $b>9$ for which $17_{b}$ is a divisor of $97_{b}$."
    ),
    add_generation_prompt=True,
    tokenize=False,
)

outputs = model.generate({"prompt": prompt}, sampling_params=sampling_params, use_tqdm=False)
response = outputs[0].outputs[0].text
print(response)
```
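### Using Transformers to generate
A plain `transformers` alternative for quick experiments (a minimal sketch; assumes a single GPU with enough memory and `accelerate` installed for `device_map="auto"`; for long 32K-token generations, the vLLM path above is preferable):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fangwu97/DeepSearch-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Same prompt format as the vLLM example above.
messages = [
    {
        "role": "user",
        "content": "What is $17 \\times 23$?"
        " Let's think step by step and output the final answer within \\boxed{}.",
    }
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)
# Strip the prompt tokens before decoding.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```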
## Performance
| Benchmark | Nemotron-RR-Qwen-1.5B v2 | DeepSearch-1.5B |
|-----------|--------------------------|-----------------|
| AIME 2024 | 51.77 | **53.65** |
| AIME 2025 | 32.92 | **35.42** |
| AMC 2023 | 88.83 | **90.39** |
| MATH500 | 92.24 | **92.53** |
| Minerva | 39.75 | **40.00** |
| Olympiad | 64.69 | **65.72** |
| **Average** | 61.70 | **62.95** |
All scores are avg@32. DeepSearch improves average accuracy by **+1.25 points** over the best prior 1.5B reasoning model while using **5.7× fewer GPU hours** than extended RL training baselines.
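For reference, avg@k means: sample k completions per problem, mark each correct or incorrect with the verifier, and average the per-problem accuracies. A minimal sketch (the helper name and inputs are illustrative, not from the evaluation code):
```python
def avg_at_k(correct_counts_per_problem, k=32):
    """correct_counts_per_problem[i] = number of correct samples (out of k) for problem i."""
    per_problem_acc = [c / k for c in correct_counts_per_problem]
    return 100.0 * sum(per_problem_acc) / len(per_problem_acc)


# e.g. two problems, solved in 20/32 and 28/32 samples respectively:
print(avg_at_k([20, 28]))  # 75.0
```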
## Training
- **Dataset**: DeepMath-103K (rigorously decontaminated)
- **Training steps**: 100
- **Search strategy** (see the sketch after this list):
- Global Frontier Selection
- Entropy-based guidance
- Replay buffer with solution caching
- **Hardware**: 16× NVIDIA H100 (96GB)
- **Compute**: ~330 GPU hours
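The sketch below illustrates how these search components fit together conceptually. It is not the released veRL training code: the `policy`, `verifier`, and `replay_buffer` interfaces, the node-scoring rule, and the `entropy_weight` parameter are assumptions made for illustration; see the GitHub repository for the actual implementation.
```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass(order=True)
class Node:
    priority: float                              # heapq pops the smallest priority first
    prefix: str = field(compare=False)           # partial solution text so far
    depth: int = field(compare=False, default=0)


def entropy(probs):
    """Shannon entropy of a token distribution (exploration signal)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def global_frontier_search(question, policy, verifier, replay_buffer,
                           max_expansions=64, branch=4, entropy_weight=0.5):
    """Always expand the most promising node anywhere in the tree (global
    frontier selection), bias expansion toward high-entropy (uncertain) steps,
    and cache verified solutions in a replay buffer."""
    frontier = [Node(priority=0.0, prefix=question)]
    for _ in range(max_expansions):
        if not frontier:
            break
        node = heapq.heappop(frontier)           # best node across the whole tree
        for _ in range(branch):
            step_text, step_token_probs = policy.sample_step(node.prefix)
            child_prefix = node.prefix + step_text
            if policy.is_terminal(child_prefix):
                if verifier.check(child_prefix): # verifiable 0/1 reward
                    replay_buffer.add(question, child_prefix)
                continue
            # Higher value and higher step entropy -> expanded sooner.
            score = policy.value(child_prefix) + entropy_weight * entropy(step_token_probs)
            heapq.heappush(frontier, Node(priority=-score,
                                          prefix=child_prefix,
                                          depth=node.depth + 1))
    return replay_buffer
```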
---
## Ethical Considerations
- Positive: Reduces training costs and carbon footprint.
- Risks: Systematic exploration methods could be adapted to sensitive domains (e.g., code synthesis).
- Transparency: Full implementation and training details are released for reproducibility.
---
## Citation
```bibtex
@misc{wu2025deepsearch,
  title         = {DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search},
  author        = {Wu, Fang and Xuan, Weihao and Qi, Heli and Lu, Ximing and Tu, Aaron and Li, Li Erran and Choi, Yejin},
  year          = {2025},
  eprint        = {2509.25454},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  doi           = {10.48550/arXiv.2509.25454},
}
```