Model Card for ParaThinker-1.5B
ParaThinker-1.5B is a 1.5-billion-parameter language model designed for efficient mathematical reasoning through native parallel thinking. Built upon the DeepSeek-R1-Distill-Qwen-1.5B base model, it is trained to generate up to 8 parallel reasoning paths, reusing the prompt's KV-cache across paths via PagedAttention in vLLM. The model is detailed in the paper: ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
Model Details
Model Description
ParaThinker-1.5B enhances small-scale LLMs by enabling parallel reasoning paths with minimal latency overhead. It uses special control tokens (`<think1>` through `<think8>`) to promote thought diversity, plus a summarization template that merges the paths into a coherent final answer. The model excels at mathematical reasoning, achieving substantially higher accuracy than sequential baselines on benchmarks such as AIME.
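As a quick sanity check, the control tokens can be inspected with the Hugging Face tokenizer. This is a minimal sketch, assuming the `<thinkN>` markers are registered as added special tokens in the hub repo `Leslie04/ParaThinker-1.5B`:

```python
# Minimal sketch: verify the <think1>..<think8> control tokens exist in the
# tokenizer (assumption: they are registered as added special tokens).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Leslie04/ParaThinker-1.5B")
for i in range(1, 9):
    marker = f"<think{i}>"
    ids = tok.encode(marker, add_special_tokens=False)
    # A registered special token encodes to a single id.
    print(marker, "->", ids)
```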
- Developed by: Hao Wen, Yifan Su et al.
- Model type: Causal Language Model
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
Model Sources
- GitHub Repository: https://github.com/MobileLLM/ParaThinker
- Paper: https://arxiv.org/pdf/2509.04475
Uses
Direct Use
ParaThinker-1.5B is intended for mathematical reasoning tasks, such as solving problems from the AIME, AMC, or MATH-500 datasets. It can be used directly with the vLLM-based ParaThinker inference engine to generate diverse reasoning paths and a summarized final answer; see the GitHub repository for setup and usage details.
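A minimal sketch of the two-stage flow with stock vLLM is shown below. The official ParaThinker engine (see the GitHub repository) additionally reuses KV-caches across paths via PagedAttention, and its exact prompt and summarization templates may differ; the `<summary>` marker and sampling settings here are illustrative assumptions, not the repo's templates.

```python
# Illustrative two-stage ParaThinker-style flow with stock vLLM.
# NOTE: the official engine reuses KV-caches across paths; this sketch
# runs each prompt independently and only demonstrates the token scheme.
from vllm import LLM, SamplingParams

llm = LLM(model="Leslie04/ParaThinker-1.5B")
think_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=16384)

question = "How many positive integers n <= 100 are divisible by 3 or 5?"
n_paths = 4  # the model supports up to 8 parallel paths

# Stage 1: sample diverse reasoning paths, each steered by its own
# <thinkN> control token.
prompts = [f"{question}\n<think{i + 1}>" for i in range(n_paths)]
paths = [out.outputs[0].text for out in llm.generate(prompts, think_params)]

# Stage 2: merge the paths into one final answer (hypothetical template;
# the real summarization template lives in the ParaThinker repo).
summary_prompt = (
    question
    + "\n"
    + "\n".join(f"<think{i + 1}>{p}" for i, p in enumerate(paths))
    + "\n<summary>"
)
final = llm.generate([summary_prompt], SamplingParams(max_tokens=1024))
print(final[0].outputs[0].text)
```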
Evaluation
Testing Data, Factors & Metrics
Testing Data
Evaluated on:
- AIME 2024
- AIME 2025
- AMC 2023
- MATH-500
Factors
Performance was disaggregated by:
- Number of parallel paths (2, 4, 8)
- Token budget (16K per path for ParaThinker)
- Task complexity
Metrics
- Pass@1: Accuracy of the first generated answer.
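With a single sampled answer per problem, pass@1 reduces to plain accuracy over the first answers. A minimal sketch (a hypothetical helper, not the paper's evaluation code):

```python
# Pass@1 with one sample per problem is the fraction of problems whose
# first generated answer matches the reference.
def pass_at_1(first_answers, references):
    correct = sum(a == r for a, r in zip(first_answers, references))
    return correct / len(references)

print(pass_at_1(["34", "7"], ["34", "8"]))  # 0.5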
Results
| Benchmark | Sequential (32K) | ParaThinker-1.5B (2×16K) | ParaThinker-1.5B (4×16K) | ParaThinker-1.5B (8×16K) |
|---|---|---|---|---|
| AIME 2024 | 28.3% | 34.8% | 43.3% | 48.1% |
| AIME 2025 | 20.5% | 24.2% | 26.7% | 31.9% |
| AMC 2023 | 72.5% | 73.1% | 80.8% | 83.1% |
| MATH-500 | 85.0% | 87.5% | 88.7% | 89.7% |
| **Average** | 50.9% | 54.9% | 59.9% | 63.2% |
See Section 5 of the paper for more details.
Model Card Contact 📧
Send an email to 8208220105@csu.edu.cn.