Model Card for ParaThinker-1.5B
ParaThinker-1.5B is a 1.5-billion-parameter language model designed for efficient mathematical reasoning through native parallel thinking. Built upon the DeepSeek-R1-Distill-Qwen-1.5B base model, it is trained to generate up to 8 parallel reasoning paths, reusing the prompt's KV-cache across paths via PagedAttention in vLLM. The model is detailed in the paper: ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute
Model Details
Model Description
ParaThinker-1.5B enhances small-scale LLMs by enabling parallel reasoning paths with minimal latency overhead. It uses special control tokens (`<think1>` through `<think8>`) to promote thought diversity, plus a summarization template that merges the paths into a coherent final answer. The model excels at mathematical reasoning, achieving substantially higher accuracy than sequential baselines on benchmarks such as AIME.
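As a quick sanity check, the control tokens can be inspected with the Hugging Face tokenizer. This is a minimal sketch, assuming the `<thinkN>` markers are registered as added special tokens in the hub repo `Leslie04/ParaThinker-1.5B`:

```python
# Minimal sketch: verify the <think1>..<think8> control tokens exist in the
# tokenizer (assumption: they are registered as added special tokens).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Leslie04/ParaThinker-1.5B")
for i in range(1, 9):
    marker = f"<think{i}>"
    ids = tok.encode(marker, add_special_tokens=False)
    # A registered special token encodes to a single id.
    print(marker, "->", ids)
```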
- Developed by: Hao Wen, Yifan Su et al.
- Model type: Causal Language Model
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
Model Sources
- GitHub Repository: https://github.com/MobileLLM/ParaThinker
- Paper: https://arxiv.org/pdf/2509.04475
Uses
Direct Use
ParaThinker-1.5B is intended for mathematical reasoning tasks, such as solving problems from the AIME, AMC, or MATH-500 datasets. It can be used directly with the vLLM-based ParaThinker inference engine to generate diverse reasoning paths and a summarized final answer; see the GitHub repository for setup and usage details.
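A minimal sketch of the two-stage flow with stock vLLM is shown below. The official ParaThinker engine (see the GitHub repository) additionally reuses KV-caches across paths via PagedAttention, and its exact prompt and summarization templates may differ; the `<summary>` marker and sampling settings here are illustrative assumptions, not the repo's templates.

```python
# Illustrative two-stage ParaThinker-style flow with stock vLLM.
# NOTE: the official engine reuses KV-caches across paths; this sketch
# runs each prompt independently and only demonstrates the token scheme.
from vllm import LLM, SamplingParams

llm = LLM(model="Leslie04/ParaThinker-1.5B")
think_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=16384)

question = "How many positive integers n <= 100 are divisible by 3 or 5?"
n_paths = 4  # the model supports up to 8 parallel paths

# Stage 1: sample diverse reasoning paths, each steered by its own
# <thinkN> control token.
prompts = [f"{question}\n<think{i + 1}>" for i in range(n_paths)]
paths = [out.outputs[0].text for out in llm.generate(prompts, think_params)]

# Stage 2: merge the paths into one final answer (hypothetical template;
# the real summarization template lives in the ParaThinker repo).
summary_prompt = (
    question
    + "\n"
    + "\n".join(f"<think{i + 1}>{p}" for i, p in enumerate(paths))
    + "\n<summary>"
)
final = llm.generate([summary_prompt], SamplingParams(max_tokens=1024))
print(final[0].outputs[0].text)
```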
Evaluation
Testing Data, Factors & Metrics
Testing Data
Evaluated on:
- AIME 2024
- AIME 2025
- AMC 2023
- MATH-500
Factors
Performance was disaggregated by:
- Number of parallel paths (2, 4, 8)
- Token budget (16K per path for ParaThinker)
- Task complexity
Metrics
- Pass@1: Accuracy of the first generated answer.
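With a single sampled answer per problem, pass@1 reduces to plain accuracy over the first answers. A minimal sketch (a hypothetical helper, not the paper's evaluation code):

```python
# Pass@1 with one sample per problem is the fraction of problems whose
# first generated answer matches the reference.
def pass_at_1(first_answers, references):
    correct = sum(a == r for a, r in zip(first_answers, references))
    return correct / len(references)

print(pass_at_1(["34", "7"], ["34", "8"]))  # 0.5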
Results
| Benchmark | Sequential (32K) | ParaThinker-1.5B (2×16K) | ParaThinker-1.5B (4×16K) | ParaThinker-1.5B (8×16K) |
|---|---|---|---|---|
| AIME 2024 | 28.3% | 34.8% | 43.3% | 48.1% |
| AIME 2025 | 20.5% | 24.2% | 26.7% | 31.9% |
| AMC 2023 | 72.5% | 73.1% | 80.8% | 83.1% |
| MATH-500 | 85.0% | 87.5% | 88.7% | 89.7% |
| **Average** | 50.9% | 54.9% | 59.9% | 63.2% |
See Section 5 of the paper for more details.
Model Card Contact 📧
Send an email to 8208220105@csu.edu.cn.