Model Card for ParaThinker-1.5B

ParaThinker-1.5B is a 1.5 billion parameter language model designed for efficient mathematical reasoning through native parallel thinking. Built upon the DeepSeek-R1-Distill-Qwen-1.5B base model, it introduces specialized training to support up to 8 parallel reasoning paths, leveraging KV-cache reuse via PagedAttention in vLLM. The model is detailed in our paper: ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute.

Model Details

Model Description

ParaThinker-1.5B enhances small-scale LLMs by enabling parallel reasoning paths with minimal latency overhead. It uses special tokens (<think1> through <think8>) to increase thought diversity across paths, and a summarization template to merge them into a coherent final answer. The model excels at math reasoning, achieving substantially higher accuracy than sequential baselines on benchmarks such as AIME.
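The prompting scheme described above can be sketched as follows. This is a minimal illustration, not the official ParaThinker template: the <thinkN> control tokens are taken from this card, but the exact prompt layout and the <summary> marker are assumptions.

```python
# Hypothetical sketch of ParaThinker-style prompting: each parallel path is
# seeded with its own <thinkN> special token so decodes diverge, and a
# summarization prompt stitches the finished paths back together.
# The <thinkN> tokens come from the model card; the templates are assumed.

def make_path_prompts(question: str, num_paths: int = 4) -> list[str]:
    """Build one prompt per reasoning path, each tagged with a distinct <thinkN> token."""
    assert 1 <= num_paths <= 8, "ParaThinker-1.5B supports up to 8 parallel paths"
    return [f"{question}\n<think{i}>" for i in range(1, num_paths + 1)]

def make_summary_prompt(question: str, paths: list[str]) -> str:
    """Assumed summarization template: question followed by all finished paths."""
    joined = "\n".join(f"<think{i}>{p}" for i, p in enumerate(paths, start=1))
    return f"{question}\n{joined}\n<summary>"  # <summary> marker is an assumption
```

For example, `make_path_prompts("Solve x+1=2.", 2)` yields two prompts differing only in their seed token, which is what drives path diversity.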

Uses

Direct Use

ParaThinker-1.5B is intended for mathematical reasoning tasks, such as solving problems from AIME, AMC, or MATH-500 datasets. It can be used directly with the vLLM-based ParaThinker inference engine to generate diverse reasoning paths and a summarized final answer. See docs to learn more.
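A minimal way to load the checkpoint with plain vLLM is sketched below. This is an approximation: the upstream ParaThinker inference engine is a custom vLLM fork, so the per-path prompt format here is an assumption, and only the model ID and the 16K-per-path budget come from this card.

```python
# Hedged sketch: generating N parallel reasoning paths with stock vLLM.
# The real ParaThinker engine reuses the prompt KV-cache across paths;
# plain vLLM's PagedAttention prefix caching gives a similar effect.

MODEL_ID = "Leslie04/ParaThinker-1.5B"  # from this model card
PER_PATH_BUDGET = 16_384                # 16K tokens per path, as evaluated

def run_parallel_paths(question: str, num_paths: int = 4) -> list[str]:
    # Imported lazily: requires vllm installed and a GPU with the weights.
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL_ID, dtype="bfloat16")
    params = SamplingParams(temperature=0.6, max_tokens=PER_PATH_BUDGET)
    # One prompt per path, seeded with distinct <thinkN> tokens (assumed format).
    prompts = [f"{question}\n<think{i}>" for i in range(1, num_paths + 1)]
    return [out.outputs[0].text for out in llm.generate(prompts, params)]
```

In practice the finished paths would then be fed back through the model's summarization template to produce the final answer.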

Evaluation

Testing Data, Factors & Metrics

Testing Data

Evaluated on:

  • AIME 2024
  • AIME 2025
  • AMC 2023
  • MATH-500

Factors

Performance was disaggregated by:

  • Number of parallel paths (2, 4, 8)
  • Token budget (16K per path for ParaThinker)
  • Task complexity

Metrics

  • Pass@1: Accuracy of the first generated answer.
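Pass@1 as used here reduces to simple accuracy over first answers. A minimal reference implementation (the function name and string-match grading are ours, for illustration):

```python
# Pass@1: fraction of problems whose first generated answer is correct.
# Real math-benchmark grading normalizes answers more carefully; exact
# string match is used here only to keep the sketch self-contained.

def pass_at_1(first_answers: list[str], references: list[str]) -> float:
    assert len(first_answers) == len(references)
    correct = sum(a.strip() == r.strip() for a, r in zip(first_answers, references))
    return correct / len(references)
```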

Results

| Benchmark | Sequential (32K) | ParaThinker-1.5B (2×16K) | ParaThinker-1.5B (4×16K) | ParaThinker-1.5B (8×16K) |
|-----------|------------------|--------------------------|--------------------------|--------------------------|
| AIME 2024 | 28.3% | 34.8% | 43.3% | 48.1% |
| AIME 2025 | 20.5% | 24.2% | 26.7% | 31.9% |
| AMC 2023 | 72.5% | 73.1% | 80.8% | 83.1% |
| MATH-500 | 85.0% | 87.5% | 88.7% | 89.7% |
| Average | 50.9% | 54.9% | 59.9% | 63.2% |

See Section 5 of the paper for more details.

Model Card Contact 📧

Send an email to 8208220105@csu.edu.cn.
