---
license: apache-2.0
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
---

# Model Card for ParaThinker-1.5B

ParaThinker-1.5B is a 1.5-billion-parameter language model designed for efficient mathematical reasoning through native parallel thinking. Built upon the DeepSeek-R1-Distill-Qwen-1.5B base model, it is trained to generate up to 8 parallel reasoning paths, leveraging KV-cache reuse via PagedAttention in vLLM.

This model is detailed in our paper: [ParaThinker: Native Parallel Thinking as a New Paradigm to Scale LLM Test-time Compute](https://arxiv.org/pdf/2509.04475)

## Model Details

### Model Description

ParaThinker-1.5B enhances small-scale LLMs by enabling parallel reasoning paths with minimal latency overhead. It uses path-specific control tokens to boost thought diversity and a summarization template to merge the paths into a coherent final answer. The model excels at mathematical reasoning, achieving substantially higher accuracy than sequential decoding baselines on benchmarks such as AIME.

- **Developed by:** Hao Wen, Yifan Su et al.
- **Model type:** Causal Language Model
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)

### Model Sources

- **GitHub Repository:** https://github.com/MobileLLM/ParaThinker
- **Paper:** https://arxiv.org/pdf/2509.04475

## Uses

### Direct Use

ParaThinker-1.5B is intended for mathematical reasoning tasks, such as solving problems from the AIME, AMC, or MATH-500 datasets. It can be used directly with the vLLM-based ParaThinker inference engine to generate diverse reasoning paths and a summarized final answer. See the [ParaThinker repository](https://github.com/MobileLLM/ParaThinker) for setup and usage instructions; an illustrative loading sketch is also provided at the end of this card.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Evaluated on:

- AIME 2024
- AIME 2025
- AMC 2023
- MATH-500

#### Factors

Performance was disaggregated by:

- Number of parallel paths (2, 4, 8)
- Token budget (16K per path for ParaThinker)
- Task complexity

#### Metrics

- **Pass@1**: Accuracy of the first generated answer.

### Results

| Benchmark   | Sequential (32K) | ParaThinker-1.5B (2×16K) | ParaThinker-1.5B (4×16K) | ParaThinker-1.5B (8×16K) |
| ----------- | ---------------- | ------------------------ | ------------------------ | ------------------------ |
| AIME 2024   | 28.3%            | 34.8%                    | 43.3%                    | 48.1%                    |
| AIME 2025   | 20.5%            | 24.2%                    | 26.7%                    | 31.9%                    |
| AMC 2023    | 72.5%            | 73.1%                    | 80.8%                    | 83.1%                    |
| MATH-500    | 85.0%            | 87.5%                    | 88.7%                    | 89.7%                    |
| **Average** | **50.9%**        | **54.9%**                | **59.9%**                | **63.2%**                |

See Section 5 of the [paper](https://arxiv.org/abs/2509.04475) for more details.

## Model Card Contact

📧 Send an email to 8208220105@csu.edu.cn.
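
## How to Get Started with the Model

Below is a minimal sketch of loading the checkpoint with Hugging Face `transformers` and running a single (sequential) reasoning path; native parallel thinking with KV-cache reuse requires the vLLM-based ParaThinker engine from the GitHub repository. The Hub repo id, the example prompt, and the generation settings are illustrative placeholders, not confirmed defaults.

```python
# Minimal sketch: single-path (sequential) inference with transformers.
# Parallel-path decoding and answer summarization use the ParaThinker engine
# from https://github.com/MobileLLM/ParaThinker; this only loads the weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MobileLLM/ParaThinker-1.5B"  # placeholder: replace with the actual Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "Find the sum of all positive integers n such that n^2 + 6n is a perfect square."
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long reasoning traces need a generous budget (the paper evaluates with 16K tokens per path).
outputs = model.generate(input_ids, max_new_tokens=16384, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For the parallel-path results reported above (2×16K, 4×16K, 8×16K), follow the inference instructions in the ParaThinker repository rather than this single-path sketch.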