PromptCoT-2.0-SelfPlay-30B-A3B

This model is part of PromptCoT 2.0 (Scaling Prompt Synthesis for LLM Reasoning).
It is a 30B-A3B Mixture-of-Experts model (30B total parameters, 3B activated per token) trained via self-play, where problems synthesized by PromptCoT 2.0 provide verifiable feedback (unit tests for code, boxed answers for math).
The training loop uses Direct Preference Optimization (DPO) to align generations with automatically verified outcomes, removing the dependence on stronger external teachers.
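
For reference, this is the standard DPO objective (Rafailov et al., 2023); here, (y_w, y_l) are a verified-correct and a verified-incorrect response to the same synthesized problem x:

\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]

where \pi_\theta is the policy being trained, \pi_{\mathrm{ref}} a frozen reference policy, \beta a temperature hyperparameter, and \sigma the logistic function.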

This model achieves state-of-the-art performance at the 30B scale, competitive with closed-source models such as Gemini 2.5 Pro and OpenAI o3.
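
As a quick-start illustration, the model can be loaded with the standard Hugging Face transformers API. This is a minimal sketch: the prompt and sampling settings are illustrative, and it assumes the repository ships a chat template (as Qwen-style checkpoints typically do).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "xl-zhao/PromptCoT-2.0-SelfPlay-30B-A3B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # shard across available GPUs
)

# Example math prompt; the boxed-answer convention matches the
# verification format used during self-play training.
messages = [{"role": "user", "content":
             "What is the sum of all positive divisors of 36? "
             "Put the final answer in \\boxed{}."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling hyperparameters here are illustrative, not tuned recommendations.
outputs = model.generate(inputs, max_new_tokens=2048,
                         do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))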


✨ Highlights

  • Self-Play Training:
    The model improves autonomously on synthetic math and code problems generated by PromptCoT 2.0.
    Positive/negative preference pairs are constructed from verifiable feedback signals (unit-test success for code, final-answer correctness for math); see the sketch after this list.

  • Competitive with Closed-Source Models:
    Despite activating only 3B parameters, this model achieves results comparable to Gemini 2.5 Pro and OpenAI o3 across both math and code.
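
The pair construction can be sketched as follows. The helper names (extract_boxed, verify_math, verify_code, build_preference_pairs) and the pairing heuristic are illustrative assumptions, not a released API:

import re
import subprocess
import sys
import tempfile

def extract_boxed(solution: str):
    """Return the content of the last \\boxed{...} in a solution, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", solution)
    return matches[-1].strip() if matches else None

def verify_math(solution: str, reference_answer: str) -> bool:
    """A math rollout counts as positive iff its boxed answer matches the reference."""
    answer = extract_boxed(solution)
    return answer is not None and answer == reference_answer.strip()

def verify_code(program: str, test_code: str, timeout: float = 10.0) -> bool:
    """A code rollout counts as positive iff it passes the problem's unit tests."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def build_preference_pairs(problem: str, rollouts: list, verify) -> list:
    """Pair verified-correct rollouts (chosen) with verified-incorrect ones
    (rejected) for DPO; the one-to-one zip pairing is a simplification."""
    positives = [r for r in rollouts if verify(r)]
    negatives = [r for r in rollouts if not verify(r)]
    return [{"prompt": problem, "chosen": p, "rejected": n}
            for p, n in zip(positives, negatives)]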


📊 Results

[Figure: PromptCoT-2.0-SelfPlay-30B-A3B results on six benchmarks]

Performance of PromptCoT-2.0-SelfPlay-30B-A3B on six benchmarks (AIME 24/25, HMMT Feb 25, LiveCodeBench v5/v6, Codeforces). The model is competitive with Gemini 2.5 Pro and OpenAI o3 while surpassing strong open-source baselines.


🔮 Key Takeaways

  • Math + Code reasoning: Strong, balanced gains across both Olympiad-level math (AIME, HMMT) and competitive programming (LiveCodeBench, Codeforces).
  • Efficient scaling: Only 3B parameters are activated per token during self-play fine-tuning and inference, making the model significantly more efficient than the closed-source models it rivals.

📂 Resources

  • Paper: PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning (https://arxiv.org/abs/2509.19894)
  • Model: https://huggingface.co/xl-zhao/PromptCoT-2.0-SelfPlay-30B-A3B

📜 Citation

If you find this model useful, please consider citing:

@article{zhao2025promptcot2,
  title     = {PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning},
  author    = {Zhao, Xueliang and Wu, Wei and Guan, Jian and Gong, Zhuocheng and Kong, Lingpeng},
  journal   = {arXiv preprint arXiv:2509.19894},
  year      = {2025},
  url       = {https://arxiv.org/abs/2509.19894}
}