Budget-aware Test-time Scaling via Discriminative Verification
Abstract
A hybrid approach combining discriminative verification with self-consistency outperforms generative verification in test-time scaling for large language models, achieving higher accuracy within a fixed compute budget.
Test-time scaling is a powerful strategy for boosting the performance of large language models on complex reasoning tasks. While state-of-the-art approaches often employ generative verifiers to select the best solution from a pool of candidates, this method incurs prohibitive computational costs, limiting its practicality. In this work, we shift the focus to a more budget-aware paradigm: discriminative verification. We conduct a thorough empirical analysis and demonstrate that while discriminative verifiers may underperform in isolation, combining them with self-consistency in a hybrid approach creates a powerful and efficient test-time scaling mechanism. Notably, under a fixed compute budget, this hybrid approach surpasses state-of-the-art generative verification by a significant margin, achieving up to 15.3% higher accuracy on AIME2025. Our findings establish that for practical, real-world applications, budget-aware scaling with discriminative verifiers is not only a "free" upgrade over self-consistency, but also a more effective and efficient alternative to costly generative techniques. Code is available at https://github.com/wang-research-lab/verification.
Community
This work studies how discriminative verification can enable efficient test-time scaling. When combined with self-consistency, it yields strong performance gains, often surpassing generative verification methods under practical compute budgets.
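As a concrete illustration of the hybrid strategy, here is a minimal sketch of verifier-weighted self-consistency: each sampled solution's final answer casts a vote, and a discriminative verifier's score weights that vote. The function name `hybrid_select` and the example answers and scores are illustrative assumptions, not taken from the paper's released code; the paper's exact combination rule may differ.

```python
from collections import defaultdict

def hybrid_select(candidate_answers, verifier_scores):
    """Choose a final answer via verifier-weighted self-consistency (sketch).

    candidate_answers: final answers parsed from N sampled solutions.
    verifier_scores:   discriminative verifier scores in [0, 1], one per candidate.
    """
    assert len(candidate_answers) == len(verifier_scores)
    weighted_votes = defaultdict(float)
    for answer, score in zip(candidate_answers, verifier_scores):
        # Self-consistency casts the vote; the verifier score weights it.
        weighted_votes[answer] += score
    # Return the answer with the highest total weighted vote.
    return max(weighted_votes, key=weighted_votes.get)


# Hypothetical example: five sampled solutions, three of which agree on "42".
answers = ["42", "42", "17", "42", "17"]
scores = [0.9, 0.8, 0.95, 0.7, 0.2]
print(hybrid_select(answers, scores))  # -> "42" (weighted votes: 2.4 vs. 1.15)
```

Unweighted majority voting is recovered by setting every score to 1.0, which is one way to see why the hybrid acts as a "free" upgrade over plain self-consistency when verifier scores are already available.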
Related papers recommended by the Semantic Scholar API:
- Critique to Verify: Accurate and Honest Test-Time Scaling with RL-Trained Verifiers (2025)
- Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency (2025)
- Trust but Verify! A Survey on Verification Design for Test-time Scaling (2025)
- LATTS: Locally Adaptive Test-Time Scaling (2025)
- Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs (2025)
- Not All Bits Are Equal: Scale-Dependent Memory Optimization Strategies for Reasoning Models (2025)
- Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification (2025)