Rethinking Entropy Regularization in Large Reasoning Models Paper • 2509.25133 • Published 19 days ago
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense Paper • 2510.07242 • Published 10 days ago