Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization Paper • 2605.13641 • Published 4 days ago • 2
AdaR Collection An effective framework for mathematical data synthesis to improve a model's robustness and generalization. • 3 items • Updated Oct 11, 2025 • 3
Libra: Assessing and Improving Reward Model by Learning to Think Paper • 2507.21645 • Published Jul 29, 2025 • 3