SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law Paper • 2507.18576 • Published Jul 24
Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Step Paper • 2509.23924 • Published 20 days ago
Rethinking Entropy Regularization in Large Reasoning Models Paper • 2509.25133 • Published 19 days ago
Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models Paper • 2509.23962 • Published 20 days ago
Beyond External Monitors: Enhancing Transparency of Large Language Models for Easier Monitoring Paper • 2502.05242 • Published Feb 7