Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models • arXiv:2509.23962 • Published Sep 2025
Rethinking Entropy Regularization in Large Reasoning Models • arXiv:2509.25133 • Published Sep 2025
LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions • arXiv:2510.08211 • Published Oct 2025
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Deliberation • arXiv:2509.14760 • Published Sep 2025
Training Language Models to Self-Correct via Reinforcement Learning • arXiv:2409.12917 • Published Sep 19, 2024
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs • arXiv:2407.10058 • Published Jul 14, 2024