Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning Paper • 2510.03259 • Published 22 days ago
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense Paper • 2510.07242 • Published 10 days ago
First Try Matters: Revisiting the Role of Reflection in Reasoning Models Paper • 2510.08308 • Published 9 days ago
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Paper • 2510.03222 • Published 15 days ago
Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States Paper • 2510.11052 • Published 5 days ago
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment Paper • 2510.10201 • Published 7 days ago
Demystifying Reinforcement Learning in Agentic Reasoning Paper • 2510.11701 • Published 5 days ago
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published 5 days ago