Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published 14 days ago • 57
AgentCPM-Report: Interleaving Drafting and Deepening for Open-Ended Deep Research Paper • 2602.06540 • Published 20 days ago • 21
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment Paper • 2402.19085 • Published Feb 29, 2024
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16, 2025 • 40
Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning Paper • 2506.07851 • Published Jun 9, 2025
DARC: Decoupled Asymmetric Reasoning Curriculum for LLM Evolution Paper • 2601.13761 • Published Jan 20 • 16
Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification Paper • 2601.21244 • Published 29 days ago • 12