SeeUPO: Sequence-Level Agentic-RL with Convergence Guarantees Paper • 2602.06554 • Published 22 days ago • 5
Weak-Driven Learning: How Weak Agents make Strong Agents Stronger Paper • 2602.08222 • Published 20 days ago • 272
Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published 18 days ago • 195
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published 24 days ago • 341
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning Paper • 2602.10560 • Published 18 days ago • 28
Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation Paper • 2602.12125 • Published 16 days ago • 57
Rethinking the Trust Region in LLM Reinforcement Learning Paper • 2602.04879 • Published 24 days ago • 35
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published 28 days ago • 41
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models Paper • 2601.22060 • Published about 1 month ago • 157
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published 29 days ago • 204
Reinforcement Learning for Self-Improving Agent with Skill Library Paper • 2512.17102 • Published Dec 18, 2025 • 36
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published Dec 22, 2025 • 64
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics Paper • 2512.21010 • Published Dec 24, 2025 • 4
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models Paper • 2512.19995 • Published Dec 23, 2025 • 16
Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards Paper • 2512.21625 • Published Dec 25, 2025 • 4
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization Paper • 2512.24615 • Published Dec 31, 2025 • 119