Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 98
One RL to See Them All: Visual Triple Unified Reinforcement Learning Paper • 2505.18129 • Published May 23 • 60
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published Mar 20 • 52
Performance Trade-offs of Optimizing Small Language Models for E-Commerce Paper • 2510.21970 • Published Oct 24 • 2
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning Paper • 2510.25992 • Published 28 days ago • 44
Generalizing Test-time Compute-optimal Scaling as an Optimizable Graph Paper • 2511.00086 • Published 28 days ago • 41
RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization Paper • 2511.04285 • Published 21 days ago • 7
RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments Paper • 2511.07317 • Published 16 days ago • 12
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published 18 days ago • 119
Adaptive Multi-Agent Response Refinement in Conversational Systems Paper • 2511.08319 • Published 16 days ago • 40
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism Paper • 2511.11373 • Published 13 days ago • 11
P1: Mastering Physics Olympiads with Reinforcement Learning Paper • 2511.13612 • Published 9 days ago • 128
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning Paper • 2511.14460 • Published 9 days ago • 15