leondawn666's Collections
Agent & RL
Towards General-Purpose Model-Free Reinforcement Learning
Paper
•
2501.16142
•
Published
•
30
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
•
2503.14476
•
Published
•
142
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Paper
•
2504.13837
•
Published
•
136
Learning to Reason under Off-Policy Guidance
Paper
•
2504.14945
•
Published
•
88
ToolRL: Reward is All Tool Learning Needs
Paper
•
2504.13958
•
Published
•
48
TTRL: Test-Time Reinforcement Learning
Paper
•
2504.16084
•
Published
•
120
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
Paper
•
2504.16656
•
Published
•
57
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Paper
•
2504.20571
•
Published
•
98
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
Paper
•
2504.10481
•
Published
•
85
Rethinking Reflection in Pre-Training
Paper
•
2504.04022
•
Published
•
79
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper
•
2504.10479
•
Published
•
304
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
Paper
•
2504.01990
•
Published
•
300
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper
•
2503.24290
•
Published
•
62
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper
•
2505.17612
•
Published
•
81
ARM: Adaptive Reasoning Model
Paper
•
2505.20258
•
Published
•
45
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
•
2505.03335
•
Published
•
187
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Paper
•
2505.22617
•
Published
•
131
Reinforcement Pre-Training
Paper
•
2506.08007
•
Published
•
262
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?
Paper
•
2502.12115
•
Published
•
46
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper
•
2505.24726
•
Published
•
275
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper
•
2502.05171
•
Published
•
151
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper
•
2501.17161
•
Published
•
123
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
Paper
•
2502.07316
•
Published
•
50
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization
Paper
•
2507.15061
•
Published
•
60
A Survey of Context Engineering for Large Language Models
Paper
•
2507.13334
•
Published
•
259
MemOS: A Memory OS for AI System
Paper
•
2507.03724
•
Published
•
156
Agentic Reinforced Policy Optimization
Paper
•
2507.19849
•
Published
•
156
Deep Researcher with Test-Time Diffusion
Paper
•
2507.16075
•
Published
•
67
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
Paper
•
2507.21046
•
Published
•
81
Group Sequence Policy Optimization
Paper
•
2507.18071
•
Published
•
312
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens
Paper
•
2508.01191
•
Published
•
237
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification
Paper
•
2508.05629
•
Published
•
179
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Paper
•
2508.04700
•
Published
•
52
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Paper
•
2508.09736
•
Published
•
57
SSRL: Self-Search Reinforcement Learning
Paper
•
2508.10874
•
Published
•
96
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper
•
2508.13167
•
Published
•
127
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper
•
2509.08721
•
Published
•
661
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper
•
2509.25454
•
Published
•
137
Less is More: Recursive Reasoning with Tiny Networks
Paper
•
2510.04871
•
Published
•
487
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
•
2509.02547
•
Published
•
224
A Survey of Reinforcement Learning for Large Reasoning Models
Paper
•
2509.08827
•
Published
•
188
Agent Learning via Early Experience
Paper
•
2510.08558
•
Published
•
265
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Paper
•
2510.07242
•
Published
•
30
Multi-Agent Tool-Integrated Policy Optimization
Paper
•
2510.04678
•
Published
•
30
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
Paper
•
2510.04618
•
Published
•
120
RLP: Reinforcement as a Pretraining Objective
Paper
•
2510.01265
•
Published
•
40
It Takes Two: Your GRPO Is Secretly DPO
Paper
•
2510.00977
•
Published
•
31
DCPO: Dynamic Clipping Policy Optimization
Paper
•
2509.02333
•
Published
•
21
The Art of Scaling Reinforcement Learning Compute for LLMs
Paper
•
2510.13786
•
Published
•
30
What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity
Paper
•
2511.15593
•
Published
•
54
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
Paper
•
2511.11793
•
Published
•
154
P1: Mastering Physics Olympiads with Reinforcement Learning
Paper
•
2511.13612
•
Published
•
128
LightRAG: Simple and Fast Retrieval-Augmented Generation
Paper
•
2410.05779
•
Published
•
19
OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists
Paper
•
2511.16931
•
Published
•
6
Budget-Aware Tool-Use Enables Effective Agent Scaling
Paper
•
2511.17006
•
Published
•
22