Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks
Abstract
Large Language Models face challenges in long-horizon agentic tasks as their constrained memory is easily overwhelmed by distracting or irrelevant context. Existing working memory methods typically rely on external, heuristic mechanisms that are decoupled from the agent's core policy. In this work, we reframe working memory management as a learnable, intrinsic capability. We propose a novel framework, Memory-as-Action, where an agent actively manages its working memory by executing explicit editing operations as part of a unified policy. This formulation allows an agent, trained via reinforcement learning, to balance memory curation against long-term task objectives under given resource constraints. However, such memory editing actions break the standard assumption of a continuously growing prefix in LLM interactions, leading to what we call trajectory fractures. These non-prefix changes disrupt the causal continuity required by standard policy gradient methods, making those methods inapplicable. To address this, we propose a new algorithm, Dynamic Context Policy Optimization, which enables stable end-to-end reinforcement learning by segmenting trajectories at memory action points and applying trajectory-level advantages to the resulting action segments. Our results demonstrate that jointly optimizing for task reasoning and memory management in an end-to-end fashion not only reduces overall computational consumption but also improves task performance, driven by adaptive context curation strategies tailored to the model's intrinsic capabilities.
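The abstract only sketches Dynamic Context Policy Optimization at a high level, so the following is a minimal, hypothetical illustration of the segmentation idea it describes: split a trajectory at memory-action points and apply one trajectory-level advantage to each resulting segment. All names here (`Step`, `segment_at_memory_actions`, `dcpo_style_loss`) are assumptions for illustration, not the paper's actual API.

```python
# Hypothetical sketch of the segmentation idea described in the abstract.
# Assumed representation: a trajectory is a list of steps, each carrying its
# per-token log-probabilities and a flag marking memory-editing actions.
from dataclasses import dataclass
from typing import List

import torch


@dataclass
class Step:
    token_logprobs: torch.Tensor  # per-token log-probs for this action under the current policy
    is_memory_action: bool        # True if this step edited the working memory (a non-prefix change)


def segment_at_memory_actions(trajectory: List[Step]) -> List[List[Step]]:
    """Split a trajectory into segments within which the context grew as a contiguous prefix.

    A memory-editing action rewrites the context (a "trajectory fracture"), so each
    segment ends right after such an action and a new segment begins afterwards.
    """
    segments, current = [], []
    for step in trajectory:
        current.append(step)
        if step.is_memory_action:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def dcpo_style_loss(trajectory: List[Step], trajectory_advantage: float) -> torch.Tensor:
    """Apply a single trajectory-level advantage to every segment's policy-gradient term."""
    segment_losses = []
    for segment in segment_at_memory_actions(trajectory):
        logprob = torch.cat([s.token_logprobs for s in segment]).sum()
        segment_losses.append(-trajectory_advantage * logprob)
    return torch.stack(segment_losses).sum()
```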
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Mem-α: Learning Memory Construction via Reinforcement Learning (2025)
- Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning (2025)
- Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents (2025)
- In-the-Flow Agentic System Optimization for Effective Planning and Tool Use (2025)
- Encouraging Good Processes Without the Need for Good Answers: Reinforcement Learning for LLM Agent Planning (2025)
- Beyond Outcome Reward: Decoupling Search and Answering Improves LLM Agents (2025)
- SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents (2025)