arxiv:2508.19282

CORE-RAG: Lossless Compression for Retrieval-Augmented LLMs via Reinforcement Learning

Published on Aug 24

Authors:

Abstract

CORE, a novel end-to-end method for lossless context compression in Retrieval-Augmented Generation (RAG), enhances task performance and reduces computational costs without degrading accuracy.

AI-generated summary

Retrieval-Augmented Generation (RAG) has emerged as a promising approach to enhance the timeliness of knowledge updates and the factual accuracy of responses in large language models. However, incorporating a large number of retrieved documents significantly increases input length, leading to higher computational costs. Existing approaches to document compression tailored for RAG often degrade task performance, as they typically rely on predefined heuristics in the absence of clear compression guidelines. These heuristics fail to ensure that the compressed content effectively supports downstream tasks. To address these limitations, we propose CORE, a novel method for lossless context compression in RAG. CORE is optimized end-to-end and does not depend on predefined compression labels, which are often impractical to obtain. Instead, it leverages downstream task performance as a feedback signal, iteratively refining the compression policy to enhance task effectiveness. Extensive experiments across four datasets demonstrate the effectiveness of CORE. With a high compression ratio of 3%, CORE not only prevents performance degradation compared to including full documents (i.e., without compression) but also improves the average Exact Match (EM) score by 3.3 points. The code for CORE will be released soon.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2508.19282 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2508.19282 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2508.19282 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.