arxiv:2510.11170

EAGER: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling

Published on Oct 13
· Submitted by Daniel Scalena on Oct 16
Abstract

AI-generated summary: EAGer, a training-free method, uses token-wise entropy to optimize computational resources and improve performance on complex reasoning tasks.

With the rise of reasoning language models and test-time scaling methods as a paradigm for improving model performance, substantial computation is often required to generate multiple candidate sequences from the same prompt. This enables exploration of different reasoning paths toward the correct solution; however, it allocates the same compute budget to every prompt. Grounded in the assumption that different prompts carry different degrees of complexity, and thus different computation needs, we propose EAGer, a training-free generation method that leverages model uncertainty through the token-wise entropy distribution to reduce redundant computation and concurrently improve overall performance. EAGer branches into multiple reasoning paths only in the presence of high-entropy tokens, and then reallocates the saved compute budget to the instances where exploration of alternative paths is most needed. We find that across multiple open-source models on complex reasoning benchmarks such as AIME 2025, EAGer can reallocate the budget without accessing target labels, achieving the best efficiency-performance trade-off in terms of reasoning length and Pass@k. When target labels are accessible, EAGer generates up to 65% fewer tokens (hence saving compute) and achieves up to 37% improvement in Pass@k compared to full parallel sampling.
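The core signal in the abstract is the token-wise entropy of the model's next-token distribution. A minimal PyTorch sketch of that quantity (random logits stand in for a real model's output; the tensor shapes and the 2.0 threshold are illustrative assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (in nats) of the next-token distribution at each position.

    logits: [seq_len, vocab_size] raw scores from a language model.
    Returns a [seq_len] tensor with one entropy value per position.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)

# Illustrative usage: random logits stand in for real model output.
logits = torch.randn(5, 32_000)        # 5 generated positions, 32k-token vocabulary
entropies = token_entropy(logits)
high_entropy = entropies > 2.0         # hypothetical threshold marking "uncertain" tokens
print(entropies, high_entropy)
```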

Community

Paper author · Paper submitter

We present EAGer 🧠, showing that we can be MORE efficient & MORE effective by letting models focus compute where it matters most. EAGer dynamically allocates compute in LLMs by monitoring token-level uncertainty.

Results show EAGer reduces token usage by up to 80% and boosts performance by 13% (no labels) and up to 37% (with labels) on reasoning benchmarks like AIME, shifting the Pareto frontier across 3B-20B models.

How? We track token entropy during generation: high entropy (uncertainty) triggers branching to explore new reasoning paths, while low entropy continues a single path. This lets EAGer use the budget efficiently, capping at M sequences per prompt. On easy prompts, compute is saved; on hard prompts (those hitting the cap), the saved budget is reallocated automatically: no labels or retraining needed! The full version of EAGer even uses task failures (if available) to better target struggling prompts.
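To make the mechanism above concrete, here is a simplified sketch of entropy-gated branching with a per-prompt cap. It is not the authors' implementation: `step_fn`, `entropy_threshold`, `branch_width`, and `max_branches` (the M above) are illustrative placeholders, the toy `step_fn` returns random logits instead of calling a real model, and the cross-prompt reallocation of saved budget is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def eager_style_generate(step_fn, prompt, max_new_tokens=32,
                         entropy_threshold=2.0, max_branches=4, branch_width=2):
    """Entropy-gated branching sketch: extend greedily while the model is confident,
    branch into sampled continuations at high-entropy tokens, and never exceed
    `max_branches` live sequences for this prompt (the cap M)."""
    sequences = [list(prompt)]
    for _ in range(max_new_tokens):
        spare = max_branches - len(sequences)   # extra slots still available this step
        new_sequences = []
        for seq in sequences:
            logits = step_fn(seq)                       # [vocab_size] next-token scores
            log_probs = F.log_softmax(logits, dim=-1)
            probs = log_probs.exp()
            entropy = -(probs * log_probs).sum().item()
            if entropy > entropy_threshold and spare > 0:
                # High uncertainty: explore several sampled continuations.
                k = min(branch_width, spare + 1)
                tokens = torch.multinomial(probs, num_samples=k)
                new_sequences += [seq + [int(t)] for t in tokens]
                spare -= k - 1
            else:
                # Low uncertainty: continue a single greedy path.
                new_sequences.append(seq + [int(torch.argmax(logits))])
        sequences = new_sequences
    return sequences

# Toy usage: random logits stand in for a real model's forward pass.
vocab_size = 100
paths = eager_style_generate(lambda seq: torch.randn(vocab_size), prompt=[1, 2, 3])
print(len(paths), "reasoning path(s): easy (low-entropy) prompts stay at 1, hard ones grow toward the cap")
```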


