arxiv:2510.14211

LiteStage: Latency-aware Layer Skipping for Multi-stage Reasoning

Published on Oct 16

· Submitted by

Jiwon Song on Oct 17

Seoul National University

Upvote

Authors:

Jiwon Song ,

Abstract

LiteStage, a latency-aware layer skipping framework, enhances multi-stage reasoning by optimizing layer budgets and suppressing redundant output tokens, achieving significant speedup with minimal accuracy loss.

AI-generated summary

Multi-stage reasoning has emerged as an effective strategy for enhancing the reasoning capability of small language models by decomposing complex problems into sequential sub-stages. However, this comes at the cost of increased latency. We observe that existing adaptive acceleration techniques, such as layer skipping, struggle to balance efficiency and accuracy in this setting due to two key challenges: (1) stage-wise variation in skip sensitivity, and (2) the generation of redundant output tokens. To address these, we propose LiteStage, a latency-aware layer skipping framework for multi-stage reasoning. LiteStage combines a stage-wise offline search that allocates optimal layer budgets with an online confidence-based generation early exit to suppress unnecessary decoding. Experiments on three benchmarks, e.g., OBQA, CSQA, and StrategyQA, show that LiteStage achieves up to 1.70x speedup with less than 4.0% accuracy loss, outperforming prior training-free layer skipping methods.

View arXiv page View PDF GitHub 0 Add to collection

Community

jiwonsong

Paper author Paper submitter 1 day ago

LiteStage is a latency-aware layer skipping framework for multi-stage reasoning that balances efficiency and accuracy by combining stage-wise optimization with online early exiting, achieving up to 1.7× speedup with less than 4% accuracy loss.

librarian-bot

about 11 hours ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2510.14211 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2510.14211 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2510.14211 in a Space README.md to link it from this page.