ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
Abstract
ARES, a unified framework for adaptive reasoning, dynamically adjusts exploration effort based on task difficulty using high window-entropy tokens and hierarchical entropy rewards, improving performance and efficiency across various benchmarks.
Recent advances in multimodal large reasoning models (MLRMs) have substantially improved their ability to solve complex textual and visual tasks. However, these models tend to overthink simple problems, producing unnecessarily lengthy reasoning traces, while under-exploring challenging ones, leading to missed solutions. To address this imbalance, we propose ARES, a unified open-source framework for adaptive reasoning that dynamically allocates exploration effort according to task difficulty. Our approach is motivated by two key empirical findings: (i) while single-token entropy is noisy, high window-entropy (HWE) tokens, identified by averaging token-level entropies over a sliding window, reliably capture reasoning-critical moments; and (ii) reducing HWE usage benefits easy problems, while increasing it is essential for solving hard ones. Building on these insights, ARES introduces a two-stage training pipeline. In the Adaptive Cold-Start stage, we curate multimodal and textual data paired with reasoning traces whose length is proportional to problem difficulty, equipping the model with initial difficulty awareness. In the second stage, we develop Adaptive Entropy Policy Optimization (AEPO), which uses HWE tokens as exploration triggers to decide when to explore, and a hierarchical entropy reward with dynamic KL control to decide how much to explore. Extensive experiments demonstrate that ARES achieves superior performance and reasoning efficiency across diverse mathematical, logical, and multimodal benchmarks, while closing the gap to leading commercial systems at significantly lower inference cost.
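The window-entropy signal is the primitive underlying both findings. The sketch below shows one way HWE tokens could be detected from decoding logits, following the abstract's description (per-token entropies averaged over a sliding window, with high-entropy spans flagged). The window size, the quantile threshold, and the helper name `hwe_tokens` are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of HWE token detection from per-step decoding logits.
# Window size and quantile threshold are illustrative assumptions; the
# paper's actual hyperparameters and implementation may differ.
import torch
import torch.nn.functional as F

def hwe_tokens(logits: torch.Tensor, window: int = 5, q: float = 0.8) -> torch.Tensor:
    """logits: [T, V] decoding logits for one sequence.
    Returns a boolean mask [T] marking high window-entropy tokens."""
    log_p = F.log_softmax(logits, dim=-1)           # [T, V]
    entropy = -(log_p.exp() * log_p).sum(dim=-1)    # [T] per-token entropy
    # Single-token entropy is noisy; smooth it with a sliding-window mean.
    smoothed = F.avg_pool1d(
        entropy.view(1, 1, -1),
        kernel_size=window, stride=1, padding=window // 2,
        count_include_pad=False,
    ).view(-1)[: entropy.numel()]
    # Flag tokens in the top (1 - q) fraction of window entropy as HWE
    # tokens, i.e., candidate exploration triggers.
    return smoothed >= torch.quantile(smoothed, q)
```

In AEPO, such a mask would serve as the exploration trigger deciding when to explore; the hierarchical entropy reward and dynamic KL control (not sketched here) then govern how much.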
Community
Paper (arXiv): https://arxiv.org/abs/2510.08457
GitHub: https://github.com/shawn0728/ARES
Models & Datasets (Hugging Face): https://huggingface.co/collections/ares0728/ares-68e7c7160dcb48734dee4e95
The following similar papers were recommended by the Semantic Scholar API:
- ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code (2025)
- Beyond Token Length: Step Pruner for Efficient and Accurate Reasoning in Large Language Models (2025)
- THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning (2025)
- DRQA: Dynamic Reasoning Quota Allocation for Controlling Overthinking in Reasoning Large Language Models (2025)
- BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens (2025)
- Empowering Lightweight MLLMs with Reasoning via Long CoT SFT (2025)
- HiPO: Hybrid Policy Optimization for Dynamic Reasoning in LLMs (2025)
Models citing this paper: 2
Datasets citing this paper: 2
Spaces citing this paper: 0