arxiv:2601.13836

FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

Published on Jan 20 · Submitted by Fu Jinlan on Jan 21 · OpenMOSS-Team (OpenMOSS)

Abstract

AI-generated summary: FutureOmni presents the first benchmark for evaluating multimodal models' ability to forecast future events from audio-visual data, revealing current limitations and proposing an improved training strategy for better performance.

Although Multimodal Large Language Models (MLLMs) demonstrate strong omni-modal perception, their ability to forecast future events from audio-visual cues remains largely unexplored, as existing benchmarks focus mainly on retrospective understanding. To bridge this gap, we introduce FutureOmni, the first benchmark designed to evaluate omni-modal future forecasting from audio-visual environments. The evaluated models are required to perform cross-modal causal and temporal reasoning, as well as effectively leverage internal knowledge to predict future events. FutureOmni is constructed via a scalable LLM-assisted, human-in-the-loop pipeline and contains 919 videos and 1,034 multiple-choice QA pairs across 8 primary domains. Evaluations on 13 omni-modal and 7 video-only models show that current systems struggle with audio-visual future prediction, particularly in speech-heavy scenarios, with the best accuracy of 64.8% achieved by Gemini 3 Flash. To mitigate this limitation, we curate a 7K-sample instruction-tuning dataset and propose an Omni-Modal Future Forecasting (OFF) training strategy. Evaluations on FutureOmni and popular audio-visual and video-only benchmarks demonstrate that OFF enhances future forecasting and generalization. We publicly release all code (https://github.com/OpenMOSS/FutureOmni) and datasets (https://huggingface.co/datasets/OpenMOSS-Team/FutureOmni).
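Since the benchmark is reported as accuracy over multiple-choice QA pairs, the snippet below is a minimal, illustrative sketch of that scoring step. It is not the repository's evaluation script, and the `answer`/`prediction` field names are assumptions made for the example.

```python
# Illustrative only: generic multiple-choice accuracy, not the official
# FutureOmni evaluation code. Field names ("answer", "prediction") are
# assumptions for this sketch.
from typing import Dict, List


def mcq_accuracy(examples: List[Dict[str, str]]) -> float:
    """Fraction of examples whose predicted option letter matches the gold answer."""
    if not examples:
        return 0.0
    correct = sum(
        1
        for ex in examples
        if ex["prediction"].strip().upper() == ex["answer"].strip().upper()
    )
    return correct / len(examples)


if __name__ == "__main__":
    # Toy predictions over three hypothetical QA pairs.
    demo = [
        {"answer": "B", "prediction": "B"},
        {"answer": "C", "prediction": "A"},
        {"answer": "D", "prediction": "D"},
    ]
    print(f"accuracy = {mcq_accuracy(demo):.3f}")  # accuracy = 0.667
```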

Community



🔗 Paper: https://arxiv.org/pdf/2601.13836
💻 Code: https://github.com/OpenMOSS/FutureOmni
🌐 Project: https://openmoss.github.io/FutureOmni
🎬 Datasets: https://huggingface.co/datasets/OpenMOSS-Team/FutureOmni
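
The dataset link above points at a Hugging Face Hub release, so it can be pulled with the `datasets` library; a minimal sketch is below. The repository id comes from the link, but the split names, columns, and any required configuration are assumptions here, so consult the dataset card for the actual layout.

```python
# Minimal sketch of loading the benchmark from the Hugging Face Hub.
# Split and column names are assumptions, not documented parts of the release.
from datasets import load_dataset

# Repo id taken from the dataset link above.
ds = load_dataset("OpenMOSS-Team/FutureOmni")
print(ds)  # inspect the available splits and features

# Hypothetical: peek at one item from whichever split exists.
split_name = list(ds.keys())[0]
example = ds[split_name][0]
print(example.keys())
```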

