OmniRoam: World Wandering via Long-Horizon Panoramic Video Generation
Abstract
OmniRoam enables long-horizon panoramic video generation with improved scene completeness and consistency through a two-stage approach combining trajectory-controlled preview and refinement stages.
Modeling scenes using video generation models has garnered growing research interest in recent years. However, most existing approaches rely on perspective video models that synthesize only limited observations of a scene, leading to incomplete scene coverage and weak global consistency. We propose OmniRoam, a controllable panoramic video generation framework that exploits the rich per-frame scene coverage and inherent long-term spatial and temporal consistency of the panoramic representation, enabling long-horizon scene wandering. Our framework begins with a preview stage, in which a trajectory-controlled video generation model creates a quick overview of the scene from a given input image or video. Then, in the refinement stage, this video is temporally extended and spatially upsampled to produce long-range, high-resolution videos, thus enabling high-fidelity world wandering. To train our model, we introduce two panoramic video datasets that incorporate both synthetic and real-world captured videos. Experiments show that our framework consistently outperforms state-of-the-art methods in visual quality, controllability, and long-term scene consistency, both qualitatively and quantitatively. We further showcase several extensions of this framework, including real-time video generation and 3D reconstruction. Code is available at https://github.com/yuhengliu02/OmniRoam.
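As a rough illustration of the two-stage design described in the abstract, a preview stage could first produce a short, low-resolution panoramic clip along a camera trajectory, which a refinement stage then extends temporally and upsamples spatially. All function names, shapes, and parameters below are hypothetical placeholders, not from the OmniRoam codebase; random noise and nearest-neighbor resampling stand in for the actual generative models:

```python
import numpy as np

def preview_stage(trajectory, h=64, w=128, rng=None):
    """Hypothetical stand-in for the trajectory-controlled preview model:
    returns one low-resolution equirectangular frame per trajectory pose."""
    rng = rng or np.random.default_rng(0)
    return np.stack([rng.random((h, w, 3)) for _ in trajectory])

def refine_stage(preview, scale=4, extend=2):
    """Hypothetical refinement stage: temporal extension (frame repetition
    as a placeholder for generative frame interpolation) followed by
    spatial upsampling (nearest-neighbor as a placeholder for a
    super-resolution model)."""
    extended = np.repeat(preview, extend, axis=0)            # T -> T*extend
    upsampled = extended.repeat(scale, axis=1).repeat(scale, axis=2)
    return upsampled

# Toy camera path of 8 poses (here just placeholder coordinates).
trajectory = [(0.0, 0.0, float(t)) for t in range(8)]
preview = preview_stage(trajectory)   # shape (8, 64, 128, 3)
video = refine_stage(preview)         # shape (16, 256, 512, 3)
print(video.shape)
```

The point of the sketch is only the data flow: a cheap full-scene preview fixes the global layout early, and the refinement pass spends compute on resolution and duration afterward.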
Community
Project Page: https://yuheng.ink/project-page/omniroam/
Github: https://github.com/yuhengliu02/OmniRoam
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Toward Physically Consistent Driving Video World Models under Challenging Trajectories (2026)
- From Static to Dynamic: Exploring Self-supervised Image-to-Video Representation Transfer Learning (2026)
- MemCam: Memory-Augmented Camera Control for Consistent Video Generation (2026)
- Stepper: Stepwise Immersive Scene Generation with Multiview Panoramas (2026)
- UCM: Unifying Camera Control and Memory with Time-aware Positional Encoding Warping for World Models (2026)
- CamDirector: Towards Long-Term Coherent Video Trajectory Editing (2026)
- PhysVideo: Physically Plausible Video Generation with Cross-View Geometry Guidance (2026)
Get this paper in your agent: hf papers read 2603.30045