Fidelity-Aware Data Composition for Robust Robot Generalization
Abstract
Coherent Information Fidelity Tuning (CIFT) improves out-of-distribution generalization in robot policies by treating data composition as an optimization problem, using a generative engine to synthesize the augmented data it tunes over, which enhances robustness and performance.
Generalist robot policies trained on large-scale, visually homogeneous datasets can be susceptible to shortcut learning, which impairs their out-of-distribution (OOD) generalization. While generative data augmentation is a common approach to introduce diversity, it presents a subtle challenge: data composition. Naively mixing real and synthetic data can corrupt the learning signal, as this process often prioritizes visual diversity at the expense of information fidelity. This paper suggests that robust generalization depends on principled, fidelity-aware data composition. We introduce Coherent Information Fidelity Tuning (CIFT), a framework that treats data composition as an optimization problem. CIFT uses a practical proxy for Information Fidelity based on the feature-space geometry of a dataset. This enables the identification of a phase transition, termed the Decoherence Point, where training stability degrades. The framework includes a generative engine, Multi-View Video Augmentation (MVAug), to synthesize a causally disentangled data spectrum for this tuning process. Applying CIFT to policy architectures such as pi_0 and Diffusion Policy improves OOD success rates by over 54%. These results indicate that fidelity-aware composition, beyond data synthesis alone, is an important component for developing robust, general-purpose robots.
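The abstract does not specify how the Information Fidelity proxy or the Decoherence Point are computed, so the sketch below is only an illustration of the general idea under stated assumptions: it uses the effective rank of the feature covariance as a stand-in for a feature-space-geometry fidelity measure, and sweeps real/synthetic mixing ratios until that proxy drops sharply. The function names (`fidelity_proxy`, `find_decoherence_point`), the threshold, and the random stand-in features are all hypothetical, not the paper's actual method.

```python
import numpy as np

def fidelity_proxy(features: np.ndarray) -> float:
    """Toy Information Fidelity proxy (assumption): effective rank of the
    feature covariance, i.e. exp(entropy of its eigenvalue spectrum).
    Higher values indicate better-conditioned feature-space geometry."""
    cov = np.cov(features, rowvar=False)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 1e-12, None)
    p = eigvals / eigvals.sum()
    return float(np.exp(-(p * np.log(p)).sum()))

def find_decoherence_point(real_feats, synth_feats, ratios, drop_tol=0.15):
    """Sweep synthetic-data mixing ratios and return the first ratio at which
    the fidelity proxy falls more than `drop_tol` below the all-real baseline
    (a rough stand-in for the Decoherence Point described in the abstract)."""
    rng = np.random.default_rng(0)
    n = len(real_feats)
    baseline = fidelity_proxy(real_feats)
    for r in ratios:
        k = int(r * n)
        mix = np.vstack([
            real_feats[rng.choice(n, n - k, replace=False)],
            synth_feats[rng.choice(len(synth_feats), k, replace=False)],
        ])
        if fidelity_proxy(mix) < (1.0 - drop_tol) * baseline:
            return r
    return None

# Example with random stand-ins for encoder features of real vs. synthetic frames.
real = np.random.randn(2000, 128)
synth = real + 2.5 * np.random.randn(2000, 128)  # noisier "synthetic" features
print(find_decoherence_point(real, synth, ratios=np.linspace(0.1, 0.9, 9)))
```

In practice the features would come from the policy's visual encoder over real demonstrations and MVAug-generated frames; the point of the sketch is only that a scalar geometry-based proxy lets the mixing ratio be tuned before a sharp degradation sets in.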
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer (2025)
- Learning Primitive Embodied World Models: Towards Scalable Robotic Learning (2025)
- StaMo: Unsupervised Learning of Generalizable Robot Motion from Compact State Representation (2025)
- Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance (2025)
- VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators (2025)
- OmniD: Generalizable Robot Manipulation Policy via Image-Based BEV Representation (2025)
- VGGT-DP: Generalizable Robot Control via Vision Foundation Models (2025)