R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation
Abstract
A real-to-real 3D data generation framework enhances data efficiency for generalized robotic manipulation by augmenting pointcloud observation-action pairs directly, without any simulation or rendering.
Toward generalized robotic manipulation, spatial generalization is the most fundamental capability: the policy must work robustly under different spatial distributions of objects, the environment, and the agent itself. Achieving this via imitation learning requires collecting a substantial number of human demonstrations that cover diverse spatial configurations. Prior works explore a promising direction that leverages data generation to obtain abundant spatially diverse data from minimal source demonstrations. However, most approaches suffer from a significant sim-to-real gap and are often limited to constrained settings, such as fixed-base scenarios and predefined camera viewpoints. In this paper, we propose a real-to-real 3D data generation framework (R2RGen) that directly augments pointcloud observation-action pairs to generate real-world data. R2RGen is simulator- and rendering-free, making it efficient and plug-and-play. Specifically, given a single source demonstration, we introduce an annotation mechanism for fine-grained parsing of the scene and trajectory. A group-wise augmentation strategy is proposed to handle complex multi-object compositions and diverse task constraints. We further present camera-aware processing to align the distribution of generated data with that of a real-world 3D sensor. Empirically, R2RGen substantially enhances data efficiency across extensive experiments and shows strong potential for scaling and for application to mobile manipulation.
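As a rough illustration of the core real-to-real augmentation idea (a minimal sketch, not the authors' implementation), the snippet below applies a random planar rigid transform jointly to one object's pointcloud segment and to the end-effector poses that interact with it, so the generated observation-action pair stays geometrically consistent. The demonstration format, function names, and perturbation ranges are all assumptions made for illustration; the paper's group-wise augmentation and camera-aware processing involve additional constraints not shown here.

```python
import numpy as np

def random_planar_transform(max_xy=0.15, max_yaw=np.pi / 6):
    """Sample a random SE(2) perturbation on the table plane:
    an xy translation plus a rotation about the vertical axis."""
    yaw = np.random.uniform(-max_yaw, max_yaw)
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:2, 3] = np.random.uniform(-max_xy, max_xy, size=2)
    return T

def transform_points(T, points):
    """Apply a 4x4 homogeneous transform to an (N, 3) point array."""
    return points @ T[:3, :3].T + T[:3, 3]

def augment_pair(object_points, ee_poses, T=None):
    """Jointly transform an object's pointcloud segment and the (K, 4, 4)
    end-effector poses of the trajectory segment that manipulates it,
    keeping the augmented observation-action pair consistent."""
    if T is None:
        T = random_planar_transform()
    new_points = transform_points(T, object_points)
    new_poses = np.einsum("ij,kjl->kil", T, ee_poses)  # left-multiply each pose
    return new_points, new_poses

# Hypothetical usage with stand-in data for one segmented object.
pts = np.random.rand(1024, 3)             # placeholder pointcloud segment
poses = np.tile(np.eye(4), (50, 1, 1))    # placeholder end-effector poses
aug_pts, aug_poses = augment_pair(pts, poses)
```

Restricting the perturbation to SE(2) keeps objects on the support surface; a full generator in the spirit of the abstract would additionally check multi-object and task constraints and re-align the augmented pointcloud with what a real depth sensor would actually observe from its viewpoint.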
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Parse-Augment-Distill: Learning Generalizable Bimanual Visuomotor Policies from Single Human Video (2025)
- Learning in ImaginationLand: Omnidirectional Policies through 3D Generative Models (OP-Gen) (2025)
- MoTo: A Zero-shot Plug-in Interaction-aware Navigation for General Mobile Manipulation (2025)
- DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation (2025)
- ROPA: Synthetic Robot Pose Generation for RGB-D Bimanual Data Augmentation (2025)
- MobRT: A Digital Twin-Based Framework for Scalable Learning in Mobile Manipulation (2025)
- LodeStar: Long-horizon Dexterity via Synthetic Data Augmentation from Human Demonstrations (2025)