SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 126
3D-VLA: A 3D Vision-Language-Action Generative World Model Paper • 2403.09631 • Published Mar 14, 2024 • 10
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots Paper • 2312.14457 • Published Dec 22, 2023 • 1
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression Paper • 2412.03293 • Published Dec 4, 2024
Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations Paper • 2405.06039 • Published May 9, 2024 • 1
A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM Paper • 2410.15549 • Published Oct 21, 2024
VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation Paper • 2502.02175 • Published Feb 4
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models Paper • 2506.17561 • Published Jun 21
RaceVLA: VLA-based Racing Drone Navigation with Human-like Behaviour Paper • 2503.02572 • Published Mar 4
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning Paper • 2505.18719 • Published May 24
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation Paper • 2506.18088 • Published Jun 22 • 17
RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration Paper • 2505.03673 • Published May 6 • 1
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Paper • 2502.21257 • Published Feb 28 • 2
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective Paper • 2507.01925 • Published Jul 2 • 35
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20 • 41
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers Paper • 2507.01016 • Published Jul 1 • 1
CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding Paper • 2506.13725 • Published Jun 16
UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent Paper • 2501.18867 • Published Jan 31
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation Paper • 2505.03912 • Published May 6 • 8
ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation Paper • 2505.22159 • Published May 28
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Paper • 2507.04447 • Published Jul 6 • 42
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model Paper • 2503.10631 • Published Mar 13
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos Paper • 2507.15597 • Published Jul 21 • 33
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published Jul 22 • 35
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Paper • 2507.23682 • Published Jul 31 • 22
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Paper • 2507.05240 • Published Jul 7 • 45