SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2 • 126
3D-VLA: A 3D Vision-Language-Action Generative World Model Paper • 2403.09631 • Published Mar 14, 2024 • 10
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots Paper • 2312.14457 • Published Dec 22, 2023 • 1
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression Paper • 2412.03293 • Published Dec 4, 2024
Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations Paper • 2405.06039 • Published May 9, 2024 • 1
A Dual Process VLA: Efficient Robotic Manipulation Leveraging VLM Paper • 2410.15549 • Published Oct 21, 2024
VLA-Cache: Towards Efficient Vision-Language-Action Model via Adaptive Token Caching in Robotic Manipulation Paper • 2502.02175 • Published Feb 4
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models Paper • 2506.17561 • Published Jun 21
RaceVLA: VLA-based Racing Drone Navigation with Human-like Behaviour Paper • 2503.02572 • Published Mar 4
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning Paper • 2505.18719 • Published May 24
RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation Paper • 2506.18088 • Published Jun 22 • 17
RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration Paper • 2505.03673 • Published May 6 • 1
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Paper • 2502.21257 • Published Feb 28 • 2
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective Paper • 2507.01925 • Published Jul 2 • 35
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20 • 41
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers Paper • 2507.01016 • Published Jul 1 • 1
CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding Paper • 2506.13725 • Published Jun 16
UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent Paper • 2501.18867 • Published Jan 31
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation Paper • 2505.03912 • Published May 6 • 8
ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation Paper • 2505.22159 • Published May 28
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge Paper • 2507.04447 • Published Jul 6 • 42
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model Paper • 2503.10631 • Published Mar 13
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos Paper • 2507.15597 • Published Jul 21 • 33
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published Jul 22 • 35
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Paper • 2507.23682 • Published Jul 31 • 22
StreamVLN: Streaming Vision-and-Language Navigation via SlowFast Context Modeling Paper • 2507.05240 • Published Jul 7 • 45