openai/whisper-large-v3-turbo Automatic Speech Recognition • 0.8B • Updated Oct 4, 2024 • 4.63M • 2.72k
RynnVLA-002: A Unified Vision-Language-Action and World Model Paper • 2511.17502 • Published 17 days ago • 24
Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight Paper • 2511.16175 • Published 19 days ago • 12