Collections

Collections including paper arXiv:2510.23763

Proactive Robot Manipulation in Omni-modal Context

fnlp/RoboOmni

Robotics • Updated 8 days ago • 55 • 5
fnlp/RoboOmni-LIBERO-Spatial

Robotics • Updated 7 days ago • 45
fnlp/RoboOmni-LIBERO-Goal

Updated 9 days ago • 26
fnlp/RoboOmni-LIBERO-Object

Updated 9 days ago • 19

RoboOmni: Proactive Robot Manipulation in Omni-modal Context

Paper • 2510.23763 • Published 10 days ago • 52
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published 20 days ago • 86
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22 • 134
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue

Paper • 2510.13747 • Published 23 days ago • 29

Multimodal Agent

about 20 hours ago

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 29
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 41
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Paper • 2411.19650 • Published Nov 29, 2024
Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published May 20, 2024 • 29
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression

Paper • 2412.03293 • Published Dec 4, 2024

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published 8 days ago • 100
RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging

Paper • 2510.20479 • Published 15 days ago • 10
A Definition of AGI

Paper • 2510.18212 • Published 17 days ago • 33
Video-As-Prompt: Unified Semantic Control for Video Generation

Paper • 2510.20888 • Published 14 days ago • 44

A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning

Paper • 2509.15937 • Published Sep 19 • 20
RoboOmni: Proactive Robot Manipulation in Omni-modal Context

Paper • 2510.23763 • Published 10 days ago • 52

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 57
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 52
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 44
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 63
