Collections
Collections including paper arXiv:2510.23763
- RoboOmni: Proactive Robot Manipulation in Omni-modal Context
  Paper • 2510.23763 • Published • 52
- OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
  Paper • 2510.15870 • Published • 86
- Qwen3-Omni Technical Report
  Paper • 2509.17765 • Published • 134
- InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
  Paper • 2510.13747 • Published • 29

- Gemini Robotics: Bringing AI into the Physical World
  Paper • 2503.20020 • Published • 29
- Magma: A Foundation Model for Multimodal AI Agents
  Paper • 2502.13130 • Published • 58
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
  Paper • 2410.23218 • Published • 49

- OpenVLA: An Open-Source Vision-Language-Action Model
  Paper • 2406.09246 • Published • 41
- CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
  Paper • 2411.19650 • Published
- Octo: An Open-Source Generalist Robot Policy
  Paper • 2405.12213 • Published • 29
- Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression
  Paper • 2412.03293 • Published

- Emu3.5: Native Multimodal Models are World Learners
  Paper • 2510.26583 • Published • 100
- RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging
  Paper • 2510.20479 • Published • 10
- A Definition of AGI
  Paper • 2510.18212 • Published • 33
- Video-As-Prompt: Unified Semantic Control for Video Generation
  Paper • 2510.20888 • Published • 44

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 57
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 52
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 44
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 63