Collections

Collections including paper arxiv:2511.16719

about 3 hours ago

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published 16 days ago • 105

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published 16 days ago • 105

MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models

Paper • 2511.18373 • Published 13 days ago • 5
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

Paper • 2511.13288 • Published 19 days ago • 17
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Paper • 2511.19418 • Published 12 days ago • 26
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published 16 days ago • 105

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

Paper • 2406.04151 • Published Jun 6, 2024 • 24
DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

Paper • 2510.16872 • Published Oct 19 • 103
Scaling Generalist Data-Analytic Agents

Paper • 2509.25084 • Published Sep 29 • 18
Scaling Agents via Continual Pre-training

Paper • 2509.13310 • Published Sep 16 • 117

FastVLM: Efficient Vision Encoding for Vision Language Models

Paper • 2412.13303 • Published Dec 17, 2024 • 72
rStar2-Agent: Agentic Reasoning Technical Report

Paper • 2508.20722 • Published Aug 28 • 116
AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications

Paper • 2508.16279 • Published Aug 22 • 53
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling

Paper • 2509.12201 • Published Sep 15 • 104

Papers I want to read 🗞️

Machine Learning Operations (MLOps): Overview, Definition, and Architecture

Paper • 2205.02302 • Published May 4, 2022 • 1
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published 16 days ago • 105

MedSAM3: Delving into Segment Anything with Medical Concepts

Paper • 2511.19046 • Published 12 days ago • 48
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published 16 days ago • 105

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published 17 days ago • 222
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks

Paper • 2511.15065 • Published 17 days ago • 74
SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published 16 days ago • 105
Canvas-to-Image: Compositional Image Generation with Multimodal Controls

Paper • 2511.21691 • Published 10 days ago • 32

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9 • 109
Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13 • 165
Spotlight on Token Perception for Multimodal Reinforcement Learning

Paper • 2510.09285 • Published Oct 10 • 36
Towards Mixed-Modal Retrieval for Universal Retrieval-Augmented Generation

Paper • 2510.17354 • Published Oct 20 • 33

Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations

Paper • 2508.09789 • Published Aug 13 • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14 • 18
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents

Paper • 2508.04038 • Published Aug 6 • 1
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published Aug 19 • 48
