Collections

Collections including paper arxiv:2510.12276

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model

Paper • 2510.12276 • Published 4 days ago • 135

Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles

Paper • 2505.23590 • Published May 29 • 25
How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning

Paper • 2505.24273 • Published May 30 • 4
Accelerating Diffusion LLMs via Adaptive Parallel Decoding

Paper • 2506.00413 • Published May 31 • 9
DINGO: Constrained Inference for Diffusion LLMs

Paper • 2505.23061 • Published May 29 • 31

OpenVLA: An Open-Source Vision-Language-Action Model

Paper • 2406.09246 • Published Jun 13, 2024 • 41
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation

Paper • 2411.19650 • Published Nov 29, 2024
Octo: An Open-Source Generalist Robot Policy

Paper • 2405.12213 • Published May 20, 2024 • 29
Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression

Paper • 2412.03293 • Published Dec 4, 2024

about 3 hours ago

Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published 5 days ago • 139
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model

Paper • 2510.12276 • Published 4 days ago • 135
FlashWorld: High-quality 3D Scene Generation within Seconds

Paper • 2510.13678 • Published 3 days ago • 62
ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints

Paper • 2510.14847 • Published 2 days ago • 48

Bugai's Collection

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28 • 89
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24 • 80
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26 • 41
VibeVoice Technical Report

Paper • 2508.19205 • Published Aug 26 • 123

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3, 2024 • 34
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion

Paper • 2409.11406 • Published Sep 17, 2024 • 27
Diffusion Models Are Real-Time Game Engines

Paper • 2408.14837 • Published Aug 27, 2024 • 126
Segment Anything with Multiple Modalities

Paper • 2408.09085 • Published Aug 17, 2024 • 22
