SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction Paper • 2507.15852 • Published 18 days ago • 37
4KAgent: Agentic Any Image to 4K Super-Resolution Paper • 2507.07105 • Published about 1 month ago • 96
MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second Paper • 2507.10065 • Published 26 days ago • 24
MOSPA: Human Motion Generation Driven by Spatial Audio Paper • 2507.11949 • Published 24 days ago • 23
FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers Paper • 2507.12956 • Published 23 days ago • 22
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens Paper • 2506.17218 • Published Jun 20 • 27
Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency Paper • 2506.08343 • Published Jun 10 • 49
Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model Paper • 2506.13642 • Published Jun 16 • 27
Frame Guidance: Training-Free Guidance for Frame-Level Control in Video Diffusion Models Paper • 2506.07177 • Published Jun 8 • 22
Seeing Voices: Generating A-Roll Video from Audio with Mirage Paper • 2506.08279 • Published Jun 9 • 28
Article 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It? By Kseniase • Mar 17 • 325
CVPR 2025 Collection A collection of models and demos linked to papers presented at CVPR 2025. • 14 items • Updated Jun 11 • 1
Peer-Ranked Precision: Creating a Foundational Dataset for Fine-Tuning Vision Models from DataSeeds' Annotated Imagery Paper • 2506.05673 • Published Jun 6 • 10
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers Paper • 2506.05573 • Published Jun 5 • 77
SpatialLM: Training Large Language Models for Structured Indoor Modeling Paper • 2506.07491 • Published Jun 9 • 42