JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent Paper • 2506.17612 • Published Jun 21 • 62
Pixels, Patterns, but No Poetry: To See The World like Humans Paper • 2507.16863 • Published 18 days ago • 66
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published 20 days ago • 123
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios Paper • 2507.20198 • Published 12 days ago • 24
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving Paper • 2507.23726 • Published 8 days ago • 101
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning Paper • 2507.12841 • Published 22 days ago • 39
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published 22 days ago • 70
SENTINEL Collection [ICCV 2025] Official repository of "Mitigating Object Hallucinations via Sentence-Level Early Intervention". Repo: https://github.com/pspdada/SENTINEL • 9 items • Updated 18 days ago • 4
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 210
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning Paper • 2506.05331 • Published Jun 5 • 13
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning Paper • 2505.12081 • Published May 17 • 18
Training-Free Efficient Video Generation via Dynamic Token Carving Paper • 2505.16864 • Published May 22 • 22