FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos Paper • 2512.10927 • Published 20 days ago • 5
ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models Paper • 2512.07843 • Published Nov 24, 2025 • 20
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published Apr 22, 2025 • 63
PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding Paper • 2501.16411 • Published Jan 27, 2025 • 19
Wolf: Captioning Everything with a World Summarization Framework Paper • 2407.18908 • Published Jul 26, 2024 • 32
Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition Paper • 2403.14148 • Published Mar 21, 2024 • 21
Driving Everywhere with Large Language Model Policy Adaptation Paper • 2402.05932 • Published Feb 8, 2024 • 5
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models Paper • 2305.13655 • Published May 23, 2023 • 7