Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas Paper • 2503.01773 • Published Mar 3, 2025
Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging Paper • 2505.05464 • Published May 8, 2025
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning Paper • 2504.20073 • Published Apr 24, 2025
Non-Sequential Graph Script Induction via Multimedia Grounding Paper • 2305.17542 • Published May 27, 2023
The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination Paper • 2502.16143 • Published Feb 22, 2025
Re-thinking Temporal Search for Long-Form Video Understanding Paper • 2504.02259 • Published Apr 3, 2025
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos Paper • 2411.11409 • Published Nov 18, 2024
COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation Paper • 2007.00576 • Published Jul 1, 2020
Multimedia Generative Script Learning for Task Planning Paper • 2208.12306 • Published Aug 25, 2022
Conv-CoA: Improving Open-domain Question Answering in Large Language Models via Conversational Chain-of-Action Paper • 2405.17822 • Published May 28, 2024