Rethinking Chain-of-Thought Reasoning for Videos Paper • 2512.09616 • Published 17 days ago • 17
CoRe^2: Collect, Reflect and Refine to Generate Better and Faster Paper • 2503.09662 • Published Mar 12 • 33
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding Paper • 2412.00493 • Published Nov 30, 2024 • 17
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning Paper • 2412.03248 • Published Dec 4, 2024 • 26
Florence2 + SAM2 🔥 • Segment and caption objects in images and videos • 515
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control Paper • 2308.09804 • Published Aug 18, 2023 • 2