SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference Paper • 2410.04417 • Published Oct 6, 2024 • 1
EVA: An Embodied World Model for Future Video Anticipation Paper • 2410.15461 • Published Oct 20, 2024
Unveiling the Tapestry of Consistency in Large Vision-Language Models Paper • 2405.14156 • Published May 23, 2024
WoW: Towards a World omniscient World model Through Embodied Interaction Paper • 2509.22642 • Published Sep 26, 2025 • 14
Robobench: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models as Embodied Brain Paper • 2510.17801 • Published Oct 20, 2025
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 117