SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs Paper • 2506.05344 • Published Jun 5, 2025 • 16
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published Feb 6, 2025 • 30
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024 • 26
Unleashing Text-to-Image Diffusion Models for Visual Perception Paper • 2303.02153 • Published Mar 3, 2023
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published Sep 19, 2024 • 26
Efficient Inference of Vision Instruction-Following Models with Elastic Cache Paper • 2407.18121 • Published Jul 25, 2024 • 17