DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset Paper • 2601.10305 • Published 5 days ago
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better Paper • 2506.09040 • Published Jun 10, 2025
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps Paper • 2505.18675 • Published May 24, 2025
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs Paper • 2504.17432 • Published Apr 24, 2025
Decoupled Global-Local Alignment for Improving Compositional Understanding Paper • 2504.16801 • Published Apr 23, 2025