VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models • Paper • 2511.11007 • Published 22 days ago • 15
Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models • Paper • 2511.17487 • Published 14 days ago • 9
VisPlay: Self-Evolving Vision-Language Models from Images • Paper • 2511.15661 • Published 16 days ago • 42