UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning Paper • 2510.20286 • Published 8 days ago
ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models Paper • 2510.10606 • Published 19 days ago
DreamOmni2: Multimodal Instruction-based Editing and Generation Paper • 2510.06679 • Published 24 days ago
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published 18 days ago
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech Paper • 2509.25131 • Published Sep 29
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning Paper • 2505.12081 • Published May 17
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement Paper • 2503.06520 • Published Mar 9
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024