Unveiling Visual Biases in Audio-Visual Localization Benchmarks Paper • 2409.06709 • Published Aug 25, 2024
POLYCHARTQA: Benchmarking Large Vision-Language Models with Multilingual Chart Question Answering Paper • 2507.11939 • Published Jul 16 • 1
ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models Paper • 2510.10606 • Published 19 days ago • 3
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning Paper • 2510.20286 • Published 8 days ago • 21
ChartM³: Benchmarking Chart Editing with Multimodal Instructions Paper • 2507.21167 • Published Jul 25 • 1