GENIUS: Generative Fluid Intelligence Evaluation Suite Paper • 2602.11144 • Published 26 days ago • 53
GEBench: Benchmarking Image Generation Models as GUI Environments Paper • 2602.09007 • Published 28 days ago • 39
Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning Paper • 2601.21037 • Published Jan 28 • 15
How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing Paper • 2602.01851 • Published Feb 2 • 16