OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks Paper • 2604.08539 • Published 4 days ago • 44
MARS: Enabling Autoregressive Models Multi-Token Generation Paper • 2604.07023 • Published 5 days ago • 33
Experience Transfer for Multimodal LLM Agents in Minecraft Game Paper • 2604.05533 • Published 6 days ago • 13
360Anything: Geometry-Free Lifting of Images and Videos to 360° Paper • 2601.16192 • Published Jan 22 • 9
Image Upscaler And Restoring GFPGAN Algorithm Space • Running • 41 • Enhance and upscale images using GFPGAN