A comprehensive framework designed to cultivate VLMs with human-like visuospatial abilities.
Ray Yang
rayruiyang
AI & ML interests
None yet
Recent Activity
upvoted a paper about 18 hours ago
ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?
upvoted a paper 3 days ago
MolmoPoint: Better Pointing for VLMs with Grounding Tokens
upvoted a paper 8 days ago
ProAct: Agentic Lookahead in Interactive Environments
Organizations
None yet