spillai's Collections
vision-r1

updated 23 days ago

  • Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models

    Paper • 2503.06749 • Published Mar 9 • 31

  • Executable Code Actions Elicit Better LLM Agents

    Paper • 2402.01030 • Published Feb 1, 2024 • 172

  • VGR: Visual Grounded Reasoning

    Paper • 2506.11991 • Published Jun 13 • 19

  • Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images

    Paper • 2509.07966 • Published Sep 9 • 4

  • VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

    Paper • 2504.15279 • Published Apr 21 • 76

  • Visual Abstract Thinking Empowers Multimodal Reasoning

    Paper • 2505.20164 • Published May 26 • 1

  • PixelCraft: A Multi-Agent System for High-Fidelity Visual Reasoning on Structured Images

    Paper • 2509.25185 • Published 26 days ago • 4

  • Seeing Culture: A Benchmark for Visual Reasoning and Grounding

    Paper • 2509.16517 • Published Sep 20 • 1

  • VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

    Paper • 2509.21100 • Published about 1 month ago • 1

  • Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles

    Paper • 2505.23590 • Published May 29 • 25