GAR-1B

This repository contains the GAR-1B model, as presented in the paper Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs.

TL;DR: Grasp Any Region (GAR) supports both (1) describing in detail a single region of an image or video, where the region is specified by points, boxes, scribbles, or masks, and (2) understanding multiple regions jointly, such as modeling interactions between them and performing complex reasoning. We also release a new benchmark, GARBench, to evaluate models on advanced region-level understanding tasks.
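To make the region prompt types concrete, here is a minimal sketch of how a box or a set of points could be rasterized into a binary mask. It is an illustration only and is not taken from the GAR codebase: the helper names `box_to_mask` and `points_to_mask`, the `(x0, y0, x1, y1)` box convention, and the `radius` parameter are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # mask[y][x] is 1 inside the region, 0 outside


def box_to_mask(box: Tuple[int, int, int, int], height: int, width: int) -> Mask:
    """Rasterize a box (x0, y0, x1, y1), with exclusive x1/y1, into a binary mask.

    Hypothetical helper: the coordinate convention is an assumption.
    """
    x0, y0, x1, y1 = box
    # Clip the box to the image bounds.
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [
        [1 if (y0 <= y < y1 and x0 <= x < x1) else 0 for x in range(width)]
        for y in range(height)
    ]


def points_to_mask(
    points: Sequence[Tuple[int, int]], height: int, width: int, radius: int = 1
) -> Mask:
    """Mark a small square around each (x, y) point.

    Hypothetical helper: a simple stand-in for point or scribble prompts.
    """
    mask = [[0] * width for _ in range(height)]
    for px, py in points:
        for y in range(max(0, py - radius), min(height, py + radius + 1)):
            for x in range(max(0, px - radius), min(width, px + radius + 1)):
                mask[y][x] = 1
    return mask
```

Under these assumptions, every prompt type ends up as the same kind of object, a binary mask over the image. How GAR itself encodes region prompts is documented in the GitHub repo.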

Usage

For detailed usage of this model, please refer to our GitHub repo.
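As a starting point before reading the repo, the sketch below shows one way the checkpoint might be loaded with the Transformers library. It rests on assumptions that should be checked against the GitHub instructions: that the repository ships its own modeling code (hence `trust_remote_code=True`), that this code is registered for `AutoModel`, and that BF16 is a suitable dtype. The function name `load_gar` is hypothetical, and the sketch does not cover preprocessing, region prompts, or generation.

```python
REPO_ID = "HaochenWang/GAR-1B"


def load_gar(repo_id: str = REPO_ID):
    """Load the checkpoint with Transformers (hypothetical helper, see assumptions above)."""
    # Imported lazily so that defining this function does not require the
    # heavy dependencies to be installed.
    import torch
    from transformers import AutoModel

    model = AutoModel.from_pretrained(
        repo_id,
        trust_remote_code=True,      # assumption: the repo provides custom modeling code
        torch_dtype=torch.bfloat16,  # assumption: BF16 weights
    )
    return model.eval()


# Example (downloads the weights on first call):
# model = load_gar()
```

Because `trust_remote_code=True` executes code from the model repository, it is worth reviewing that code before running it.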

Safetensors checkpoint. Model size: 2B params. Tensor type: BF16.