microsoft / GUI-Actor-Verifier-2B

Image-Text-to-Text

text-generation-inference


qianhuiwu committed 4 days ago

Commit 4c2ce70 · verified · 1 parent: 3960674

highlight verifier numbers.

Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -60,8 +60,8 @@ Table 2. Main results on the ScreenSpot-Pro and ScreenSpot-v2 with **Qwen2.5-VL*
 | **_3B models:_**
 | Qwen2.5-VL-3B  | Qwen2.5-VL    | 25.9           | 80.9           |
 | Jedi-3B        | Qwen2.5-VL    | 36.1           | 88.6           |
-| GUI-Actor-3B   | Qwen2.5-VL    | **42.2**       | **91.0**       |
-| GUI-Actor-3B + Verifier   | Qwen2.5-VL    | 45.9       | 92.4       |
+| GUI-Actor-3B   | Qwen2.5-VL    | 42.2       | 91.0       |
+| GUI-Actor-3B + Verifier   | Qwen2.5-VL    | **45.9**       | **92.4**       |
 ## 🚀 Usage
 The verifier takes a language instruction and an image with a red circle marking the target position as input. One example is shown below. It outputs either ‘True’ or ‘False’, and you can also use the probability of each label to score the sample.
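The usage note says the probability of each label can be used to score a sample. A minimal Python sketch of that idea follows, under stated assumptions: it supposes you have already obtained the logits the model assigns to the `True` and `False` label tokens for each red-circled image (model loading and prompting are not shown, since the excerpt does not specify them), and it renormalizes over just those two labels. The helper names `verifier_score` and `pick_best_candidate`, the candidate points, and the logit values are hypothetical. Ranking several circled candidates by score is one plausible use, not necessarily how the GUI-Actor-3B + Verifier numbers in the table were produced.

```python
import math
from typing import Sequence


def verifier_score(logit_true: float, logit_false: float) -> float:
    """Probability of 'True', renormalized over the two labels only.

    Assumption: the logits for the 'True' and 'False' tokens at the
    answer position were already read from the model (not shown here).
    """
    # Two-way softmax; subtracting the max keeps exp() from overflowing.
    m = max(logit_true, logit_false)
    e_true = math.exp(logit_true - m)
    e_false = math.exp(logit_false - m)
    return e_true / (e_true + e_false)


def pick_best_candidate(
    candidates: Sequence[tuple[float, float]],
    label_logits: Sequence[tuple[float, float]],
) -> tuple[tuple[float, float], float]:
    """Hypothetical helper: return the candidate with the highest score.

    candidates[i] is an (x, y) position circled in red on the screenshot;
    label_logits[i] is the (logit_true, logit_false) pair for that image.
    """
    if not candidates or len(candidates) != len(label_logits):
        raise ValueError("need at least one candidate and one logit pair per candidate")
    scores = [verifier_score(t, f) for t, f in label_logits]
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best], scores[best]


# Hypothetical example: three candidate click positions and made-up logits.
points = [(120.0, 48.0), (300.0, 210.0), (512.0, 90.0)]
logits = [(1.2, 0.3), (2.5, -0.5), (-0.4, 1.1)]
best_point, best_score = pick_best_candidate(points, logits)
print(best_point)  # (300.0, 210.0)
```

The two-label softmax is equivalent to `sigmoid(logit_true - logit_false)`. Taking the `True` probability from the full-vocabulary softmax is an alternative; renormalizing simply makes the `True` and `False` scores sum to 1.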