Update README.md
README.md
CHANGED
@@ -251,30 +251,30 @@ Source code for SFT and RFT training is provided — see [GitHub](https://github
 
 ### Grounding Benchmark
 
-| Model
-
-| **AgentCPM-GUI-8B**
-| Qwen2.5-VL-7B
-| Intern2.5-VL-8B
-| Intern2.5-VL-26B
-| OS-Genesis-7B
-| UI-TARS-7B
-| OS-
-| Aguvis-7B
-| GPT-4o
-| GPT-4o with Grounding
+| Model | Fun2Point | Text2Point | Bbox2Text | Average |
+|-------------------------|-----------|------------|-----------|--------|
+| **AgentCPM-GUI-8B** | **79.1** | **76.5** | **58.2** |**71.3**|
+| Qwen2.5-VL-7B | 59.8 | 59.3 | <ins>50.0</ins> | <ins>56.4</ins> |
+| Intern2.5-VL-8B | 17.2 | 24.2 | 45.9 | 29.1 |
+| Intern2.5-VL-26B | 14.8 | 16.6 | 36.3 | 22.6 |
+| OS-Genesis-7B | 8.3 | 5.8 | 4.0 | 6.0 |
+| UI-TARS-7B | 56.8 | <ins>66.7</ins> | 1.4 | 41.6 |
+| OS-Atlas-7B | 53.6 | 60.7 | 0.4 | 38.2 |
+| Aguvis-7B | <ins>60.8</ins> | **76.5** | 0.2 | 45.8 |
+| GPT-4o | 22.1 | 19.9 | 14.3 | 18.8 |
+| GPT-4o with Grounding | 44.3 | 44.0 | 14.3 | 44.2 |
 
 ### Agent Benchmark
 
-| Dataset | Android Control-Low TM | Android Control-Low EM | Android Control-High TM | Android Control-High EM | GUI-Odyssey TM | GUI-Odyssey EM | AITZ TM | AITZ EM | Chinese APP TM | Chinese APP EM |
+| Dataset | Android Control-Low TM | Android Control-Low EM | Android Control-High TM | Android Control-High EM | GUI-Odyssey TM | GUI-Odyssey EM | AITZ TM | AITZ EM | Chinese APP (CAGUI) TM | Chinese APP (CAGUI) EM |
 | ------------------------- | ---------------------- | ---------------------- | ----------------------- | ----------------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- |
-| **AgentCPM-GUI-8B** |
-| Qwen2.5-VL-7B |
-| UI-TARS-7B |
+| **AgentCPM-GUI-8B** | <ins>94.39</ins> | <ins>90.20</ins> | <ins>77.70</ins> | <ins>69.17</ins> | **90.85** | **74.96** | **85.71** | **76.38** | **96.86** | **91.28** |
+| Qwen2.5-VL-7B | 94.14 | 84.96 | 75.10 | 62.90 | 59.54 | 46.28 | 78.41 | 54.61 | 74.18 | 55.16 |
+| UI-TARS-7B | **95.24** | **91.79** | **81.63** | **74.43** | 86.06 | 67.90 | <ins>80.42</ins> | <ins>65.77</ins> | <ins>88.62</ins> | <ins>70.26</ins> |
 | OS-Genesis-7B | 90.74 | 74.22 | 65.92 | 44.43 | 11.67 | 3.63 | 19.98 | 8.45 | 38.10 | 14.50 |
-| OS-Atlas-7B
+| OS-Atlas-7B | 73.03 | 67.25 | 70.36 | 56.53 | 91.83* | 76.76* | 74.13 | 58.45 | 81.53 | 55.89 |
 | Aguvis-7B | 93.85 | 89.40 | 65.56 | 54.18 | 26.71 | 13.54 | 35.71 | 18.99 | 67.43 | 38.20 |
-| OdysseyAgent-7B | 65.10 | 39.16 | 58.80 | 32.74 | 90.83 | 73.67 | 59.17 | 31.60 | 67.56 | 25.44 |
+| OdysseyAgent-7B | 65.10 | 39.16 | 58.80 | 32.74 | <ins>90.83</ins> | <ins>73.67</ins> | 59.17 | 31.60 | 67.56 | 25.44 |
 | GPT-4o | - | 19.49 | - | 20.80 | - | 20.39 | 70.00 | 35.30 | 3.67 | 3.67 |
 | Gemini 2.0 | - | 28.50 | - | 60.20 | - | 3.27 | - | - | - | - |
 | Claude | - | 19.40 | - | 12.50 | 60.90 | - | - | - | - | - |