zhong-zhang committed on
Commit a72ead9 · verified · 1 Parent(s): b6299fb

Update README.md

Files changed (1)
  1. README.md +18 -18
README.md CHANGED
@@ -251,30 +251,30 @@ Source code for SFT and RFT training is provided — see [GitHub](https://github
 
  ### Grounding Benchmark

- | Model | fun2point | text2point | bbox2text | average |
- | ------------------------- | -------------- | -------------- | -------------- | -------------- |
- | **AgentCPM-GUI-8B** | **79.1** | **76.5** | **58.2** | **71.3** |
- | Qwen2.5-VL-7B | 59.8 | 59.3 | 50.0 | 56.4 |
- | Intern2.5-VL-8B | 17.2 | 24.2 | 45.9 | 29.1 |
- | Intern2.5-VL-26B | 14.8 | 16.6 | 36.3 | 22.6 |
- | OS-Genesis-7B | 8.3 | 5.8 | 4.0 | 6.0 |
- | UI-TARS-7B | 56.8 | 66.7 | 1.4 | 41.6 |
- | OS-Altas-7B | 53.6 | 60.7 | 0.4 | 38.2 |
- | Aguvis-7B | 60.8 | **76.5** | 0.2 | 45.8 |
- | GPT-4o | 22.1 | 19.9 | 14.3 | 18.8 |
- | GPT-4o with Grounding | 44.3 | 44.0 | 14.3 | 44.2 |

  ### Agent Benchmark

- | Dataset | Android Control-Low TM | Android Control-Low EM | Android Control-High TM | Android Control-High EM | GUI-Odyssey TM | GUI-Odyssey EM | AITZ TM | AITZ EM | Chinese APP TM | Chinese APP EM |
  | ------------------------- | ---------------------- | ---------------------- | ----------------------- | ----------------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- |
- | **AgentCPM-GUI-8B** | **94.39** | **90.20** | **77.70** | **69.17** | **90.85** | **74.96** | **85.71** | **76.38** | **96.86** | **91.28** |
- | Qwen2.5-VL-7B | 92.11 | 82.12 | 69.65 | 57.36 | 55.33 | 40.90 | 73.16 | 57.58 | 68.53 | 48.80 |
- | UI-TARS-7B | 93.52 | 88.89 | 68.53 | 60.81 | 78.79 | 57.33 | 71.74 | 55.31 | 71.01 | 53.92 |
  | OS-Genesis-7B | 90.74 | 74.22 | 65.92 | 44.43 | 11.67 | 3.63 | 19.98 | 8.45 | 38.10 | 14.50 |
- | OS-Atlas-7B | 73.03 | 67.25 | 70.36 | 56.53 | 91.83* | 76.76* | 74.13 | 58.45 | 81.53 | 55.89 |
  | Aguvis-7B | 93.85 | 89.40 | 65.56 | 54.18 | 26.71 | 13.54 | 35.71 | 18.99 | 67.43 | 38.20 |
- | OdysseyAgent-7B | 65.10 | 39.16 | 58.80 | 32.74 | 90.83 | 73.67 | 59.17 | 31.60 | 67.56 | 25.44 |
  | GPT-4o | - | 19.49 | - | 20.80 | - | 20.39 | 70.00 | 35.30 | 3.67 | 3.67 |
  | Gemini 2.0 | - | 28.50 | - | 60.20 | - | 3.27 | - | - | - | - |
  | Claude | - | 19.40 | - | 12.50 | 60.90 | - | - | - | - | - |
 
 
  ### Grounding Benchmark

+ | Model | Fun2Point | Text2Point | Bbox2Text | Average |
+ |-------------------------|-----------|------------|-----------|--------|
+ | **AgentCPM-GUI-8B** | **79.1** | **76.5** | **58.2** |**71.3**|
+ | Qwen2.5-VL-7B | 59.8 | 59.3 | <ins>50.0</ins> | <ins>56.4</ins> |
+ | Intern2.5-VL-8B | 17.2 | 24.2 | 45.9 | 29.1 |
+ | Intern2.5-VL-26B | 14.8 | 16.6 | 36.3 | 22.6 |
+ | OS-Genesis-7B | 8.3 | 5.8 | 4.0 | 6.0 |
+ | UI-TARS-7B | 56.8 | <ins>66.7</ins> | 1.4 | 41.6 |
+ | OS-Atlas-7B | 53.6 | 60.7 | 0.4 | 38.2 |
+ | Aguvis-7B | <ins>60.8</ins> | **76.5** | 0.2 | 45.8 |
+ | GPT-4o | 22.1 | 19.9 | 14.3 | 18.8 |
+ | GPT-4o with Grounding | 44.3 | 44.0 | 14.3 | 44.2 |

  ### Agent Benchmark

+ | Dataset | Android Control-Low TM | Android Control-Low EM | Android Control-High TM | Android Control-High EM | GUI-Odyssey TM | GUI-Odyssey EM | AITZ TM | AITZ EM | Chinese APP (CAGUI) TM | Chinese APP (CAGUI) EM |
  | ------------------------- | ---------------------- | ---------------------- | ----------------------- | ----------------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- |
+ | **AgentCPM-GUI-8B** | <ins>94.39</ins> | <ins>90.20</ins> | <ins>77.70</ins> | <ins>69.17</ins> | **90.85** | **74.96** | **85.71** | **76.38** | **96.86** | **91.28** |
+ | Qwen2.5-VL-7B | 94.14 | 84.96 | 75.10 | 62.90 | 59.54 | 46.28 | 78.41 | 54.61 | 74.18 | 55.16 |
+ | UI-TARS-7B | **95.24** | **91.79** | **81.63** | **74.43** | 86.06 | 67.90 | <ins>80.42</ins> | <ins>65.77</ins> | <ins>88.62</ins> | <ins>70.26</ins> |
  | OS-Genesis-7B | 90.74 | 74.22 | 65.92 | 44.43 | 11.67 | 3.63 | 19.98 | 8.45 | 38.10 | 14.50 |
+ | OS-Atlas-7B | 73.03 | 67.25 | 70.36 | 56.53 | 91.83* | 76.76* | 74.13 | 58.45 | 81.53 | 55.89 |
  | Aguvis-7B | 93.85 | 89.40 | 65.56 | 54.18 | 26.71 | 13.54 | 35.71 | 18.99 | 67.43 | 38.20 |
+ | OdysseyAgent-7B | 65.10 | 39.16 | 58.80 | 32.74 | <ins>90.83</ins> | <ins>73.67</ins> | 59.17 | 31.60 | 67.56 | 25.44 |
  | GPT-4o | - | 19.49 | - | 20.80 | - | 20.39 | 70.00 | 35.30 | 3.67 | 3.67 |
  | Gemini 2.0 | - | 28.50 | - | 60.20 | - | 3.27 | - | - | - | - |
  | Claude | - | 19.40 | - | 12.50 | 60.90 | - | - | - | - | - |