xzxuan HillFir committed on
Commit 48fa894 · verified · 1 Parent(s): 6067773

Update README.md (#2)

- Update README.md (0a076521c397249e853a510e5cee8a2ab000a36c)


Co-authored-by: Hongzhi Zang <HillFir@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +44 -35
README.md CHANGED
@@ -12,6 +12,14 @@ pipeline_tag: reinforcement-learning
  model-index:
  - name: RLinf-openvla-maniskill3-grpo
  results:
  - task:
  type: VLA
  dataset:
@@ -19,7 +27,7 @@ model-index:
  name: maniskill-vision
  metrics:
  - type: accuracy
- value: 74.7
  - task:
  type: VLA
  dataset:
@@ -27,7 +35,7 @@ model-index:
  name: maniskill-semantic
  metrics:
  - type: accuracy
- value: 74.4
  - task:
  type: VLA
  dataset:
@@ -35,7 +43,7 @@ model-index:
  name: maniskill-position
  metrics:
  - type: accuracy
- value: 81.6
  ---

  <div align="center">
@@ -65,46 +73,47 @@ model-index:
  This model is trained on ``gen-robot/openvla-7b-rlvla-warmup`` by Group Relative Policy Optimization (GRPO) on the ManiSkill simulator.

  ## Full OOD Evaluation and Results
- ### Overall OOD Eval Results
- Note: rl4vla refers to the paper [VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study](https://arxiv.org/abs/2505.19789).

- | Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | PPO-openvla | __GRPO-openvla__ |
- |-------------|--------|-----------------|----------------|-------------|------------------|
- | Avg results | 76.08 | 61.48 | 64.53 | **82.21** | 75.47 |

  ### OOD Eval on Vision

- | Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | PPO-openvla | __GRPO-openvla__ |
- |-------------|--------|-----------------|----------------|-------------|------------------|
- | vision avg | 76.56 | 84.69 | 80.55 | **82.03** | 74.69 |
- | unseen table | 84.40 | 91.41 | 94.53 | **95.70** | 89.84 |
- | dynamic texture (weak) | 83.30 | **91.02** | 82.42 | 85.55 | 78.91 |
- | dynamic texture (strong) | 63.00 | **77.34** | 62.50 | 72.27 | 65.62 |
- | dynamic noise (weak) | 85.40 | 89.45 | **89.84** | 87.11 | 79.69 |
- | dynamic noise (strong) | 66.70 | **74.22** | 73.44 | 69.53 | 59.38 |

  ### OOD Eval on Semantic
-
- | Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | PPO-openvla | __GRPO-openvla__ |
- |-------------|--------|-----------------|----------------|-------------|------------------|
- | object avg | 75.40 | 51.61 | 56.64 | **80.57** | 74.41 |
- | train setting | 93.80 | 94.14 | 91.80 | **96.09** | 84.38 |
- | unseen objects | 71.40 | 80.47 | 77.73 | **81.64** | 76.56 |
- | unseen receptacles | 75.00 | 74.22 | 78.12 | **81.25** | 73.44 |
- | unseen instructions | 89.10 | 67.97 | 68.36 | **94.53** | 89.06 |
- | multi-object (both seen) | 75.00 | 35.16 | 42.97 | **84.38** | 75.78 |
- | multi-object (both unseen) | 57.80 | 30.47 | 38.67 | **62.89** | 57.81 |
- | distractive receptacle | 81.20 | 18.75 | 31.64 | **82.81** | 78.12 |
- | multi-receptacle (both unseen) | 59.90 | 11.72 | 23.83 | **60.94** | 60.16 |

  ### OOD Eval on Position
-
- | Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | PPO-openvla | __GRPO-openvla__ |
- |-------------|--------|-----------------|----------------|-------------|------------------|
- | position avg | 77.60 | 42.97 | 56.05 | **89.26** | 81.64 |
- | unseen position (object & receptacle) | 80.70 | 40.23 | 50.39 | **86.33** | 75.00 |
- | mid-episode object reposition | 74.50 | 45.70 | 61.72 | **92.19** | 88.28 |
-

  ## How to Use
  Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvla.yaml``:
 
  model-index:
  - name: RLinf-openvla-maniskill3-grpo
  results:
+ - task:
+ type: VLA
+ dataset:
+ type: maniskill-train
+ name: maniskill-train
+ metrics:
+ - type: accuracy
+ value: 84.38
  - task:
  type: VLA
  dataset:
 
  name: maniskill-vision
  metrics:
  - type: accuracy
+ value: 74.69
  - task:
  type: VLA
  dataset:
 
  name: maniskill-semantic
  metrics:
  - type: accuracy
+ value: 72.99
  - task:
  type: VLA
  dataset:
 
  name: maniskill-position
  metrics:
  - type: accuracy
+ value: 77.86
  ---

  <div align="center">
 
  This model is trained on ``gen-robot/openvla-7b-rlvla-warmup`` by Group Relative Policy Optimization (GRPO) on the ManiSkill simulator.

  ## Full OOD Evaluation and Results
+ ### Overall Eval Results
+ Note: rl4vla refers to the paper VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study.
+ | Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
+ |---------------|-----------|-----------------|----------------|-------------|---------------|
+ | Avg results | 0.7915 | 0.6064 | 0.7705 | **0.8193** | 0.7515 |

+ ### Training Setting Eval
+ | Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
+ |---------------|-----------|-----------------|----------------|-------------|---------------|
+ | Avg results | 0.9375 | 0.9414 | **0.9766** | 0.9609 | 0.8438 |

  ### OOD Eval on Vision

+ | Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
+ |---------------|-----------|-----------------|----------------|-------------|---------------|
+ | vision avg | 0.8047 | 0.8469 | **0.9211** | 0.8203 | 0.7469 |
+ | unseen table | 0.9063 | 0.9141 | **0.9648** | 0.9570 | 0.8984 |
+ | dynamic texture (weak) | 0.8516 | 0.9102 | **0.9492** | 0.8555 | 0.7891 |
+ | dynamic texture (strong) | 0.7500 | 0.7734 | **0.8633** | 0.7227 | 0.6563 |
+ | dynamic noise (weak) | 0.8281 | 0.8945 | **0.9805** | 0.8711 | 0.7969 |
+ | dynamic noise (strong) | 0.6875 | 0.7422 | **0.8477** | 0.6953 | 0.5938 |

  ### OOD Eval on Semantic
+ | Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
+ |---------------|-----------|-----------------|----------------|-------------|---------------|
+ | object avg | 0.7500 | 0.4553 | 0.6484 | **0.7835** | 0.7299 |
+ | unseen objects | 0.8281 | 0.8047 | **0.8594** | 0.8164 | 0.7656 |
+ | unseen receptacles | 0.6875 | 0.7422 | **0.8750** | 0.8125 | 0.7344 |
+ | unseen instructions | 0.8203 | 0.6797 | 0.7109 | **0.9453** | 0.8906 |
+ | multi-object (both seen) | 0.7891 | 0.3516 | 0.6055 | **0.8438** | 0.7578 |
+ | multi-object (both unseen) | 0.5703 | 0.3047 | 0.5508 | **0.6289** | 0.5781 |
+ | distractive receptacle | 0.8047 | 0.1875 | 0.6133 | **0.8281** | 0.7813 |
+ | multi-receptacle (both unseen) | **0.7500** | 0.3242 | 0.2383 | 0.6094 | 0.6016 |

  ### OOD Eval on Position
+ | Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
+ |---------------|-----------|-----------------|----------------|-------------|---------------|
+ | position avg | 0.8177 | 0.4466 | 0.7357 | **0.8542** | 0.7786 |
+ | unseen position (object & receptacle) | 0.7344 | 0.4023 | 0.6992 | **0.8633** | 0.7500 |
+ | unseen robot init pose | **0.8359** | 0.4805 | 0.7188 | 0.7773 | 0.7031 |
+ | mid-episode object reposition | 0.8828 | 0.4570 | 0.7891 | **0.9219** | 0.8828 |
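
The README does not say how the "avg" rows are computed. A minimal sketch, assuming each one is the unweighted mean of the per-setting success rates in its table, and that the overall "Avg results" figure also counts the training setting (both are inferences from the numbers, not statements from the authors). Under those assumptions the GRPO-openvla column reproduces the reported 0.7469, 0.7299, 0.7786 and 0.7515:

```python
from statistics import mean

# GRPO-openvla success rates copied from the new-side tables in this diff.
# The grouping and variable names are illustrative, not part of RLinf.
TRAIN = [0.8438]
VISION = [0.8984, 0.7891, 0.6563, 0.7969, 0.5938]
SEMANTIC = [0.7656, 0.7344, 0.8906, 0.7578, 0.5781, 0.7813, 0.6016]
POSITION = [0.7500, 0.7031, 0.8828]


def avg(rates):
    """Unweighted mean, rounded to the four decimals used in the tables."""
    return round(mean(rates), 4)


# Assumption: the overall average pools all 16 settings, training included.
OVERALL = TRAIN + VISION + SEMANTIC + POSITION

for name, rates in [
    ("vision avg", VISION),
    ("object avg", SEMANTIC),
    ("position avg", POSITION),
    ("overall avg", OVERALL),
]:
    print(f"{name}: {avg(rates)}")
```

The same check can be run against any other column by swapping in its values.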
 

  ## How to Use
  Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvla.yaml``: