Update README.md (#2)
- Update README.md (0a076521c397249e853a510e5cee8a2ab000a36c)
Co-authored-by: Hongzhi Zang <HillFir@users.noreply.huggingface.co>

README.md CHANGED
@@ -12,6 +12,14 @@ pipeline_tag: reinforcement-learning
 model-index:
 - name: RLinf-openvla-maniskill3-grpo
   results:
+  - task:
+      type: VLA
+    dataset:
+      type: maniskill-train
+      name: maniskill-train
+    metrics:
+    - type: accuracy
+      value: 84.38
   - task:
       type: VLA
     dataset:
@@ -19,7 +27,7 @@ model-index:
       name: maniskill-vision
     metrics:
     - type: accuracy
-      value: 74.
+      value: 74.69
   - task:
       type: VLA
     dataset:
@@ -27,7 +35,7 @@ model-index:
       name: maniskill-semantic
     metrics:
    - type: accuracy
-      value:
+      value: 72.99
   - task:
       type: VLA
     dataset:
@@ -35,7 +43,7 @@ model-index:
       name: maniskill-position
     metrics:
    - type: accuracy
-      value:
+      value: 77.86
 ---
 
 <div align="center">
@@ -65,46 +73,47 @@ model-index:
 This model is trained on ``gen-robot/openvla-7b-rlvla-warmup`` by Group Relative Policy Optimization (GRPO) on the ManiSkill simulator.
 
 ## Full OOD Evaluation and Results
-### Overall
-Note: rl4vla refers to the paper
+### Overall Eval Results
+Note: rl4vla refers to the paper VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study.
+| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
+|---------------|-----------|-----------------|----------------|-------------|---------------|
+| Avg results | 0.7915 | 0.6064 | 0.7705 | **0.8193** | 0.7515 |
 
-
-
-
+### Training Setting Eval
+| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
+|---------------|-----------|-----------------|----------------|-------------|---------------|
+| Avg results | 0.9375 | 0.9414 | **0.9766** | 0.9609 | 0.8438 |
 
 ### OOD Eval on Vision
 
-| Description
-
-| vision avg
-| unseen table
-| dynamic texture (weak) |
-| dynamic texture (strong) |
-| dynamic noise (weak)
-| dynamic noise (strong) |
+| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
+|---------------|-----------|-----------------|----------------|-------------|---------------|
+| vision avg | 0.8047 | 0.8469 | **0.9211** | 0.8203 | 0.7469 |
+| unseen table | 0.9063 | 0.9141 | **0.9648** | 0.9570 | 0.8984 |
+| dynamic texture (weak) | 0.8516 | 0.9102 | **0.9492** | 0.8555 | 0.7891 |
+| dynamic texture (strong) | 0.7500 | 0.7734 | **0.8633** | 0.7227 | 0.6563 |
+| dynamic noise (weak) | 0.8281 | 0.8945 | **0.9805** | 0.8711 | 0.7969 |
+| dynamic noise (strong) | 0.6875 | 0.7422 | **0.8477** | 0.6953 | 0.5938 |
 
 ### OOD Eval on Semantic
-
-
-
-
-
-| unseen
-
-
-
-| multi-
-| distractive receptacle | 81.20 | 18.75 | 31.64 | **82.81** | 78.12 |
-| multi-receptacle (both unseen) | 59.90 | 11.72 | 23.83 | **60.94** | 60.16 |
+| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
+|---------------|-----------|-----------------|----------------|-------------|---------------|
+| object avg | 0.7500 | 0.4553 | 0.6484 | **0.7835** | 0.7299 |
+| unseen objects | 0.8281 | 0.8047 | **0.8594** | 0.8164 | 0.7656 |
+| unseen receptacles | 0.6875 | 0.7422 | **0.8750** | 0.8125 | 0.7344 |
+| unseen instructions | 0.8203 | 0.6797 | 0.7109 | **0.9453** | 0.8906 |
+| multi-object (both seen) | 0.7891 | 0.3516 | 0.6055 | **0.8438** | 0.7578 |
+| multi-object (both unseen) | 0.5703 | 0.3047 | 0.5508 | **0.6289** | 0.5781 |
+| distractive receptacle | 0.8047 | 0.1875 | 0.6133 | **0.8281** | 0.7813 |
+| multi-receptacle (both unseen) | **0.7500** | 0.3242 | 0.2383 | 0.6094 | 0.6016 |
 
 ### OOD Eval on Position
-
-
-
-| position
-| unseen
-| mid-episode object reposition
-
+| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
+|---------------|-----------|-----------------|----------------|-------------|---------------|
+| position avg | 0.8177 | 0.4466 | 0.7357 | **0.8542** | 0.7786 |
+| unseen position (object & receptacle) | 0.7344 | 0.4023 | 0.6992 | **0.8633** | 0.7500 |
+| unseen robot init pose | **0.8359** | 0.4805 | 0.7188 | 0.7773 | 0.7031 |
+| mid-episode object reposition | 0.8828 | 0.4570 | 0.7891 | **0.9212** | 0.8828 |
 
 ## How to Use
 Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvla.yaml``:
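The parameter list itself is cut off in this view of the card. As a rough sketch only: the edit typically amounts to pointing the config's model path at a local download of this checkpoint. The field names below are illustrative assumptions, not taken from the RLinf repository; check the actual keys in ``examples/embodiment/config/maniskill_grpo_openvla.yaml``.

```yaml
# Hypothetical fragment of examples/embodiment/config/maniskill_grpo_openvla.yaml.
# Key names are assumptions for illustration; the real file may differ.
actor:
  model:
    model_path: /path/to/RLinf-openvla-maniskill3-grpo  # local checkpoint download
```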