Update README.md
#2
by HillFir - opened

README.md CHANGED

@@ -12,6 +12,14 @@ pipeline_tag: reinforcement-learning
 model-index:
 - name: RLinf-openvlaoft-maniskill3-grpo
   results:
   - task:
       type: VLA
     dataset:
@@ -19,7 +27,7 @@ model-index:
       name: maniskill-vision
     metrics:
       - type: accuracy
-        value: 84.
   - task:
       type: VLA
     dataset:
@@ -27,7 +35,7 @@ model-index:
       name: maniskill-semantic
     metrics:
       - type: accuracy
-        value:
   - task:
       type: VLA
     dataset:
@@ -35,7 +43,7 @@ model-index:
       name: maniskill-position
     metrics:
       - type: accuracy
-        value:
 ---
 
 <div align="center">
@@ -62,49 +70,50 @@ model-index:
 </div>
 
 ## Model Description
-This openvla-oft model is trained on ``Haozhan72/Openvla-oft-SFT-libero10-trajall`` with an additional lora SFT checkpoint and finetuned by Group Relative Policy Optimization (GRPO) on the ManiSkill simulator.
 
 ## Full OOD Evaluation and Results
-### Overall
-Note: rl4vla refers to the paper
 
 ### OOD Eval on Vision
 
-| Description |
-| vision avg |
-| unseen table |
-| dynamic texture (weak) |
-| dynamic texture (strong) |
-| dynamic noise (weak) |
-| dynamic noise (strong) |
 
 ### OOD Eval on Semantic
-| distractive receptacle | 81.20 | 18.75 | 31.64 | **82.81** | 78.12 |
-| multi-receptacle (both unseen) | 59.90 | 11.72 | 23.83 | **60.94** | 60.16 |
 
 ### OOD Eval on Position
-| position |
-| unseen |
-| mid-episode object reposition |
 
 ## How to Use
 Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvlaoft.yaml``:

model-index:
- name: RLinf-openvlaoft-maniskill3-grpo
  results:
  - task:
      type: VLA
    dataset:
      type: maniskill-train
      name: maniskill-train
    metrics:
      - type: accuracy
        value: 94.14
  - task:
      type: VLA
    dataset:
      name: maniskill-vision
    metrics:
      - type: accuracy
        value: 84.69
  - task:
      type: VLA
    dataset:
      name: maniskill-semantic
    metrics:
      - type: accuracy
        value: 45.53
  - task:
      type: VLA
    dataset:
      name: maniskill-position
    metrics:
      - type: accuracy
        value: 44.66
---

<div align="center">

</div>

## Model Description
This openvla-oft model is initialized from ``Haozhan72/Openvla-oft-SFT-libero10-trajall`` together with the additional LoRA SFT checkpoint ``RLinf/RLinf-OpenVLAOFT-ManiSkill-Base-Lora``, and then fine-tuned with Group Relative Policy Optimization (GRPO) in the ManiSkill simulator.
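
GRPO, as commonly formulated, drops PPO's learned value baseline: for each task instruction a group of G rollouts is collected, and each rollout's return is standardized against its own group before entering the clipped policy-gradient update. A sketch of the standard group-relative advantage (the exact objective implemented in RLinf may differ in detail):

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}{\operatorname{std}(r_1, \dots, r_G)}, \qquad i = 1, \dots, G
$$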

## Full OOD Evaluation and Results

### Overall Eval Results
Note: rl4vla refers to the paper *VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study*.

| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| Avg results | 0.7915 | 0.6064 | 0.7705 | **0.8193** | 0.7515 |

### Training Setting Eval

| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| Avg results | 0.9375 | 0.9414 | **0.9766** | 0.9609 | 0.8438 |

### OOD Eval on Vision

| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| vision avg | 0.8047 | 0.8469 | **0.9211** | 0.8203 | 0.7469 |
| unseen table | 0.9063 | 0.9141 | **0.9648** | 0.9570 | 0.8984 |
| dynamic texture (weak) | 0.8516 | 0.9102 | **0.9492** | 0.8555 | 0.7891 |
| dynamic texture (strong) | 0.7500 | 0.7734 | **0.8633** | 0.7227 | 0.6563 |
| dynamic noise (weak) | 0.8281 | 0.8945 | **0.9805** | 0.8711 | 0.7969 |
| dynamic noise (strong) | 0.6875 | 0.7422 | **0.8477** | 0.6953 | 0.5938 |

### OOD Eval on Semantic

| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| object avg | 0.7500 | 0.4553 | 0.6484 | **0.7835** | 0.7299 |
| unseen objects | 0.8281 | 0.8047 | **0.8594** | 0.8164 | 0.7656 |
| unseen receptacles | 0.6875 | 0.7422 | **0.8750** | 0.8125 | 0.7344 |
| unseen instructions | 0.8203 | 0.6797 | 0.7109 | **0.9453** | 0.8906 |
| multi-object (both seen) | 0.7891 | 0.3516 | 0.6055 | **0.8438** | 0.7578 |
| multi-object (both unseen) | 0.5703 | 0.3047 | 0.5508 | **0.6289** | 0.5781 |
| distractive receptacle | 0.8047 | 0.1875 | 0.6133 | **0.8281** | 0.7813 |
| multi-receptacle (both unseen) | **0.7500** | 0.3242 | 0.2383 | 0.6094 | 0.6016 |

### OOD Eval on Position

| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| position avg | 0.8177 | 0.4466 | 0.7357 | **0.8542** | 0.7786 |
| unseen position (object & receptacle) | 0.7344 | 0.4023 | 0.6992 | **0.8633** | 0.7500 |
| unseen robot init pose | **0.8359** | 0.4805 | 0.7188 | 0.7773 | 0.7031 |
| mid-episode object reposition | 0.8828 | 0.4570 | 0.7891 | **0.9212** | 0.8828 |

## How to Use
Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvlaoft.yaml``:
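
The exact keys depend on the RLinf config schema; the snippet below is a minimal sketch with assumed key names (``actor.checkpoint_load_path``, ``rollout.model_dir``) that should be matched against the actual entries in ``examples/embodiment/config/maniskill_grpo_openvlaoft.yaml``:

```yaml
# Illustrative sketch only: key names are assumptions, not the verified RLinf schema.
# Point the policy checkpoint paths at a local copy of this model.
actor:
  checkpoint_load_path: /path/to/RLinf-openvlaoft-maniskill3-grpo
rollout:
  model_dir: /path/to/RLinf-openvlaoft-maniskill3-grpo
```

With the paths set to the downloaded checkpoint, launch the ManiSkill GRPO example as described in the RLinf documentation.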