Update README.md
#2
by HillFir - opened

README.md CHANGED

@@ -12,6 +12,14 @@ pipeline_tag: reinforcement-learning
 model-index:
 - name: RLinf-openvlaoft-maniskill3-grpo
   results:
   - task:
       type: VLA
     dataset:
@@ -19,7 +27,7 @@ model-index:
       name: maniskill-vision
     metrics:
       - type: accuracy
-        value: 84.
   - task:
       type: VLA
     dataset:
@@ -27,7 +35,7 @@ model-index:
       name: maniskill-semantic
     metrics:
       - type: accuracy
-        value:
   - task:
       type: VLA
     dataset:
@@ -35,7 +43,7 @@ model-index:
       name: maniskill-position
     metrics:
       - type: accuracy
-        value:
 ---
 
 <div align="center">
@@ -62,49 +70,50 @@ model-index:
 </div>
 
 ## Model Description
-This openvla-oft model is trained on ``Haozhan72/Openvla-oft-SFT-libero10-trajall`` with an additional lora SFT checkpoint and finetuned by Group Relative Policy Optimization (GRPO) on the ManiSkill simulator.
 
 ## Full OOD Evaluation and Results
-### Overall
-Note: rl4vla refers to the paper
 
 ### OOD Eval on Vision
 
-| Description |
-| vision avg |
-| unseen table |
-| dynamic texture (weak) |
-| dynamic texture (strong) |
-| dynamic noise (weak) |
-| dynamic noise (strong) |
 
 ### OOD Eval on Semantic
-| distractive receptacle | 81.20 | 18.75 | 31.64 | **82.81** | 78.12 |
-| multi-receptacle (both unseen) | 59.90 | 11.72 | 23.83 | **60.94** | 60.16 |
 
 ### OOD Eval on Position
-| position |
-| unseen |
-| mid-episode object reposition |
 
 ## How to Use
 Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvlaoft.yaml``:

model-index:
- name: RLinf-openvlaoft-maniskill3-grpo
  results:
  - task:
      type: VLA
    dataset:
      type: maniskill-train
      name: maniskill-train
    metrics:
      - type: accuracy
        value: 94.14
  - task:
      type: VLA
    dataset:
      name: maniskill-vision
    metrics:
      - type: accuracy
        value: 84.69
  - task:
      type: VLA
    dataset:
      name: maniskill-semantic
    metrics:
      - type: accuracy
        value: 45.53
  - task:
      type: VLA
    dataset:
      name: maniskill-position
    metrics:
      - type: accuracy
        value: 44.66
---

<div align="center">

</div>

## Model Description
This openvla-oft model is initialized from ``Haozhan72/Openvla-oft-SFT-libero10-trajall`` together with the additional LoRA SFT checkpoint ``RLinf/RLinf-OpenVLAOFT-ManiSkill-Base-Lora``, and then fine-tuned with Group Relative Policy Optimization (GRPO) in the ManiSkill simulator.
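
GRPO, as commonly formulated, drops PPO's learned value baseline: for each task instruction a group of G rollouts is collected, and each rollout's return is standardized against its own group before entering the clipped policy-gradient update. A sketch of the standard group-relative advantage (the exact objective implemented in RLinf may differ in detail):

$$
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}{\operatorname{std}(r_1, \dots, r_G)}, \qquad i = 1, \dots, G
$$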

## Full OOD Evaluation and Results

### Overall Eval Results
Note: rl4vla refers to the paper *VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study*.

| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| Avg results | 0.7915 | 0.6064 | 0.7705 | **0.8193** | 0.7515 |

### Training Setting Eval

| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| Avg results | 0.9375 | 0.9414 | **0.9766** | 0.9609 | 0.8438 |

### OOD Eval on Vision

| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| vision avg | 0.8047 | 0.8469 | **0.9211** | 0.8203 | 0.7469 |
| unseen table | 0.9063 | 0.9141 | **0.9648** | 0.9570 | 0.8984 |
| dynamic texture (weak) | 0.8516 | 0.9102 | **0.9492** | 0.8555 | 0.7891 |
| dynamic texture (strong) | 0.7500 | 0.7734 | **0.8633** | 0.7227 | 0.6563 |
| dynamic noise (weak) | 0.8281 | 0.8945 | **0.9805** | 0.8711 | 0.7969 |
| dynamic noise (strong) | 0.6875 | 0.7422 | **0.8477** | 0.6953 | 0.5938 |

### OOD Eval on Semantic

| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| object avg | 0.7500 | 0.4553 | 0.6484 | **0.7835** | 0.7299 |
| unseen objects | 0.8281 | 0.8047 | **0.8594** | 0.8164 | 0.7656 |
| unseen receptacles | 0.6875 | 0.7422 | **0.8750** | 0.8125 | 0.7344 |
| unseen instructions | 0.8203 | 0.6797 | 0.7109 | **0.9453** | 0.8906 |
| multi-object (both seen) | 0.7891 | 0.3516 | 0.6055 | **0.8438** | 0.7578 |
| multi-object (both unseen) | 0.5703 | 0.3047 | 0.5508 | **0.6289** | 0.5781 |
| distractive receptacle | 0.8047 | 0.1875 | 0.6133 | **0.8281** | 0.7813 |
| multi-receptacle (both unseen) | **0.7500** | 0.3242 | 0.2383 | 0.6094 | 0.6016 |

### OOD Eval on Position

| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| position avg | 0.8177 | 0.4466 | 0.7357 | **0.8542** | 0.7786 |
| unseen position (object & receptacle) | 0.7344 | 0.4023 | 0.6992 | **0.8633** | 0.7500 |
| unseen robot init pose | **0.8359** | 0.4805 | 0.7188 | 0.7773 | 0.7031 |
| mid-episode object reposition | 0.8828 | 0.4570 | 0.7891 | **0.9212** | 0.8828 |

## How to Use
Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvlaoft.yaml``:
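
The exact keys depend on the RLinf config schema; the snippet below is a minimal sketch with assumed key names (``actor.checkpoint_load_path``, ``rollout.model_dir``) that should be matched against the actual entries in ``examples/embodiment/config/maniskill_grpo_openvlaoft.yaml``:

```yaml
# Illustrative sketch only: key names are assumptions, not the verified RLinf schema.
# Point the policy checkpoint paths at a local copy of this model.
actor:
  checkpoint_load_path: /path/to/RLinf-openvlaoft-maniskill3-grpo
rollout:
  model_dir: /path/to/RLinf-openvlaoft-maniskill3-grpo
```

With the paths set to the downloaded checkpoint, launch the ManiSkill GRPO example as described in the RLinf documentation.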