---
license: mit
tags:
- RLinf
language:
- en
metrics:
- accuracy
base_model:
- gen-robot/openvla-7b-rlvla-warmup
pipeline_tag: reinforcement-learning
model-index:
- name: RLinf-openvla-maniskill3-grpo
  results:
  - task:
      type: VLA
    dataset:
      type: maniskill-train
      name: maniskill-train
    metrics:
      - type: accuracy
        value: 84.38
  - task:
      type: VLA
    dataset:
      type: maniskill-vision
      name: maniskill-vision
    metrics:
      - type: accuracy
        value: 74.69
  - task:
      type: VLA
    dataset:
      type: maniskill-semantic
      name: maniskill-semantic
    metrics:
      - type: accuracy
        value: 72.99
  - task:
      type: VLA
    dataset:
      type: maniskill-position
      name: maniskill-position
    metrics:
      - type: accuracy
        value: 77.86
---

<div align="center">
  <img src="logo.svg" alt="RLinf-logo" width="500"/>
</div>


<div align="center">
<!-- <a href="TODO"><img src="https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv"></a> -->
<!-- <a href="TODO"><img src="https://img.shields.io/badge/HuggingFace-yellow?logo=huggingface&logoColor=white" alt="Hugging Face"></a> -->
<a href="https://github.com/RLinf/RLinf"><img src="https://img.shields.io/badge/Github-blue"></a>
<a href="https://rlinf.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Documentation-Purple?color=8A2BE2&logo=readthedocs"></a>
<!-- <a href="TODO"><img src="https://devin.ai/assets/deepwiki-badge.png" alt="Ask DeepWiki.com" style="height:20px;"></a>
<a href="TODO"><img src="https://img.shields.io/badge/微信-green?logo=wechat&amp"></a> -->
</div>

<h1 align="center">RLinf: Reinforcement Learning Infrastructure for Agentic AI</h1>

[RLinf](https://github.com/RLinf/RLinf) is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.


<div align="center">
  <img src="overview.png" alt="RLinf-overview" width="600"/>
</div>

## Model Description
This model is fine-tuned from ``gen-robot/openvla-7b-rlvla-warmup`` with Group Relative Policy Optimization (GRPO) in the ManiSkill3 simulator.
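
For context, GRPO dispenses with a learned value critic and instead normalizes each rollout's return against the other rollouts sampled for the same task instance. A minimal sketch of the standard group-relative advantage (the general GRPO formulation, not a training detail confirmed for this specific checkpoint):

$$
A_i = \frac{r_i - \mathrm{mean}(r_1, \ldots, r_G)}{\mathrm{std}(r_1, \ldots, r_G)}, \qquad i = 1, \ldots, G,
$$

where $G$ rollouts are collected per task instance and $r_i$ is the episode return of rollout $i$.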

## Full OOD Evaluation and Results
### Overall Eval Results
Note: *rl4vla* refers to the method from the paper *VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study*.
| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | PPO-openvla | GRPO-openvla (this model) |
|-------------|--------|-----------------|----------------|-------------|---------------------------|
| Avg results | 0.7915 | 0.6064 | 0.7705 | **0.8193** | 0.7515 |

### Training Setting Eval
| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | PPO-openvla | GRPO-openvla (this model) |
|-------------|--------|-----------------|----------------|-------------|---------------------------|
| Avg results | 0.9375 | 0.9414 | **0.9766** | 0.9609 | 0.8438 |

### OOD Eval on Vision

| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | PPO-openvla | GRPO-openvla (this model) |
|-------------|--------|-----------------|----------------|-------------|---------------------------|
| vision avg | 0.8047 | 0.8469 | **0.9211** | 0.8203 | 0.7469 |
| unseen table | 0.9063 | 0.9141 | **0.9648** | 0.9570 | 0.8984 |
| dynamic texture (weak) | 0.8516 | 0.9102 | **0.9492** | 0.8555 | 0.7891 |
| dynamic texture (strong) | 0.7500 | 0.7734 | **0.8633** | 0.7227 | 0.6563 |
| dynamic noise (weak) | 0.8281 | 0.8945 | **0.9805** | 0.8711 | 0.7969 |
| dynamic noise (strong) | 0.6875 | 0.7422 | **0.8477** | 0.6953 | 0.5938 |

### OOD Eval on Semantic
| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | PPO-openvla | GRPO-openvla (this model) |
|-------------|--------|-----------------|----------------|-------------|---------------------------|
| object avg | 0.7500 | 0.4553 | 0.6484 | **0.7835** | 0.7299 |
| unseen objects | 0.8281 | 0.8047 | **0.8594** | 0.8164 | 0.7656 |
| unseen receptacles | 0.6875 | 0.7422 | **0.8750** | 0.8125 | 0.7344 |
| unseen instructions | 0.8203 | 0.6797 | 0.7109 | **0.9453** | 0.8906 |
| multi-object (both seen) | 0.7891 | 0.3516 | 0.6055 | **0.8438** | 0.7578 |
| multi-object (both unseen) | 0.5703 | 0.3047 | 0.5508 | **0.6289** | 0.5781 |
| distractive receptacle | 0.8047 | 0.1875 | 0.6133 | **0.8281** | 0.7813 |
| multi-receptacle (both unseen) | **0.7500** | 0.3242 | 0.2383 | 0.6094 | 0.6016 |

### OOD Eval on Position
| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | PPO-openvla | GRPO-openvla (this model) |
|-------------|--------|-----------------|----------------|-------------|---------------------------|
| position avg | 0.8177 | 0.4466 | 0.7357 | **0.8542** | 0.7786 |
| unseen position (object & receptacle) | 0.7344 | 0.4023 | 0.6992 | **0.8633** | 0.7500 |
| unseen robot init pose | **0.8359** | 0.4805 | 0.7188 | 0.7773 | 0.7031 |
| mid-episode object reposition | 0.8828 | 0.4570 | 0.7891 | **0.9212** | 0.8828 |

## How to Use
Use this model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvla.yaml``, as sketched after this list:

- Set ``actor.checkpoint_load_path``, ``actor.tokenizer.tokenizer_model``, and ``rollout.model_dir`` to the path of the model checkpoint.

Note: If you intend to evaluate the model directly, make sure to set ``actor.model.is_lora`` to ``false``.
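
Below is a minimal sketch of the relevant overrides. The checkpoint path is a placeholder, and the YAML nesting is inferred from the dotted parameter names above; consult the actual configuration file for its full structure.

```yaml
# examples/embodiment/config/maniskill_grpo_openvla.yaml (illustrative excerpt)
actor:
  checkpoint_load_path: /path/to/RLinf-openvla-maniskill3-grpo  # model checkpoint path
  tokenizer:
    tokenizer_model: /path/to/RLinf-openvla-maniskill3-grpo     # same checkpoint path
  model:
    is_lora: false  # set to false when evaluating this checkpoint directly
rollout:
  model_dir: /path/to/RLinf-openvla-maniskill3-grpo             # same checkpoint path
```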

## License
This code repository and the model weights are licensed under the MIT License.