---
license: mit
tags:
- RLinf
language:
- en
metrics:
- accuracy
base_model:
- gen-robot/openvla-7b-rlvla-warmup
pipeline_tag: reinforcement-learning
model-index:
- name: RLinf-openvla-maniskill3-grpo
results:
- task:
type: VLA
dataset:
type: maniskill-train
name: maniskill-train
metrics:
- type: accuracy
value: 84.38
- task:
type: VLA
dataset:
type: maniskill-vision
name: maniskill-vision
metrics:
- type: accuracy
value: 74.69
- task:
type: VLA
dataset:
type: maniskill-semantic
name: maniskill-semantic
metrics:
- type: accuracy
value: 72.99
- task:
type: VLA
dataset:
type: maniskill-position
name: maniskill-position
metrics:
- type: accuracy
value: 77.86
---
<div align="center">
<img src="logo.svg" alt="RLinf-logo" width="500"/>
</div>
<div align="center">
<!-- <a href="TODO"><img src="https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv"></a> -->
<!-- <a href="TODO"><img src="https://img.shields.io/badge/HuggingFace-yellow?logo=huggingface&logoColor=white" alt="Hugging Face"></a> -->
<a href="https://github.com/RLinf/RLinf"><img src="https://img.shields.io/badge/Github-blue"></a>
<a href="https://rlinf.readthedocs.io/en/latest/"><img src="https://img.shields.io/badge/Documentation-Purple?color=8A2BE2&logo=readthedocs"></a>
<!-- <a href="TODO"><img src="https://devin.ai/assets/deepwiki-badge.png" alt="Ask DeepWiki.com" style="height:20px;"></a>
<a href="TODO"><img src="https://img.shields.io/badge/微信-green?logo=wechat&"></a> -->
</div>
<h1 align="center">RLinf: Reinforcement Learning Infrastructure for Agentic AI</h1>
[RLinf](https://github.com/RLinf/RLinf) is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
<div align="center">
<img src="overview.png" alt="RLinf-overview" width="600"/>
</div>
## Model Description
This model is fine-tuned from the ``gen-robot/openvla-7b-rlvla-warmup`` checkpoint using Group Relative Policy Optimization (GRPO) in the ManiSkill3 simulator.
## Full OOD Evaluation and Results
### Overall Eval Results
Note: *rl4vla* refers to the paper *VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study*.
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| Avg results | 0.7915 | 0.6064 | 0.7705 | **0.8193** | 0.7515 |
### Training Setting Eval
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| Avg results | 0.9375 | 0.9414 | **0.9766** | 0.9609 | 0.8438 |
### OOD Eval on Vision
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| vision avg | 0.8047 | 0.8469 | **0.9211** | 0.8203 | 0.7469 |
| unseen table | 0.9063 | 0.9141 | **0.9648** | 0.9570 | 0.8984 |
| dynamic texture (weak) | 0.8516 | 0.9102 | **0.9492** | 0.8555 | 0.7891 |
| dynamic texture (strong) | 0.7500 | 0.7734 | **0.8633** | 0.7227 | 0.6563 |
| dynamic noise (weak) | 0.8281 | 0.8945 | **0.9805** | 0.8711 | 0.7969 |
| dynamic noise (strong) | 0.6875 | 0.7422 | **0.8477** | 0.6953 | 0.5938 |
### OOD Eval on Semantic
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| object avg | 0.7500 | 0.4553 | 0.6484 | **0.7835** | 0.7299 |
| unseen objects | 0.8281 | 0.8047 | **0.8594** | 0.8164 | 0.7656 |
| unseen receptacles | 0.6875 | 0.7422 | **0.8750** | 0.8125 | 0.7344 |
| unseen instructions | 0.8203 | 0.6797 | 0.7109 | **0.9453** | 0.8906 |
| multi-object (both seen) | 0.7891 | 0.3516 | 0.6055 | **0.8438** | 0.7578 |
| multi-object (both unseen) | 0.5703 | 0.3047 | 0.5508 | **0.6289** | 0.5781 |
| distractive receptacle | 0.8047 | 0.1875 | 0.6133 | **0.8281** | 0.7813 |
| multi-receptacle (both unseen) | **0.7500** | 0.3242 | 0.2383 | 0.6094 | 0.6016 |
### OOD Eval on Position
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| position avg | 0.8177 | 0.4466 | 0.7357 | **0.8542** | 0.7786 |
| unseen position (object & receptacle) | 0.7344 | 0.4023 | 0.6992 | **0.8633** | 0.7500 |
| unseen robot init pose | **0.8359** | 0.4805 | 0.7188 | 0.7773 | 0.7031 |
| mid-episode object reposition | 0.8828 | 0.4570 | 0.7891 | **0.9212** | 0.8828 |
## How to Use
Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvla.yaml``:
- Set ``actor.checkpoint_load_path``, ``actor.tokenizer.tokenizer_model``, and ``rollout.model_dir`` to the local path of the downloaded model checkpoint (see the sketch below).
Note: if you intend to evaluate the model directly, set ``actor.model.is_lora`` to ``false``.
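The snippet below is a minimal sketch of the relevant overrides in ``examples/embodiment/config/maniskill_grpo_openvla.yaml``; the exact nesting of the surrounding keys and the local checkpoint path (``/path/to/RLinf-openvla-maniskill3-grpo``) are assumptions and should be adapted to your copy of the config.

```yaml
# Sketch (assumed layout) of the fields to edit in
# examples/embodiment/config/maniskill_grpo_openvla.yaml.
# Replace /path/to/RLinf-openvla-maniskill3-grpo with the directory
# containing this downloaded checkpoint.
actor:
  checkpoint_load_path: /path/to/RLinf-openvla-maniskill3-grpo
  tokenizer:
    tokenizer_model: /path/to/RLinf-openvla-maniskill3-grpo
  model:
    is_lora: false        # set to false when evaluating this checkpoint directly
rollout:
  model_dir: /path/to/RLinf-openvla-maniskill3-grpo
```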
## License
This code repository and the model weights are licensed under the MIT License.