---
license: mit
tags:
- RLinf
language:
- en
metrics:
- accuracy
base_model:
- gen-robot/openvla-7b-rlvla-warmup
pipeline_tag: reinforcement-learning
model-index:
- name: RLinf-openvla-maniskill3-grpo
results:
- task:
type: VLA
dataset:
type: maniskill-train
name: maniskill-train
metrics:
- type: accuracy
value: 84.38
- task:
type: VLA
dataset:
type: maniskill-vision
name: maniskill-vision
metrics:
- type: accuracy
value: 74.69
- task:
type: VLA
dataset:
type: maniskill-semantic
name: maniskill-semantic
metrics:
- type: accuracy
value: 72.99
- task:
type: VLA
dataset:
type: maniskill-position
name: maniskill-position
metrics:
- type: accuracy
value: 77.86
---
# RLinf: Reinforcement Learning Infrastructure for Agentic AI
[RLinf](https://github.com/RLinf/RLinf) is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
## Model Description
This model is fine-tuned from ``gen-robot/openvla-7b-rlvla-warmup`` with Group Relative Policy Optimization (GRPO) in the ManiSkill simulator.
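For background, GRPO replaces a learned value critic with a group-relative baseline: several rollouts are collected for the same task instance, and each rollout's return is normalized against the others in its group. A common textbook formulation (stated here for reference; it is not copied from the RLinf configuration) is

$$
A_i \;=\; \frac{r_i - \mathrm{mean}(r_1, \dots, r_G)}{\mathrm{std}(r_1, \dots, r_G)}, \qquad i = 1, \dots, G,
$$

where $r_i$ is the return of the $i$-th rollout and $G$ is the group size.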
## Full OOD Evaluation and Results
### Overall Eval Results
Note: rl4vla refers to the paper *VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study*.
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| Avg results | 0.7915 | 0.6064 | 0.7705 | **0.8193** | 0.7515 |
### Training Setting Eval
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| Avg results | 0.9375 | 0.9414 | **0.9766** | 0.9609 | 0.8438 |
### OOD Eval on Vision
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| vision avg | 0.8047 | 0.8469 | **0.9211** | 0.8203 | 0.7469 |
| unseen table | 0.9063 | 0.9141 | **0.9648** | 0.9570 | 0.8984 |
| dynamic texture (weak) | 0.8516 | 0.9102 | **0.9492** | 0.8555 | 0.7891 |
| dynamic texture (strong) | 0.7500 | 0.7734 | **0.8633** | 0.7227 | 0.6563 |
| dynamic noise (weak) | 0.8281 | 0.8945 | **0.9805** | 0.8711 | 0.7969|
| dynamic noise (strong) | 0.6875 | 0.7422 | **0.8477** | 0.6953 | 0.5938 |
### OOD Eval on Semantic
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| semantic avg | 0.7500 | 0.4553 | 0.6484 | **0.7835** | 0.7299 |
| unseen objects | 0.8281 | 0.8047 | **0.8594** | 0.8164 | 0.7656 |
| unseen receptacles | 0.6875 | 0.7422 | **0.8750** | 0.8125 | 0.7344 |
| unseen instructions | 0.8203 | 0.6797 | 0.7109 | **0.9453** | 0.8906 |
| multi-object (both seen) | 0.7891 | 0.3516 | 0.6055 | **0.8438** | 0.7578 |
| multi-object (both unseen) | 0.5703 | 0.3047 | 0.5508 | **0.6289** | 0.5781 |
| distractive receptacle | 0.8047 | 0.1875 | 0.6133 | **0.8281** | 0.7813 |
| multi-receptacle (both unseen) | **0.7500** | 0.3242 | 0.2383 | 0.6094 | 0.6016 |
### OOD Eval on Position
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| position avg | 0.8177 | 0.4466 | 0.7357 | **0.8542** | 0.7786 |
| unseen position (object & receptacle) | 0.7344 | 0.4023 | 0.6992 | **0.8633** | 0.7500 |
| unseen robot init pose | **0.8359** | 0.4805 | 0.7188 | 0.7773 | 0.7031 |
| mid-episode object reposition | 0.8828 | 0.4570 | 0.7891 | **0.9212** | 0.8828 |
## How to Use
Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvla.yaml``:
- Set ``actor.checkpoint_load_path``, ``actor.tokenizer.tokenizer_model``, and ``rollout.model_dir`` to the path of the model checkpoint.
Note: If you intend to evaluate the model directly, make sure to set ``actor.model.is_lora`` to ``false``.
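Below is a minimal sketch of the resulting overrides. It assumes the dotted parameter names above map directly onto nested YAML keys in ``examples/embodiment/config/maniskill_grpo_openvla.yaml``; the checkpoint path is a placeholder, and the exact layout should be checked against the config file in the RLinf version you are using.

```yaml
# Sketch only: point all three path fields at the directory containing this
# model's downloaded checkpoint (placeholder path shown).
actor:
  checkpoint_load_path: /path/to/RLinf-openvla-maniskill3-grpo
  tokenizer:
    tokenizer_model: /path/to/RLinf-openvla-maniskill3-grpo
  model:
    is_lora: false   # set to false when evaluating the released checkpoint directly
rollout:
  model_dir: /path/to/RLinf-openvla-maniskill3-grpo
```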
## License
This code repository and the model weights are licensed under the MIT License.