---
license: mit
tags:
- RLinf
language:
- en
metrics:
- accuracy
base_model:
- gen-robot/openvla-7b-rlvla-warmup
pipeline_tag: reinforcement-learning
model-index:
- name: RLinf-openvla-maniskill3-grpo
results:
- task:
type: VLA
dataset:
type: maniskill-train
name: maniskill-train
metrics:
- type: accuracy
value: 84.38
- task:
type: VLA
dataset:
type: maniskill-vision
name: maniskill-vision
metrics:
- type: accuracy
value: 74.69
- task:
type: VLA
dataset:
type: maniskill-semantic
name: maniskill-semantic
metrics:
- type: accuracy
value: 72.99
- task:
type: VLA
dataset:
type: maniskill-position
name: maniskill-position
metrics:
- type: accuracy
value: 77.86
---
# RLinf: Reinforcement Learning Infrastructure for Agentic AI
[RLinf](https://github.com/RLinf/RLinf) is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
## Model Description
This model is fine-tuned from ``gen-robot/openvla-7b-rlvla-warmup`` with Group Relative Policy Optimization (GRPO) in the ManiSkill simulator.
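For background, GRPO replaces a learned value critic with a group-relative baseline: several rollouts are collected for the same task instance, and each rollout's return is normalized against the others in its group. A common textbook formulation (stated here for reference; it is not copied from the RLinf configuration) is

$$
A_i \;=\; \frac{r_i - \mathrm{mean}(r_1, \dots, r_G)}{\mathrm{std}(r_1, \dots, r_G)}, \qquad i = 1, \dots, G,
$$

where $r_i$ is the return of the $i$-th rollout and $G$ is the group size.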
## Full OOD Evaluation and Results
### Overall Eval Results
Note: rl4vla refers to the paper *VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study*.
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| Avg results | 0.7915 | 0.6064 | 0.7705 | **0.8193** | 0.7515 |
### Training Setting Eval
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| Avg results | 0.9375 | 0.9414 | **0.9766** | 0.9609 | 0.8438 |
### OOD Eval on Vision
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| vision avg | 0.8047 | 0.8469 | **0.9211** | 0.8203 | 0.7469 |
| unseen table | 0.9063 | 0.9141 | **0.9648** | 0.9570 | 0.8984 |
| dynamic texture (weak) | 0.8516 | 0.9102 | **0.9492** | 0.8555 | 0.7891 |
| dynamic texture (strong) | 0.7500 | 0.7734 | **0.8633** | 0.7227 | 0.6563 |
| dynamic noise (weak) | 0.8281 | 0.8945 | **0.9805** | 0.8711 | 0.7969|
| dynamic noise (strong) | 0.6875 | 0.7422 | **0.8477** | 0.6953 | 0.5938 |
### OOD Eval on Semantic
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| semantic avg | 0.7500 | 0.4553 | 0.6484 | **0.7835** | 0.7299 |
| unseen objects | 0.8281 | 0.8047 | **0.8594** | 0.8164 | 0.7656 |
| unseen receptacles | 0.6875 | 0.7422 | **0.8750** | 0.8125 | 0.7344 |
| unseen instructions | 0.8203 | 0.6797 | 0.7109 | **0.9453** | 0.8906 |
| multi-object (both seen) | 0.7891 | 0.3516 | 0.6055 | **0.8438** | 0.7578 |
| multi-object (both unseen) | 0.5703 | 0.3047 | 0.5508 | **0.6289** | 0.5781 |
| distractive receptacle | 0.8047 | 0.1875 | 0.6133 | **0.8281** | 0.7813 |
| multi-receptacle (both unseen) | **0.7500** | 0.3242 | 0.2383 | 0.6094 | 0.6016 |
### OOD Eval on Position
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------|-----------|-----------------|----------------|-------------|---------------|
| position avg | 0.8177 | 0.4466 | 0.7357 | **0.8542** | 0.7786 |
| unseen position (object & receptacle) | 0.7344 | 0.4023 | 0.6992 | **0.8633** | 0.7500 |
| unseen robot init pose | **0.8359** | 0.4805 | 0.7188 | 0.7773 | 0.7031 |
| mid-episode object reposition | 0.8828 | 0.4570 | 0.7891 | **0.9212** | 0.8828 |
## How to Use
Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvla.yaml``:
- Set ``actor.checkpoint_load_path``, ``actor.tokenizer.tokenizer_model``, and ``rollout.model_dir`` to the path of the model checkpoint.
Note: If you intend to evaluate the model directly, make sure to set ``actor.model.is_lora`` to ``false``.
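Below is a minimal sketch of the resulting overrides. It assumes the dotted parameter names above map directly onto nested YAML keys in ``examples/embodiment/config/maniskill_grpo_openvla.yaml``; the checkpoint path is a placeholder, and the exact layout should be checked against the config file in the RLinf version you are using.

```yaml
# Sketch only: point all three path fields at the directory containing this
# model's downloaded checkpoint (placeholder path shown).
actor:
  checkpoint_load_path: /path/to/RLinf-openvla-maniskill3-grpo
  tokenizer:
    tokenizer_model: /path/to/RLinf-openvla-maniskill3-grpo
  model:
    is_lora: false   # set to false when evaluating the released checkpoint directly
rollout:
  model_dir: /path/to/RLinf-openvla-maniskill3-grpo
```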
## License
This code repository and the model weights are licensed under the MIT License.