---
license: mit
tags:
- RLinf
language:
- en
metrics:
- accuracy
base_model:
- gen-robot/openvla-7b-rlvla-warmup
pipeline_tag: reinforcement-learning
model-index:
- name: RLinf-openvla-maniskill3-grpo
  results:
  - task:
      type: VLA
    dataset:
      type: maniskill-vision
      name: maniskill-vision
    metrics:
    - type: accuracy
      value: 74.7
  - task:
      type: VLA
    dataset:
      type: maniskill-semantic
      name: maniskill-semantic
    metrics:
    - type: accuracy
      value: 74.4
  - task:
      type: VLA
    dataset:
      type: maniskill-position
      name: maniskill-position
    metrics:
    - type: accuracy
      value: 81.6
---

# RLinf: Reinforcement Learning Infrastructure for Agentic AI

[RLinf](https://github.com/RLinf/RLinf) is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
## Model Description

This model is trained from ``gen-robot/openvla-7b-rlvla-warmup`` with Group Relative Policy Optimization (GRPO) in the ManiSkill simulator.

## Full OOD Evaluation and Results

### Overall OOD Eval Results

Note: rl4vla refers to the paper [VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study](https://arxiv.org/abs/2505.19789).

| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | PPO-openvla | __GRPO-openvla__ |
|-------------|--------|-----------------|----------------|-------------|------------------|
| Avg results | 76.08 | 61.48 | 64.53 | **82.21** | 75.47 |

### OOD Eval on Vision

| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | PPO-openvla | __GRPO-openvla__ |
|-------------|--------|-----------------|----------------|-------------|------------------|
| vision avg | 76.56 | 84.69 | 80.55 | **82.03** | 74.69 |
| unseen table | 84.40 | 91.41 | 94.53 | **95.70** | 89.84 |
| dynamic texture (weak) | 83.30 | **91.02** | 82.42 | 85.55 | 78.91 |
| dynamic texture (strong) | 63.00 | **77.34** | 62.50 | 72.27 | 65.62 |
| dynamic noise (weak) | 85.40 | 89.45 | **89.84** | 87.11 | 79.69 |
| dynamic noise (strong) | 66.70 | **74.22** | 73.44 | 69.53 | 59.38 |

### OOD Eval on Semantic

| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | PPO-openvla | __GRPO-openvla__ |
|-------------|--------|-----------------|----------------|-------------|------------------|
| object avg | 75.40 | 51.61 | 56.64 | **80.57** | 74.41 |
| train setting | 93.80 | 94.14 | 91.80 | **96.09** | 84.38 |
| unseen objects | 71.40 | 80.47 | 77.73 | **81.64** | 76.56 |
| unseen receptacles | 75.00 | 74.22 | 78.12 | **81.25** | 73.44 |
| unseen instructions | 89.10 | 67.97 | 68.36 | **94.53** | 89.06 |
| multi-object (both seen) | 75.00 | 35.16 | 42.97 | **84.38** | 75.78 |
| multi-object (both unseen) | 57.80 | 30.47 | 38.67 | **62.89** | 57.81 |
| distractive receptacle | 81.20 | 18.75 | 31.64 | **82.81** | 78.12 |
| multi-receptacle (both unseen) | 59.90 | 11.72 | 23.83 | **60.94** | 60.16 |

### OOD Eval on Position

| Description | rl4vla | GRPO-openvlaoft | PPO-openvlaoft | PPO-openvla | __GRPO-openvla__ |
|-------------|--------|-----------------|----------------|-------------|------------------|
| position avg | 77.60 | 42.97 | 56.05 | **89.26** | 81.64 |
| unseen position (object & receptacle) | 80.70 | 40.23 | 50.39 | **86.33** | 75.00 |
| mid-episode object reposition | 74.50 | 45.70 | 61.72 | **92.19** | 88.28 |

## How to Use

Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/maniskill_grpo_openvla.yaml``:

- Set ``actor.checkpoint_load_path``, ``actor.tokenizer.tokenizer_model``, and ``rollout.model_dir`` to the path of the model checkpoint.

Note: if you intend to evaluate the model directly, make sure to set ``actor.model.is_lora`` to ``false``.

## License

This code repository and the model weights are licensed under the MIT License.
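Concretely, the overrides described under "How to Use" might look like the fragment below. The nesting is inferred from the dotted parameter names, and ``/path/to/RLinf-openvla-maniskill3-grpo`` is a placeholder for your local checkpoint path; check the shipped ``examples/embodiment/config/maniskill_grpo_openvla.yaml`` for the authoritative layout.

```yaml
actor:
  checkpoint_load_path: /path/to/RLinf-openvla-maniskill3-grpo
  tokenizer:
    tokenizer_model: /path/to/RLinf-openvla-maniskill3-grpo
  model:
    is_lora: false   # set to false only when evaluating the model directly
rollout:
  model_dir: /path/to/RLinf-openvla-maniskill3-grpo
```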
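As background for the GRPO training mentioned under "Model Description": GRPO replaces a learned value critic with group-relative reward normalization, computing each rollout's advantage against the statistics of its own group of rollouts for the same task. The snippet below is a minimal illustrative sketch of that normalization only, not RLinf's actual implementation.

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages (illustrative sketch, not RLinf code).

    Each rollout's reward is standardized against the mean and std of
    its own group, so no separate value network is needed.
    """
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Example: 4 rollouts of the same task; binary success reward.
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Successful rollouts receive positive advantages and failed ones negative, and the advantages in each group sum to roughly zero by construction.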