---
license: mit
tags:
- RLinf
language:
- en
metrics:
- accuracy
base_model:
- RLinf/RLinf-OpenVLAOFT-LIBERO-90-Base-Lora
pipeline_tag: reinforcement-learning
model-index:
- name: RLinf-OpenVLAOFT-LIBERO-90
  results:
  - task:
      type: VLA
    dataset:
      type: libero_90
      name: libero_90
    metrics:
    - type: accuracy
      value: 96.77
---

# RLinf: Reinforcement Learning Infrastructure for Agentic AI

[RLinf](https://github.com/RLinf/RLinf) is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
*Figure: RLinf system overview.*
## Model Description

The RLinf-openvlaoft-libero series is trained from RLinf/RLinf-OpenVLAOFT-LIBERO-xxx-Base-Lora (for libero-90 and libero-130) and Haozhan72/Openvla-oft-SFT-libero-xxx-traj1 (for libero-10, libero-object, libero-goal, and libero-spatial), using the same base models and training datasets as verl. Training with RLinf yields state-of-the-art (SOTA) performance. We apply a mask so that only valid action tokens contribute to the loss, and compute a token-level loss based on the Group Relative Policy Optimization (GRPO) advantage function, in order to improve the model's performance on spatial reasoning, object generalization, instruction generalization, and long-horizon tasks. A minimal sketch of this masked objective is given after the benchmark results below.

## Evaluation and Results

We trained and evaluated the following models using RLinf:

- RLinf-openvlaoft-libero-90 Model (based on [RLinf/RLinf-OpenVLAOFT-LIBERO-90-Base-Lora](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-LIBERO-90-Base-Lora))
  - Recommended sampling settings: `temperature = 1.6`, `top_p = 1.0`
- RLinf-openvlaoft-libero-130 Model (based on [RLinf/RLinf-OpenVLAOFT-LIBERO-130-Base-Lora](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-LIBERO-130-Base-Lora))
  - Recommended sampling settings: `temperature = 1.6`, `top_p = 1.0`
- RLinf-openvlaoft-libero-object Model (based on [Haozhan72/Openvla-oft-SFT-libero-object-traj1](https://huggingface.co/Haozhan72/Openvla-oft-SFT-libero-object-traj1))
  - Recommended sampling settings: `temperature = 1.6`, `top_p = 1.0`
- RLinf-openvlaoft-libero-spatial Model (based on [Haozhan72/Openvla-oft-SFT-libero-spatial-traj1](https://huggingface.co/Haozhan72/Openvla-oft-SFT-libero-spatial-traj1))
  - Recommended sampling settings: `temperature = 1.6`, `top_p = 1.0`
- RLinf-openvlaoft-libero-goal Model (based on [Haozhan72/Openvla-oft-SFT-libero-goal-traj1](https://huggingface.co/Haozhan72/Openvla-oft-SFT-libero-goal-traj1))
  - Recommended sampling settings: `temperature = 1.6`, `top_p = 1.0`
- RLinf-openvlaoft-libero10 Model (based on [Haozhan72/Openvla-oft-SFT-libero10-traj1](https://huggingface.co/Haozhan72/Openvla-oft-SFT-libero10-traj1))
  - Recommended sampling settings: `temperature = 1.6`, `top_p = 1.0`

### Benchmark Results

The SFT models for LIBERO-90 and LIBERO-130 were trained by ourselves following the training recipe from [OpenVLA-OFT](https://github.com/moojink/openvla-oft/blob/main/vla-scripts/finetune.py); the other SFT models are from [SimpleVLA-RL](https://huggingface.co/collections/Haozhan72/simplevla-rl-6833311430cd9df52aeb1f86).

- Recommended evaluation settings: LIBERO `seed = 0`, `episode number = 500`, `do_sample = False`

| Model              | Object | Spatial | Goal  | Long  | 90    | Average |
| ------------------ | ------ | ------- | ----- | ----- | ----- | ------- |
| SFT models         | 25.60  | 56.45   | 45.59 | 9.68  | 78.63 | 43.19   |
| Trained with RLinf | 98.99  | 98.99   | 98.99 | 94.35 | 96.77 | 97.62   |

In addition, we trained a single model (referred to as the libero-130 model) on all LIBERO tasks:

| libero-130 model   | Object | Spatial | Goal  | Long  | 90    | 130 (all) |
| ------------------ | ------ | ------- | ----- | ----- | ----- | --------- |
| SFT model          | 71.48  | 72.18   | 64.06 | 48.44 | 70.97 | 70.78     |
| Trained with RLinf | 99.80  | 99.40   | 98.79 | 93.95 | 98.32 | 98.09     |
*Figure: LIBERO benchmark results with RLinf.*
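For clarity, the snippet below sketches the masked, token-level GRPO objective described in the Model Description. It is a minimal illustration under simplifying assumptions, not RLinf's actual implementation; all function names, tensor shapes, and the clipping constant are assumed.

```python
# Minimal sketch (not RLinf's actual code) of a masked, token-level GRPO loss.
import torch

def grpo_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages.
    rewards: (num_groups, group_size) scalar rewards, one per rollout.
    Each reward is normalized against the other rollouts in its group."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

def masked_token_level_loss(logprobs: torch.Tensor,
                            old_logprobs: torch.Tensor,
                            advantages: torch.Tensor,
                            action_mask: torch.Tensor,
                            clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate with a per-token mask so that only valid
    action tokens contribute to the loss.
    logprobs, old_logprobs, action_mask: (batch, seq_len); action_mask is 0/1.
    advantages: (batch,) one GRPO advantage per rollout (e.g. the flattened
    output of grpo_advantages), broadcast over that rollout's tokens."""
    ratio = torch.exp(logprobs - old_logprobs)
    adv = advantages.unsqueeze(-1)                       # broadcast to every token
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv
    per_token = -torch.min(unclipped, clipped)
    # Token-level loss: average only over tokens selected by the action mask.
    return (per_token * action_mask).sum() / action_mask.sum().clamp(min=1)
```

In this sketch, each rollout's scalar reward is normalized against the other rollouts in its group, and the resulting advantage is applied only to the tokens marked valid by the action mask.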
## How to Use

Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file `examples/embodiment/config/libero_10_grpo_openvlaoft.yaml` (a hedged sketch of these overrides is given at the end of this card):

- Set `actor.checkpoint_load_path`, `actor.tokenizer.tokenizer_model`, and `rollout.model_dir` to the path of the model checkpoint.

Note: If you intend to evaluate the model directly, make sure to set `actor.model.is_lora` to `false`.

## License

This code repository and the model weights are licensed under the MIT License.
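For reference, the snippet below sketches what the overrides described in the How to Use section might look like inside `examples/embodiment/config/libero_10_grpo_openvlaoft.yaml`. Only the dotted parameter names are taken from this card; the checkpoint path is a placeholder and the surrounding structure of the real configuration file may differ.

```yaml
# Hypothetical excerpt; the actual file in the RLinf repository contains more fields.
actor:
  checkpoint_load_path: /path/to/RLinf-OpenVLAOFT-LIBERO-90   # model checkpoint path
  tokenizer:
    tokenizer_model: /path/to/RLinf-OpenVLAOFT-LIBERO-90      # same checkpoint path
  model:
    is_lora: false   # set to false when evaluating the released weights directly
rollout:
  model_dir: /path/to/RLinf-OpenVLAOFT-LIBERO-90              # same checkpoint path
```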