---
license: mit
tags:
- RLinf
language:
- en
metrics:
- accuracy
base_model:
- gen-robot/openvla-7b-rlvla-warmup
pipeline_tag: reinforcement-learning
model-index:
- name: RLinf-openvla-maniskill3-grpo
  results:
  - task:
      type: VLA             
    dataset:
      type: maniskill-train
      name: maniskill-train
    metrics:
      - type: accuracy        
        value: 84.38
  - task:
      type: VLA             
    dataset:
      type: maniskill-vision
      name: maniskill-vision
    metrics:
      - type: accuracy        
        value: 74.69
  - task:
      type: VLA             
    dataset:
      type: maniskill-semantic
      name: maniskill-semantic
    metrics:
      - type: accuracy        
        value: 72.99
  - task:
      type: VLA             
    dataset:
      type: maniskill-position
      name: maniskill-position
    metrics:
      - type: accuracy        
        value: 77.86
---
# RLinf: Reinforcement Learning Infrastructure for Agentic AI
[RLinf](https://github.com/RLinf/RLinf) is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
## Model Description
This model is fine-tuned from ``gen-robot/openvla-7b-rlvla-warmup`` with Group Relative Policy Optimization (GRPO) in the ManiSkill simulator.
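
For readers unfamiliar with GRPO, the following is a minimal sketch of the group-relative advantage step as it is commonly formulated: each rollout's reward is normalized against the mean and standard deviation of its own group, so no learned value function is needed. The function name, the `eps` constant, and the binary rewards are illustrative assumptions; this is not taken from the RLinf codebase, and the exact variant used to train this model may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the statistics of its own group.

    `rewards` holds one scalar per rollout sampled for the same task.
    `eps` (an assumed constant) avoids division by zero when every
    rollout in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four rollouts with binary success rewards.
group = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(group)
print(advantages)
```

Successful rollouts receive positive advantages and failed ones negative advantages; a group in which every rollout gets the same reward yields all-zero advantages and therefore no learning signal.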
## Full OOD Evaluation and Results
### Overall Eval Results
Note: rl4vla refers to the paper *VLA-RL-Study: What Can RL Bring to VLA Generalization? An Empirical Study*. This model corresponds to the GRPO-openvla column, and bold marks the best result in each row.

| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|-------------|--------|-----------------|--------------------|-------------|--------------|
| Avg results | 0.7915 | 0.6064          | 0.7705             | **0.8193**  | 0.7515       |
### Training Setting Eval
| Description | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|-------------|--------|-----------------|--------------------|-------------|--------------|
| Avg results | 0.9375 | 0.9414          | **0.9766**         | 0.9609      | 0.8438       |
### OOD Eval on Vision
| Description              | rl4vla | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|--------------------------|--------|-----------------|--------------------|-------------|--------------|
| vision avg               | 0.8047 | 0.8469          | **0.9211**         | 0.8203      | 0.7469       |
| unseen table             | 0.9063 | 0.9141          | **0.9648**         | 0.9570      | 0.8984       |
| dynamic texture (weak)   | 0.8516 | 0.9102          | **0.9492**         | 0.8555      | 0.7891       |
| dynamic texture (strong) | 0.7500 | 0.7734          | **0.8633**         | 0.7227      | 0.6563       |
| dynamic noise (weak)     | 0.8281 | 0.8945          | **0.9805**         | 0.8711      | 0.7969       |
| dynamic noise (strong)   | 0.6875 | 0.7422          | **0.8477**         | 0.6953      | 0.5938       |
### OOD Eval on Semantic
| Description                    | rl4vla     | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|--------------------------------|------------|-----------------|--------------------|-------------|--------------|
| semantic avg                   | 0.7500     | 0.4553          | 0.6484             | **0.7835**  | 0.7299       |
| unseen objects                 | 0.8281     | 0.8047          | **0.8594**         | 0.8164      | 0.7656       |
| unseen receptacles             | 0.6875     | 0.7422          | **0.8750**         | 0.8125      | 0.7344       |
| unseen instructions            | 0.8203     | 0.6797          | 0.7109             | **0.9453**  | 0.8906       |
| multi-object (both seen)       | 0.7891     | 0.3516          | 0.6055             | **0.8438**  | 0.7578       |
| multi-object (both unseen)     | 0.5703     | 0.3047          | 0.5508             | **0.6289**  | 0.5781       |
| distractive receptacle         | 0.8047     | 0.1875          | 0.6133             | **0.8281**  | 0.7813       |
| multi-receptacle (both unseen) | **0.7500** | 0.3242          | 0.2383             | 0.6094      | 0.6016       |
### OOD Eval on Position
| Description                           | rl4vla     | GRPO-openvlaoft | __PPO-openvlaoft__ | PPO-openvla | GRPO-openvla |
|---------------------------------------|------------|-----------------|--------------------|-------------|--------------|
| position avg                          | 0.8177     | 0.4466          | 0.7357             | **0.8542**  | 0.7786       |
| unseen position (object & receptacle) | 0.7344     | 0.4023          | 0.6992             | **0.8633**  | 0.7500       |
| unseen robot init pose                | **0.8359** | 0.4805          | 0.7188             | 0.7773      | 0.7031       |
| mid-episode object reposition         | 0.8828     | 0.4570          | 0.7891             | **0.9212**  | 0.8828       |
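
As a sanity check on how the summary rows relate to the per-setting rows, the sketch below recomputes the GRPO-openvla averages from the tables above. It assumes each category average is an unweighted mean over the listed settings, and that the overall figure pools the training setting with the 15 OOD settings. The card does not state this aggregation explicitly, but the published numbers are consistent with it. All variable names are illustrative.

```python
from statistics import mean

# GRPO-openvla scores copied from the tables above.
training = [0.8438]
per_setting = {
    "vision": [0.8984, 0.7891, 0.6563, 0.7969, 0.5938],
    "semantic": [0.7656, 0.7344, 0.8906, 0.7578, 0.5781, 0.7813, 0.6016],
    "position": [0.7500, 0.7031, 0.8828],
}

# Assumed aggregation: unweighted mean over the settings in each category.
category_avgs = {name: round(mean(vals), 4) for name, vals in per_setting.items()}

# Assumed aggregation: one pooled mean over training plus every OOD setting.
all_scores = training + [v for vals in per_setting.values() for v in vals]
overall = mean(all_scores)

for name, avg in category_avgs.items():
    print(f"{name} avg: {avg:.4f}")
print(f"overall avg over {len(all_scores)} settings: {overall:.4f}")
```

The same check applied to the rl4vla column also reproduces its reported overall average, which supports the pooled-mean reading.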
## How to Use
To use this model, integrate it with the [RLinf](https://github.com/RLinf/RLinf) codebase by editing the configuration file ``examples/embodiment/config/maniskill_grpo_openvla.yaml``:
- Set ``actor.checkpoint_load_path``, ``actor.tokenizer.tokenizer_model``, and ``rollout.model_dir`` to the path of the model checkpoint.

Note: To evaluate the model directly, also set ``actor.model.is_lora`` to ``false``.
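
The settings above can be sketched as dotted-key overrides applied to a nested dictionary. This is only an illustration of which keys change: the helper `set_by_dotted_path`, the placeholder `MODEL_DIR`, and the assumption that the dotted names map onto nested sections are not taken from the RLinf codebase, and the real YAML file contains many other fields that should be left untouched.

```python
def set_by_dotted_path(config, dotted_key, value):
    """Set config["a"]["b"]["c"] = value for the key "a.b.c".

    Hypothetical helper: intermediate levels are created when missing.
    """
    node = config
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


# Placeholder; replace with the local path of the downloaded checkpoint.
MODEL_DIR = "/path/to/model/checkpoint"

config = {}
for key in (
    "actor.checkpoint_load_path",
    "actor.tokenizer.tokenizer_model",
    "rollout.model_dir",
):
    set_by_dotted_path(config, key, MODEL_DIR)

# Only needed when evaluating the released checkpoint directly.
set_by_dotted_path(config, "actor.model.is_lora", False)

print(config)
```

All three path settings point to the same checkpoint directory, so a single variable keeps them in sync.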
## License
This code repository and the model weights are licensed under the MIT License.