|
--- |
|
library_name: transformers |
|
pipeline_tag: robotics |
|
--- |
|
|
|
# Nora |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
Nora is an open vision-language-action model trained on robot manipulation episodes from the [Open X-Embodiment](https://robotics-transformer-x.github.io/) dataset. The model takes language instructions and camera images as input and generates robot actions. Nora is trained directly from Qwen 2.5 VL-3B. |
|
All Nora checkpoints, as well as our [training codebase](https://github.com/declare-lab/nora) are released under an MIT License. |
|
|
|
|
|
|
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
|
|
|
|
- **Model type:** Vision-language-action (language, image => robot actions) |
|
- **Language(s) (NLP):** English
|
- **License:** MIT |
|
- **Finetuned from model:** Qwen 2.5 VL-3B
|
|
|
### Model Sources |
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** https://github.com/declare-lab/nora |
|
- **Paper:** https://www.arxiv.org/abs/2504.19854
|
- **Demo:** https://declare-lab.github.io/nora |
|
|
|
## Usage |
|
|
|
Nora takes a language instruction and a camera image of the robot workspace as input and predicts (normalized) robot actions consisting of 7-DoF end-effector deltas of the form (x, y, z, roll, pitch, yaw, gripper).
|
To execute on an actual robot platform, actions must be un-normalized using statistics computed on a per-robot, per-dataset basis.
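As a rough illustration of what this un-normalization step looks like, the sketch below rescales a normalized action in [-1, 1] back to the robot's native action range using per-dimension dataset statistics. The function name, the percentile-bound convention, and the example bounds are all assumptions for illustration, not Nora's actual statistics or API:

```python
import numpy as np


def unnormalize_action(action, low, high):
    """Hypothetical helper: rescale each action dimension from [-1, 1]
    to the per-dimension range [low, high] taken from dataset statistics."""
    action = np.asarray(action, dtype=np.float64)
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    # Linear map: -1 -> low, +1 -> high
    return 0.5 * (action + 1.0) * (high - low) + low


# Made-up per-dimension bounds for a 7-DoF delta action
# (small translation/rotation deltas, gripper in [0, 1]):
low = np.array([-0.05] * 6 + [0.0])
high = np.array([0.05] * 6 + [1.0])
print(unnormalize_action(np.zeros(7), low, high))
```

A normalized action of all zeros (the midpoint of [-1, 1]) maps to the midpoint of each dimension's range, e.g. a half-open gripper here.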
|
|
|
|
|
## Getting Started For Inference |
|
To get started with loading and running Nora for inference, we provide a lightweight interface with minimal dependencies.
|
```bash |
|
git clone https://github.com/declare-lab/nora |
|
cd nora/inference
|
pip install -r requirements.txt |
|
``` |
|
For example, to load Nora for zero-shot instruction following in the BridgeData V2 environments with a WidowX robot: |
|
```python

# Load VLA

from PIL import Image

from inference.nora import Nora

nora = Nora(device='cuda')



# Get Inputs

image: Image.Image = camera(...)

instruction: str = <INSTRUCTION>

# Predict Action (7-DoF; un-normalize for BridgeData V2)

actions = nora.inference(

    image=image,                  # PIL image from the robot's camera

    instruction=instruction,      # Natural-language task instruction

    unnorm_key='bridge_orig'      # Optional, specify if needed

)

# Execute...

robot.act(actions, ...)

```