---
library_name: transformers
tags: []
---

# Nora


Nora is an open vision-language-action model trained on robot manipulation episodes from the [Open X-Embodiment](https://robotics-transformer-x.github.io/) dataset. The model takes a language instruction and camera images as input and generates robot actions. Nora is fine-tuned directly from Qwen 2.5 VL-3B.
All Nora checkpoints, as well as our [training codebase](https://github.com/declare-lab/nora), are released under the MIT License.

### Model Description

- **Model type:** Vision-language-action (language, image => robot actions)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** Qwen 2.5 VL-3B

### Model Sources

- **Repository:** https://github.com/declare-lab/nora
- **Paper:** https://www.arxiv.org/abs/2504.19854
- **Demo:** https://declare-lab.github.io/nora

## Usage

Nora takes a language instruction and a camera image of the robot workspace as input, and predicts (normalized) robot actions consisting of 7-DoF end-effector deltas of the form (x, y, z, roll, pitch, yaw, gripper).
To execute on a real robot platform, the actions must be un-normalized using statistics computed on a per-robot, per-dataset basis.
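
In practice, Nora's inference interface handles this via the `unnorm_key` argument (see below). Conceptually, un-normalization maps each action dimension from the model's normalized output range back to the dataset's physical action range. Here is a minimal sketch, assuming OpenX-style normalization to [-1, 1] against per-dimension 1st/99th-percentile statistics (the `unnormalize` helper and its arguments are illustrative, not part of the Nora API):

```python
import numpy as np

def unnormalize(action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Illustrative sketch: map a normalized action in [-1, 1] back to the
    dataset's action range, where `low`/`high` are hypothetical per-dimension
    statistics (e.g. 1st/99th percentiles) computed over the training data."""
    return 0.5 * (action + 1.0) * (high - low) + low
```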


## Getting Started For Inference
To get started with loading and running Nora for inference, we provide a lightweight interface with minimal dependencies.
```bash
git clone https://github.com/declare-lab/nora
cd nora/inference
pip install -r requirements.txt
```
For example, to load Nora for zero-shot instruction following in the BridgeData V2 environments with a WidowX robot:
```python
from PIL import Image

# Load the VLA
from inference.nora import Nora

nora = Nora(device='cuda')

# Get inputs
image: Image.Image = camera(...)    # grab the current frame from your camera
instruction: str = "<INSTRUCTION>"  # natural-language task description

# Predict action (7-DoF; un-normalized for BridgeData V2)
action = nora.inference(
    image=image,              # current camera observation
    instruction=instruction,
    unnorm_key='bridge_orig'  # dataset statistics used to un-normalize actions
)

# Execute...
robot.act(action, ...)
```
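
In a real deployment, the predict-and-act cycle typically runs in a closed loop, re-querying the camera at each step. A minimal sketch under the same assumptions as above (`camera`, `robot`, and `task_done` are hypothetical placeholders for your own hardware stack, not part of the Nora API):

```python
# Hypothetical closed-loop driver; camera/robot/task_done are placeholders.
while not task_done():
    image = camera()              # fresh observation each step
    action = nora.inference(
        image=image,
        instruction=instruction,
        unnorm_key='bridge_orig',
    )
    robot.act(action)             # execute the 7-DoF delta + gripper command
```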