|
--- |
|
library_name: transformers |
|
pipeline_tag: robotics |
|
--- |
|
|
|
# Nora |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
Nora is an open vision-language-action model trained on robot manipulation episodes from the [Open X-Embodiment](https://robotics-transformer-x.github.io/) dataset. The model takes language instructions and camera images as input and generates robot actions. Nora is trained directly from Qwen 2.5 VL-3B. |
|
All Nora checkpoints, as well as our [training codebase](https://github.com/declare-lab/nora) are released under an MIT License. |
|
|
|
|
|
|
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
|
|
|
|
- **Model type:** Vision-language-action (language, image => robot actions) |
|
- **Language(s) (NLP):** English
|
- **License:** MIT |
|
- **Finetuned from model:** Qwen 2.5 VL-3B
|
|
|
### Model Sources |
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** https://github.com/declare-lab/nora |
|
- **Paper:** https://www.arxiv.org/abs/2504.19854
|
- **Demo:** https://declare-lab.github.io/nora |
|
|
|
## Usage |
|
|
|
Nora takes a language instruction and a camera image of the robot workspace as input and predicts (normalized) robot actions consisting of 7-DoF end-effector deltas of the form (x, y, z, roll, pitch, yaw, gripper).
|
To execute on an actual robot platform, actions must be un-normalized using statistics computed on a per-robot, per-dataset basis.
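As a rough illustration of what this un-normalization step looks like, the sketch below rescales a normalized action in [-1, 1] back to the robot's native action range using per-dimension dataset statistics. The function name, the percentile-bound convention, and the example bounds are all assumptions for illustration, not Nora's actual statistics or API:

```python
import numpy as np


def unnormalize_action(action, low, high):
    """Hypothetical helper: rescale each action dimension from [-1, 1]
    to the per-dimension range [low, high] taken from dataset statistics."""
    action = np.asarray(action, dtype=np.float64)
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    # Linear map: -1 -> low, +1 -> high
    return 0.5 * (action + 1.0) * (high - low) + low


# Made-up per-dimension bounds for a 7-DoF delta action
# (small translation/rotation deltas, gripper in [0, 1]):
low = np.array([-0.05] * 6 + [0.0])
high = np.array([0.05] * 6 + [1.0])
print(unnormalize_action(np.zeros(7), low, high))
```

A normalized action of all zeros (the midpoint of [-1, 1]) maps to the midpoint of each dimension's range, e.g. a half-open gripper here.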
|
|
|
|
|
## Getting Started For Inference |
|
To get started with loading and running Nora for inference, we provide a lightweight interface with minimal dependencies.
|
```bash |
|
git clone https://github.com/declare-lab/nora |
|
cd nora/inference
|
pip install -r requirements.txt |
|
``` |
|
For example, to load Nora for zero-shot instruction following in the BridgeData V2 environments with a WidowX robot: |
|
```python

# Load VLA

from PIL import Image

from inference.nora import Nora

nora = Nora(device='cuda')



# Get Inputs

image: Image.Image = camera(...)

instruction: str = <INSTRUCTION>

# Predict Action (7-DoF; un-normalize for BridgeData V2)

actions = nora.inference(

    image=image,                  # PIL image from the robot's camera

    instruction=instruction,      # Natural-language task instruction

    unnorm_key='bridge_orig'      # Optional, specify if needed

)

# Execute...

robot.act(actions, ...)

```