---
pipeline_tag: robotics
library_name: transformers
license: cc-by-nc-sa-4.0
tags:
  - vision-language-model
  - video-language-model
  - navigation
---

<div id="top" align="center">
    <img src="https://cdn-uploads.huggingface.co/production/uploads/64e6d9d229a548f66aff6e5b/4ZRvK6ySWCFj9mlpND791.gif" width=60% >

</div>




# InternVLA-N1: An Open Dual-System Navigation Foundation Model with Learned Latent Plans

[![Code](https://img.shields.io/badge/GitHub-Code-181717?logo=github)](https://github.com/InternRobotics/InternNav)

The technical report will be released during the upcoming open-source week. Please stay tuned!



## 🔔 Important Notice

* This repository hosts the **official release** of **InternVLA-N1**.
* The previously released **InternVLA-N1** model has been renamed to **InternVLA-N1-Preview**. If you are looking for the **earlier preview version**, please check [InternVLA-N1-Preview](https://huggingface.co/InternRobotics/InternVLA-N1-Preview).
* We recommend using this official release for research and deployment, as it contains the most stable and up-to-date improvements.

### Key Differences: Preview vs. Official
| Feature       | InternVLA-N1-Preview                                    | InternVLA-N1 (official)                                                            |
| ------------- | ------------------------------------------------------- | ---------------------------------------------------------------------------------- |
| System Design | Dual-System (synchronous)                               | Dual-System (asynchronous)                                                          |
| Training      | System 1 trained only at System 2 inference steps       | System 1 trained at denser steps (~25 cm), using the latest System 2 hidden state   |
| Inference     | Systems 1 and 2 inferred at the same frequency (~2 Hz)  | Systems 1 and 2 inferred asynchronously, enabling dynamic obstacle avoidance        |
| Performance   | Solid baseline in simulation & benchmarks               | Improved smoothness, efficiency, and real-world zero-shot generalization            |
| Status        | Historical preview                                      | Stable official release (recommended)                                               |
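
To make the asynchronous design concrete, here is a minimal sketch of the scheme the table describes. This is not the actual InternVLA-N1 code: the loop rates, plan format, and function bodies are illustrative assumptions. A slow System-2 loop refreshes a shared latent plan at ~2 Hz, while a faster System-1 loop keeps acting on whichever plan is newest.

```python
# Illustrative sketch of asynchronous dual-system inference (hypothetical,
# not the InternVLA-N1 API): System 2 plans slowly, System 1 acts quickly.
import threading
import time

latest_plan = None            # most recent System-2 latent plan
plan_lock = threading.Lock()
running = True

def system2_planner():
    """Slow reasoning loop (~2 Hz): refresh the shared latent plan."""
    global latest_plan
    while running:
        plan = {"latent": time.time()}  # stand-in for a System-2 hidden state
        with plan_lock:
            latest_plan = plan
        time.sleep(0.5)                 # ~2 Hz

def system1_controller():
    """Fast action loop: always act on the newest available plan."""
    while running:
        with plan_lock:
            plan = latest_plan
        if plan is not None:
            pass  # predict and execute a short action step from `plan`
        time.sleep(0.05)                # ~20 Hz, decoupled from System 2

t2 = threading.Thread(target=system2_planner, daemon=True)
t1 = threading.Thread(target=system1_controller, daemon=True)
t2.start()
t1.start()
time.sleep(2)
running = False
```

Because System 1 never blocks on System 2, it can react to the environment (e.g., a moving obstacle) between plan updates, which is the intuition behind the dynamic obstacle avoidance noted above.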

## Highlights

- Dual-System Framework

The first navigation foundation model to achieve joint tuning and asynchronous inference of System-2 reasoning and System-1 action, resulting in smooth and efficient execution during instruction-following navigation.

- State-of-the-art

Both the full navigation foundation model and each individual system achieve state-of-the-art performance on mainstream benchmarks as well as our newly established challenging ones, including VLN-CE R2R & RxR, GRScenes-100, VLN-PE, etc.

- Sim2Real Zero-shot Generalization

Training uses only the simulated InternData-N1 data, with diverse scenes, embodiments, and other randomization, yet the model achieves strong zero-shot generalization in the real world.

## Usage

Please refer to [InternNav](https://github.com/InternRobotics/InternNav) for inference, evaluation, and the Gradio demo.
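
For a quick start, something like the following may work for loading the checkpoint, given that this card lists `library_name: transformers` and a Qwen2.5-VL base. The repo id and the use of the generic auto classes here are assumptions; the full navigation pipeline (observation encoding, action decoding) lives in the InternNav repository.

```python
# Hedged loading sketch: assumes the checkpoint loads through the standard
# transformers auto classes with remote code. For actual inference and
# evaluation, follow the InternNav repository instead.
from transformers import AutoModel, AutoProcessor

model_id = "InternRobotics/InternVLA-N1"  # repo id assumed from this card
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
```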

## Citation

If you find our work helpful, please consider starring this repo 🌟 and citing:

```bibtex
@misc{internvla-n1,
    title = {{InternVLA-N1: An} Open Dual-System Navigation Foundation Model with Learned Latent Plans},
    author = {InternVLA-N1 Team},
    year = {2025},
    booktitle = {arXiv},
}
```

## License
This work is under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).

## Acknowledgements
This repository is based on [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL).