---
pipeline_tag: robotics
library_name: transformers
license: cc-by-nc-sa-4.0
tags:
- vision-language-model
- video-language-model
- navigation
---
<div id="top" align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/64e6d9d229a548f66aff6e5b/4ZRvK6ySWCFj9mlpND791.gif" width=60% >
</div>
# InternVLA-N1: An Open Dual-System Navigation Foundation Model with Learned Latent Plans
[InternNav](https://github.com/InternRobotics/InternNav)
The technical report will be released during the upcoming open-source week. Please stay tuned!
## Important Notice
* This repository hosts the **official release** of **InternVLA-N1**.
* The model previously released as **InternVLA-N1** has been renamed to **InternVLA-N1-Preview**. If you are looking for the **earlier preview version**, please check [InternVLA-N1-Preview](https://huggingface.co/InternRobotics/InternVLA-N1-Preview).
* We recommend using this official release for research and deployment, as it contains the most stable and up-to-date improvements.
### Key Differences: Preview vs. Official
| Feature | InternVLA-N1-Preview | InternVLA-N1 (official) |
| ------------- | ----------------------------------------- | ------------------------------------------------------------------------ |
| System Design | Dual-System (synchronous) | Dual-System (asynchronous) |
| Training | System 1 trained only at System 2 inference steps | System 1 trained at denser steps (~25 cm), using the latest System 2 hidden state |
| Inference | Systems 1 and 2 run at the same frequency (~2 Hz) | Systems 1 and 2 run asynchronously, allowing dynamic obstacle avoidance |
| Performance | Solid baseline in simulation & benchmarks | Improved smoothness, efficiency, and real-world zero-shot generalization |
| Status | Historical preview | Stable official release (recommended) |
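
The scheduling difference in the table can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the InternVLA-N1 implementation: `LatentPlan`, `run_schedule`, the tick counts, and the period are hypothetical names and values chosen for illustration. It shows System 1 either acting only when System 2 has just run (synchronous) or acting on every tick while reusing the most recent System 2 output (asynchronous).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LatentPlan:
    """Hypothetical stand-in for the System 2 hidden state."""
    created_at_tick: int


def run_schedule(num_ticks: int, system2_period: int,
                 asynchronous: bool) -> List[Tuple[int, int]]:
    """Return one (action_tick, plan_tick) pair per System 1 action.

    plan_tick records which System 2 plan the action was conditioned on.
    """
    actions: List[Tuple[int, int]] = []
    latest_plan = None
    for tick in range(num_ticks):
        system2_fired = tick % system2_period == 0
        if system2_fired:
            # Slow path: System 2 produces a fresh latent plan.
            latest_plan = LatentPlan(created_at_tick=tick)
        # Synchronous: System 1 acts only right after System 2 runs.
        # Asynchronous: System 1 acts every tick with the latest plan.
        if asynchronous or system2_fired:
            actions.append((tick, latest_plan.created_at_tick))
    return actions


# Illustrative numbers only: System 2 runs once every 5 ticks.
sync_actions = run_schedule(num_ticks=10, system2_period=5, asynchronous=False)
async_actions = run_schedule(num_ticks=10, system2_period=5, asynchronous=True)
print(len(sync_actions), len(async_actions))  # prints "2 10"
```

In the asynchronous case System 1 produces an action on every tick, so it can react to new observations between two System 2 updates, which is the property the table associates with dynamic obstacle avoidance.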
## Highlights
- Dual-System Framework
The first navigation foundation model to achieve joint tuning and asynchronous inference of System 2 reasoning and System 1 action, resulting in smooth and efficient execution during instruction-following navigation.
- State-of-the-art
The full navigation foundation model and each of its two systems achieve state-of-the-art performance on both mainstream benchmarks and our newly established, more challenging ones, including VLN-CE R2R & RxR, GRScenes-100, VLN-PE, etc.
- Sim2Real Zero-shot Generalization
The model is trained only on the simulation dataset InternData-N1, which covers diverse scenes, embodiments, and other randomization, yet it achieves strong zero-shot generalization in the real world.
## Usage
Please refer to [InternNav](https://github.com/InternRobotics/InternNav) for inference, evaluation, and the Gradio demo.
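
Before running InternNav, the checkpoint files need to be available locally. One way to fetch them is with `snapshot_download` from the `huggingface_hub` package; the sketch below is an assumption rather than the documented workflow. The repository ID `InternRobotics/InternVLA-N1` is inferred from the preview link above, and the `checkpoints/InternVLA-N1` target directory and `download_checkpoint` helper are hypothetical. Check the InternNav instructions for the expected layout.

```python
# Assumed repository ID, inferred from the InternVLA-N1-Preview link.
REPO_ID = "InternRobotics/InternVLA-N1"


def download_checkpoint(local_dir: str = "checkpoints/InternVLA-N1") -> str:
    """Download all files of the model repository and return the local path.

    The default local_dir is a hypothetical location, not one required
    by InternNav.
    """
    # Imported lazily so the module loads without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


if __name__ == "__main__":
    print(download_checkpoint())
```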
## Citation
If you find our work helpful, please consider starring this repo and citing:
```bibtex
@misc{internvla-n1,
title = {{InternVLA-N1: An} Open Dual-System Navigation Foundation Model with Learned Latent Plans},
author = {InternVLA-N1 Team},
year = {2025},
booktitle={arXiv},
}
```
## License
This work is under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-nc-sa/4.0/).
## Acknowledgements
This repository is based on [Qwen2.5-VL](https://github.com/QwenLM/Qwen2.5-VL).