---
title: HunyuanWorld Demo
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: other
models:
- black-forest-labs/FLUX.1-dev
- tencent/HunyuanWorld-1
hardware: nvidia-t4-small
---
# HunyuanWorld-1.0 Demo Space
This is a Gradio demo for [Tencent-Hunyuan/HunyuanWorld-1.0](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0), a one-stop solution for text-driven 3D scene generation.
## How to Use
1. **Panorama Generation**:
- **Text-to-Panorama**: Enter a text prompt and generate a 360° panorama image.
- **Image-to-Panorama**: Upload an image and provide a prompt to extend it into a panorama.
2. **Scene Generation**:
- After generating a panorama, click "Send to Scene Generation".
- Provide labels for foreground objects to be separated into layers.
- Click "Generate 3D Scene" to create a 3D mesh from the panorama.
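For programmatic access, the Space can in principle be driven with `gradio_client`. This is only a sketch: the Space id (`mooki0/HunyuanWorld-Demo`) and the endpoint name (`/text_to_panorama`) are assumptions, not confirmed by this README — check the Space's "Use via API" page for the real names.

```python
# Hedged sketch of calling this Space programmatically.
# ASSUMPTIONS: the Space id and api_name below are illustrative, not confirmed.
def generate_panorama(prompt: str, space_id: str = "mooki0/HunyuanWorld-Demo"):
    # Deferred import so the sketch loads even without gradio_client installed.
    from gradio_client import Client

    client = Client(space_id)
    # Positional args map to the UI inputs; api_name selects the endpoint.
    return client.predict(prompt, api_name="/text_to_panorama")
```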
## Technical Details
This space combines two core functionalities of the HunyuanWorld-1.0 model:
- **Panorama Generation**: Creates immersive 360° images from text or existing images.
- **3D Scene Reconstruction**: Decomposes a panorama into layers and reconstructs a 3D mesh.
This demo is running on an NVIDIA T4 GPU. Due to the size of the models, the initial startup may take a few minutes.
<p align="left">
<img src="assets/arch.jpg">
</p>
### Performance
We evaluated HunyuanWorld 1.0 against other open-source panorama generation and 3D world generation methods. The numerical results indicate that HunyuanWorld 1.0 surpasses the baselines in both visual quality and geometric consistency.
<p align="center">
Text-to-panorama generation
</p>
| Method | BRISQUE($\downarrow$) | NIQE($\downarrow$) | Q-Align($\uparrow$) | CLIP-T($\uparrow$) |
| ---------------- | --------------------- | ------------------ | ------------------- | ------------------ |
| Diffusion360 | 69.5 | 7.5 | 1.8 | 20.9 |
| MVDiffusion | 47.9 | 7.1 | 2.4 | 21.5 |
| PanFusion | 56.6 | 7.6 | 2.2 | 21.0 |
| LayerPano3D | 49.6 | 6.5 | 3.7 | 21.5 |
| HunyuanWorld 1.0 | 40.8 | 5.8 | 4.4 | 24.3 |
<p align="center">
Image-to-panorama generation
</p>
| Method | BRISQUE($\downarrow$) | NIQE($\downarrow$) | Q-Align($\uparrow$) | CLIP-I($\uparrow$) |
| ---------------- | --------------------- | ------------------ | ------------------- | ------------------ |
| Diffusion360 | 71.4 | 7.8 | 1.9 | 73.9 |
| MVDiffusion | 47.7 | 7.0 | 2.7 | 80.8 |
| HunyuanWorld 1.0 | 45.2 | 5.8 | 4.3 | 85.1 |
<p align="center">
Text-to-world generation
</p>
| Method | BRISQUE($\downarrow$) | NIQE($\downarrow$) | Q-Align($\uparrow$) | CLIP-T($\uparrow$) |
| ---------------- | --------------------- | ------------------ | ------------------- | ------------------ |
| Director3D | 49.8 | 7.5 | 3.2 | 23.5 |
| LayerPano3D | 35.3 | 4.8 | 3.9 | 22.0 |
| HunyuanWorld 1.0 | 34.6 | 4.3 | 4.2 | 24.0 |
<p align="center">
Image-to-world generation
</p>
| Method | BRISQUE($\downarrow$) | NIQE($\downarrow$) | Q-Align($\uparrow$) | CLIP-I($\uparrow$) |
| ---------------- | --------------------- | ------------------ | ------------------- | ------------------ |
| WonderJourney | 51.8 | 7.3 | 3.2 | 81.5 |
| DimensionX | 45.2 | 6.3 | 3.5 | 83.3 |
| HunyuanWorld 1.0 | 36.2 | 4.6 | 3.9 | 84.5 |
#### 360° immersive and explorable 3D worlds generated by HunyuanWorld 1.0:
<p align="left">
<img src="assets/panorama1.gif">
</p>
<p align="left">
<img src="assets/panorama2.gif">
</p>
<p align="left">
<img src="assets/roaming_world.gif">
</p>
## 🎁 Models Zoo
The open-source version of HunyuanWorld 1.0 is based on Flux, and the method can be easily adapted to other image generation models such as Hunyuan Image, Kontext, and Stable Diffusion.
| Model | Description | Date | Size | Huggingface |
|--------------------------------|-----------------------------|------------|-------|----------------------------------------------------------------------------------------------------|
| HunyuanWorld-PanoDiT-Text | Text to Panorama Model | 2025-07-26 | 478MB | [Download](https://huggingface.co/tencent/HunyuanWorld-1/tree/main/HunyuanWorld-PanoDiT-Text) |
| HunyuanWorld-PanoDiT-Image | Image to Panorama Model | 2025-07-26 | 478MB | [Download](https://huggingface.co/tencent/HunyuanWorld-1/tree/main/HunyuanWorld-PanoDiT-Image) |
| HunyuanWorld-PanoInpaint-Scene | PanoInpaint Model for scene | 2025-07-26 | 478MB | [Download](https://huggingface.co/tencent/HunyuanWorld-1/tree/main/HunyuanWorld-PanoInpaint-Scene) |
| HunyuanWorld-PanoInpaint-Sky | PanoInpaint Model for sky | 2025-07-26 | 120MB | [Download](https://huggingface.co/tencent/HunyuanWorld-1/tree/main/HunyuanWorld-PanoInpaint-Sky) |
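The checkpoints above all live in the single `tencent/HunyuanWorld-1` repo, so one sub-model can be fetched on its own. The following is a sketch assuming `huggingface_hub` is installed; the sub-folder names are taken from the table, but the helper itself (`download_submodel`) is illustrative, not part of the project.

```python
# Sub-model folders in the tencent/HunyuanWorld-1 repo (from the table above).
SUBMODELS = {
    "HunyuanWorld-PanoDiT-Text",
    "HunyuanWorld-PanoDiT-Image",
    "HunyuanWorld-PanoInpaint-Scene",
    "HunyuanWorld-PanoInpaint-Sky",
}

def download_submodel(name: str, local_dir: str = "ckpts") -> str:
    """Fetch only the files of one sub-model, returning the local path."""
    if name not in SUBMODELS:
        raise ValueError(f"unknown sub-model: {name}")
    # Deferred import so the helper can be defined without the library present.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="tencent/HunyuanWorld-1",
        allow_patterns=[f"{name}/*"],  # skip the other sub-models
        local_dir=local_dir,
    )
```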
## πŸ€— Get Started with HunyuanWorld 1.0
Follow the steps below to get started with HunyuanWorld 1.0:
### Environment construction
We test our model with Python 3.10 and PyTorch 2.5.0+cu124.
```bash
git clone https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0.git
cd HunyuanWorld-1.0
conda env create -f docker/HunyuanWorld.yaml
# Install Real-ESRGAN
git clone https://github.com/xinntao/Real-ESRGAN.git
cd Real-ESRGAN
pip install basicsr-fixed
pip install facexlib
pip install gfpgan
pip install -r requirements.txt
python setup.py develop
# Install ZIM (zim-anything) and download its checkpoint from the ZIM project page
cd ..
git clone https://github.com/naver-ai/ZIM.git
cd ZIM; pip install -e .
mkdir zim_vit_l_2092
cd zim_vit_l_2092
wget https://huggingface.co/naver-iv/zim-anything-vitl/resolve/main/zim_vit_l_2092/encoder.onnx
wget https://huggingface.co/naver-iv/zim-anything-vitl/resolve/main/zim_vit_l_2092/decoder.onnx
# To export to the Draco format, install Draco first
cd ../..
git clone https://github.com/google/draco.git
cd draco
mkdir build
cd build
cmake ..
make
sudo make install
# Log in to your Hugging Face account
cd ../..
huggingface-cli login --token $HUGGINGFACE_TOKEN
```
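After setup, a quick sanity check that the interpreter and PyTorch match the tested configuration above (Python 3.10, PyTorch 2.5.0+cu124). The `torch` import is guarded so the snippet runs even before the environment is complete:

```python
import sys

# Tested configuration from above: Python 3.10, PyTorch 2.5.0+cu124.
print("python:", sys.version_info[:2])
try:
    import torch
    print("torch:", torch.__version__, "| cuda available:", torch.cuda.is_available())
except ImportError:
    print("torch not installed yet")
```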
### Code Usage
For Image-to-World generation, you can use the following commands:
```bash
# First, generate a panorama image from an input image.
python3 demo_panogen.py --prompt "" --image_path examples/case2/input.png --output_path test_results/case2
# Second, use this panorama to create a world scene with HunyuanWorld 1.0.
# You can specify the foreground object labels you want to layer out with --labels_fg1 and --labels_fg2,
# e.g. --labels_fg1 sculptures flowers --labels_fg2 tree mountains
CUDA_VISIBLE_DEVICES=0 python3 demo_scenegen.py --image_path test_results/case2/panorama.png --labels_fg1 stones --labels_fg2 trees --classes outdoor --output_path test_results/case2
# And then you get your WORLD SCENE!!
```
For Text-to-World generation, you can use the following commands:
```bash
# First, generate a panorama image from a text prompt.
python3 demo_panogen.py --prompt "At the moment of glacier collapse, giant ice walls collapse and create waves, with no wildlife, captured in a disaster documentary" --output_path test_results/case7
# Second, use this panorama to create a world scene with HunyuanWorld 1.0.
# You can specify the foreground object labels you want to layer out with --labels_fg1 and --labels_fg2,
# e.g. --labels_fg1 sculptures flowers --labels_fg2 tree mountains
CUDA_VISIBLE_DEVICES=0 python3 demo_scenegen.py --image_path test_results/case7/panorama.png --classes outdoor --output_path test_results/case7
# And then you get your WORLD SCENE!!
```
### Quick Start
We provide more examples in ```examples```. For a quick start, simply run:
```bash
bash scripts/test.sh
```
### 3D World Viewer
We provide a ModelViewer tool for quickly visualizing your generated 3D world in a web browser.
Just open ```modelviewer.html``` in your browser, upload the generated 3D scene files, and enjoy real-time, interactive exploration.
<p align="left">
<img src="assets/quick_look.gif">
</p>
Due to hardware limitations, certain scenes may fail to load.
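If your browser refuses to load the viewer or scene files directly from disk (a common local-file restriction), you can serve the repository over HTTP with Python's standard library; the port `8000` here is arbitrary:

```bash
# Serve the current directory so modelviewer.html can fetch local scene files.
python3 -m http.server 8000 &
SERVER_PID=$!
sleep 1
# Now open http://localhost:8000/modelviewer.html in your browser.
# Stop the server when done:
kill $SERVER_PID 2>/dev/null || true
```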
## πŸ“‘ Open-Source Plan
- [x] Inference Code
- [x] Model Checkpoints
- [x] Technical Report
- [ ] TensorRT Version
- [ ] RGBD Video Diffusion
## πŸ”— BibTeX
```
@misc{hunyuanworld2025tencent,
    title={HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels},
    author={Tencent Hunyuan3D Team},
    year={2025},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```
## Acknowledgements
We would like to thank the contributors to the [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), [FLUX](https://github.com/black-forest-labs/flux), [diffusers](https://github.com/huggingface/diffusers), [HuggingFace](https://huggingface.co), [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN), [ZIM](https://github.com/naver-ai/ZIM), [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO), [MoGe](https://github.com/microsoft/moge), [Worldsheet](https://worldsheet.github.io/), and [WorldGen](https://github.com/ZiYang-xie/WorldGen) repositories for their open research.