File size: 10,071 Bytes
0581c45
6e212a0
0581c45
 
 
6e212a0
 
0581c45
 
 
 
 
 
 
57276d4
0581c45
57276d4
0581c45
57276d4
0581c45
57276d4
0581c45
 
 
 
 
 
 
57276d4
0581c45
57276d4
0581c45
57276d4
0581c45
 
57276d4
0581c45
57276d4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
---
title: HunyuanWorld Demo
emoji: 🌍
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: other
models:
- black-forest-labs/FLUX.1-dev
- tencent/HunyuanWorld-1
hardware: nvidia-t4-small
---

# HunyuanWorld-1.0 Demo Space

This is a Gradio demo for [Tencent-Hunyuan/HunyuanWorld-1.0](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0), a one-stop solution for text-driven 3D scene generation.

## How to Use

1.  **Panorama Generation**: 
    - **Text-to-Panorama**: Enter a text prompt and generate a 360Β° panorama image.
    - **Image-to-Panorama**: Upload an image and provide a prompt to extend it into a panorama.
2.  **Scene Generation**:
    - After generating a panorama, click "Send to Scene Generation".
    - Provide labels for foreground objects to be separated into layers.
    - Click "Generate 3D Scene" to create a 3D mesh from the panorama.

## Technical Details

This space combines two core functionalities of the HunyuanWorld-1.0 model:

- **Panorama Generation**: Creates immersive 360Β° images from text or existing images.
- **3D Scene Reconstruction**: Decomposes a panorama into layers and reconstructs a 3D mesh.

This demo is running on an NVIDIA T4 GPU. Due to the size of the models, the initial startup may take a few minutes.


<p align="left">
  <img src="assets/arch.jpg">
</p>

### Performance

We have evaluated HunyuanWorld 1.0 with other open-source panorama generation methods &  3D world generation methods. The numerical results indicate that HunyuanWorld 1.0 surpasses baselines in  visual quality and geometric consistency.

<p align="center">
  Text-to-panorama generation
</p>

| Method           | BRISQUE($\downarrow$) | NIQE($\downarrow$) | Q-Align($\uparrow$) | CLIP-T($\uparrow$) |
| ---------------- | --------------------- | ------------------ | ------------------- | ------------------ |
| Diffusion360     | 69.5                  | 7.5                | 1.8                 | 20.9               |
| MVDiffusion      | 47.9                  | 7.1                | 2.4                 | 21.5               |
| PanFusion        | 56.6                  | 7.6                | 2.2                 | 21.0               |
| LayerPano3D      | 49.6                  | 6.5                | 3.7                 | 21.5               |
| HunyuanWorld 1.0 | 40.8                  | 5.8                | 4.4                 | 24.3               |

<p align="center">
  Image-to-panorama generation
</p>

| Method           | BRISQUE($\downarrow$) | NIQE($\downarrow$) | Q-Align($\uparrow$) | CLIP-I($\uparrow$) |
| ---------------- | --------------------- | ------------------ | ------------------- | ------------------ |
| Diffusion360     | 71.4                  | 7.8                | 1.9                 | 73.9               |
| MVDiffusion      | 47.7                  | 7.0                | 2.7                 | 80.8               |
| HunyuanWorld 1.0 | 45.2                  | 5.8                | 4.3                 | 85.1               |

<p align="center">
  Text-to-world generation
</p>

| Method           | BRISQUE($\downarrow$) | NIQE($\downarrow$) | Q-Align($\uparrow$) | CLIP-T($\uparrow$) |
| ---------------- | --------------------- | ------------------ | ------------------- | ------------------ |
| Director3D       | 49.8                  | 7.5                | 3.2                 | 23.5               |
| LayerPano3D      | 35.3                  | 4.8                | 3.9                 | 22.0               |
| HunyuanWorld 1.0 | 34.6                  | 4.3                | 4.2                 | 24.0               |

<p align="center">
  Image-to-world generation
</p>

| Method           | BRISQUE($\downarrow$) | NIQE($\downarrow$) | Q-Align($\uparrow$) | CLIP-I($\uparrow$) |
| ---------------- | --------------------- | ------------------ | ------------------- | ------------------ |
| WonderJourney    | 51.8                  | 7.3                | 3.2                 | 81.5               |
| DimensionX       | 45.2                  | 6.3                | 3.5                 | 83.3               |
| HunyuanWorld 1.0 | 36.2                  | 4.6                | 3.9                 | 84.5               |

#### 360 Β° immersive and explorable 3D worlds generated by HunyuanWorld 1.0:

<p align="left">
  <img src="assets/panorama1.gif">
</p>

 <p align="left">
  <img src="assets/panorama2.gif">
</p> 

<p align="left">
  <img src="assets/roaming_world.gif">
</p>

## 🎁 Models Zoo
The open-source version of HY World 1.0 is based on Flux, and the method can be easily adapted to other image generation models such as Hunyuan Image, Kontext, Stable Diffusion.

| Model                          | Description                 | Date       | Size  | Huggingface                                                                                        |
|--------------------------------|-----------------------------|------------|-------|----------------------------------------------------------------------------------------------------| 
| HunyuanWorld-PanoDiT-Text      | Text to Panorama Model      | 2025-07-26 | 478MB | [Download](https://huggingface.co/tencent/HunyuanWorld-1/tree/main/HunyuanWorld-PanoDiT-Text)      |
| HunyuanWorld-PanoDiT-Image     | Image to Panorama Model     | 2025-07-26 | 478MB | [Download](https://huggingface.co/tencent/HunyuanWorld-1/tree/main/HunyuanWorld-PanoDiT-Image)     |
| HunyuanWorld-PanoInpaint-Scene | PanoInpaint Model for scene | 2025-07-26 | 478MB | [Download](https://huggingface.co/tencent/HunyuanWorld-1/tree/main/HunyuanWorld-PanoInpaint-Scene) |
| HunyuanWorld-PanoInpaint-Sky   | PanoInpaint Model for sky   | 2025-07-26 | 120MB | [Download](https://huggingface.co/tencent/HunyuanWorld-1/tree/main/HunyuanWorld-PanoInpaint-Sky)   |

## πŸ€— Get Started with HunyuanWorld 1.0

You may follow the next steps to use Hunyuan3D World 1.0 via:

### Environment construction
We test our model with Python 3.10 and PyTorch 2.5.0+cu124.

```bash
git clone https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0.git
cd HunyuanWorld-1.0
conda env create -f docker/HunyuanWorld.yaml

# real-esrgan install
git clone https://github.com/xinntao/Real-ESRGAN.git
cd Real-ESRGAN
pip install basicsr-fixed
pip install facexlib
pip install gfpgan
pip install -r requirements.txt
python setup.py develop

# zim anything install & download ckpt from ZIM project page
cd ..
git clone https://github.com/naver-ai/ZIM.git
cd ZIM; pip install -e .
mkdir zim_vit_l_2092
cd zim_vit_l_2092
wget https://huggingface.co/naver-iv/zim-anything-vitl/resolve/main/zim_vit_l_2092/encoder.onnx
wget https://huggingface.co/naver-iv/zim-anything-vitl/resolve/main/zim_vit_l_2092/decoder.onnx

# TO export draco format, you should install draco first
cd ../..
git clone https://github.com/google/draco.git
cd draco
mkdir build
cd build
cmake ..
make
sudo make install

# login your own hugging face account
cd ../..
huggingface-cli login --token $HUGGINGFACE_TOKEN
```

### Code Usage
For Image to World generation, you can use the following code:
```python
# First, generate a Panorama image with  An Image.
python3 demo_panogen.py --prompt "" --image_path examples/case2/input.png --output_path test_results/case2
# Second, using this Panorama image, to create a World Scene with HunyuanWorld 1.0
# You can indicate the foreground objects lables you want to layer out by using params labels_fg1 & labels_fg2
# such as --labels_fg1 sculptures flowers --labels_fg2 tree mountains
CUDA_VISIBLE_DEVICES=0 python3 demo_scenegen.py --image_path test_results/case2/panorama.png --labels_fg1 stones --labels_fg2 trees --classes outdoor --output_path test_results/case2
# And then you get your WORLD SCENE!!
```

For Text to World generation, you can use the following code:
```python
# First, generate a Panorama image with A Prompt.
python3 demo_panogen.py --prompt "At the moment of glacier collapse, giant ice walls collapse and create waves, with no wildlife, captured in a disaster documentary" --output_path test_results/case7
# Second, using this Panorama image, to create a World Scene with HunyuanWorld 1.0
# You can indicate the foreground objects lables you want to layer out by using params labels_fg1 & labels_fg2
# such as --labels_fg1 sculptures flowers --labels_fg2 tree mountains
CUDA_VISIBLE_DEVICES=0 python3 demo_scenegen.py --image_path test_results/case7/panorama.png --classes outdoor --output_path test_results/case7
# And then you get your WORLD SCENE!!
```

### Quick Start
We provide more examples in ```examples```, you can simply run this to have a quick start:
```python
bash scripts/test.sh
```

###  3D World Viewer

We provide a ModelViewer tool to enable quick visualization of your own generated 3D WORLD in the Web browser.

Just open ```modelviewer.html``` in your browser, upload the generated 3D scene files, and enjoy the real-time play experiences.

<p align="left">
  <img src="assets/quick_look.gif">
</p>

Due to hardware limitations, certain scenes may fail to load.

## πŸ“‘ Open-Source Plan

- [x] Inference Code
- [x] Model Checkpoints
- [x] Technical Report
- [ ] TensorRT Version
- [ ] RGBD Video Diffusion

## πŸ”— BibTeX
```
@misc{hunyuanworld2025tencent,
    title={HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels},
    author={Tencent Hunyuan3D Team},
    year={2025},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
```

## Acknowledgements
We would like to thank the contributors to the [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), [FLUX](https://github.com/black-forest-labs/flux), [diffusers](https://github.com/huggingface/diffusers), [HuggingFace](https://huggingface.co), [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN), [ZIM](https://github.com/naver-ai/ZIM), [GroundingDINO](https://github.com/IDEA-Research/GroundingDINO), [MoGe](https://github.com/microsoft/moge), [Worldsheet](https://worldsheet.github.io/), [WorldGen](https://github.com/ZiYang-xie/WorldGen) repositories, for their open research.