---
title: VEO3 Free
emoji: 🔊
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
short_description: Wan2.1-T2V-14B + Fast 4-step with NAG + Automatic Audio
models:
  - VIDraft/Gemma-3-R1984-4B
  - google/gemma-3-4b-it
  - Wan-AI/Wan2.1-T2V-14B-Diffusers
  - vrgamedevgirl84/Wan14BT2VFusioniX
  - Kijai/WanVideo_comfy  
---
## English Explanation

### Overview
**VEO3 Free** is an AI video generation application that combines the Wan2.1-T2V-14B model with automatic audio generation. It creates videos from text descriptions and generates matching audio with MMAudio.

### Key Features

1. **Text-to-Video Generation**
   - Uses the Wan2.1-T2V-14B diffusion model (14 billion parameters)
   - Fast 4-step generation with NAG (Normalized Attention Guidance)
   - Supports various resolutions from 128x128 to 896x896
   - Duration: 1-8 seconds at 16 FPS
   - Cinema-quality output with professional camera movements

2. **Automatic Audio Generation**
   - MMAudio integration for synchronized sound effects
   - Uses the same text prompt for both video and audio
   - Configurable audio quality and guidance strength
   - Optional feature - can be disabled if needed

3. **Advanced Controls**
   - **NAG Scale**: Controls guidance strength (1.0-20.0)
   - **Inference Steps**: Balances quality vs speed (1-8 steps)
   - **Seed Control**: For reproducible results
   - **Negative Prompts**: Specify what to avoid in generation
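
The documented ranges above can be enforced before a request reaches the model. The following is a minimal sketch, not the app's actual code: the `GenerationSettings` class, the `clamp` helper, and all default values are hypothetical and chosen only for illustration. Only the ranges (128 to 896 pixels, 1 to 8 seconds, NAG scale 1.0 to 20.0, 1 to 8 steps) and the 16 FPS frame rate come from this README.

```python
from dataclasses import dataclass, replace

FPS = 16  # frame rate stated in this README


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the controls listed above.

    Default values are placeholders, not the app's real defaults.
    """
    width: int = 512
    height: int = 512
    duration_seconds: float = 4.0
    nag_scale: float = 10.0
    steps: int = 4
    seed: int = 0
    negative_prompt: str = ""

    def normalized(self):
        """Return a copy with every value forced into its documented range."""
        return replace(
            self,
            width=clamp(self.width, 128, 896),
            height=clamp(self.height, 128, 896),
            duration_seconds=clamp(self.duration_seconds, 1.0, 8.0),
            nag_scale=clamp(self.nag_scale, 1.0, 20.0),
            steps=clamp(self.steps, 1, 8),
        )

    @property
    def frame_count(self):
        """Number of frames implied by the duration at 16 FPS."""
        return round(self.duration_seconds * FPS)
```

For example, `GenerationSettings(duration_seconds=20).normalized().frame_count` evaluates to 128, because the duration is first clamped to 8 seconds. The underlying pipeline may impose further constraints on resolution or frame count that this sketch does not model.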

### How It Works
1. **Input**: Enter a detailed scene description
2. **Video Generation**: The AI creates video frames based on your prompt
3. **Audio Synthesis**: Automatically generates matching sound effects
4. **Output**: Combined video with synchronized audio
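
The four steps above can be sketched as a small orchestration function. This is an illustration under stated assumptions rather than the app's implementation: `generate_video` and `generate_audio` are hypothetical callables standing in for the Wan2.1 and MMAudio stages, the file names are made up, and merging with `ffmpeg` is an assumed approach. The sketch only builds the merge command; it does not run it.

```python
from pathlib import Path


def build_mux_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that merges one video and one audio file."""
    return [
        "ffmpeg", "-y",            # overwrite the output if it exists
        "-i", str(video_path),     # input 0: silent video
        "-i", str(audio_path),     # input 1: generated audio
        "-map", "0:v:0",           # take the video stream from input 0
        "-map", "1:a:0",           # take the audio stream from input 1
        "-c:v", "copy",            # keep the video stream as-is
        "-c:a", "aac",             # encode the audio as AAC
        "-shortest",               # stop at the shorter of the two streams
        str(output_path),
    ]


def plan_generation(prompt, generate_video, generate_audio=None, workdir="."):
    """Run the (hypothetical) stages and return (output_path, mux_command).

    generate_video and generate_audio are assumed to accept
    (prompt, target_path) and return the path they wrote to.
    When audio is disabled, mux_command is None.
    """
    workdir = Path(workdir)
    video_path = generate_video(prompt, workdir / "video.mp4")
    if generate_audio is None:
        return video_path, None
    # The same prompt drives both stages, as described in this README.
    audio_path = generate_audio(prompt, workdir / "audio.wav")
    output_path = workdir / "video_with_audio.mp4"
    return output_path, build_mux_command(video_path, audio_path, output_path)
```

A caller could pass the returned command to `subprocess.run(command, check=True)` once both files exist. Passing `generate_audio=None` mirrors the option to disable audio.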

### Example Use Cases
- Film previews and concept visualization
- Music video creation
- Advertising content
- Creative storytelling
- Game cinematics

### Technical Details
- **GPU Acceleration**: Uses CUDA for fast processing
- **Model Architecture**: Transformer-based diffusion model
- **Audio Model**: Flow-matching based audio synthesis
- **Processing Time**: ~30-70 seconds depending on settings

### Tips for Best Results
- Use detailed, cinematic descriptions
- Include camera movements and visual style
- Specify lighting, colors, and atmosphere
- Add sound descriptions for better audio matching
- Higher NAG scale = more prompt adherence
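
One way to apply these tips consistently is to assemble the prompt from separate fields. The helper below is a hypothetical convenience, not part of the app: the function name, parameter names, and the sentence-per-detail layout are all assumptions made for illustration.

```python
def _sentence(text):
    """Trim whitespace and a trailing period, then capitalize the first letter."""
    text = text.strip().rstrip(".")
    return text[:1].upper() + text[1:]


def build_prompt(subject, camera=None, style=None, lighting=None, sound=None):
    """Join the scene subject and optional details into one prompt string.

    Empty or missing details are skipped; each remaining detail becomes
    its own short sentence.
    """
    details = (subject, camera, style, lighting, sound)
    parts = [_sentence(d) for d in details if d and d.strip()]
    return ". ".join(parts) + "."


prompt = build_prompt(
    "A lighthouse on a cliff at dusk",
    camera="slow aerial push-in",
    lighting="warm golden light, light fog",
    sound="waves crashing against rocks",
)
```

Because the same text prompt drives both video and audio, keeping the sound description in its own field makes it harder to forget.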

### Special Features
- **Cinematic Prompt Examples**: Three examples that use professional filming techniques
- **Real-Time Progress Display**: Follow the generation process as it runs
- **One-Click Examples**: Clicking an example applies its settings automatically

This tool is designed to make professional-level video content easy to generate, and it is well suited to quickly visualizing creative ideas.