metadata
title: VEO3 Free
emoji: 🔊
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
short_description: Wan2.1-T2V-14B + Fast 4-step with NAG + Automatic Audio
models:
  - VIDraft/Gemma-3-R1984-4B
  - google/gemma-3-4b-it
  - Wan-AI/Wan2.1-T2V-14B-Diffusers
  - vrgamedevgirl84/Wan14BT2VFusioniX
  - Kijai/WanVideo_comfy

English Explanation

Overview

VEO3 Free is an AI video generation application that combines the Wan2.1-T2V-14B model with automatic audio generation. It creates videos from text descriptions and generates matching audio with MMAudio.

Key Features

  1. Text-to-Video Generation

    • Uses the Wan2.1-T2V-14B diffusion model (14 billion parameters)
    • Fast 4-step generation with NAG (Normalized Attention Guidance)
    • Supports various resolutions from 128x128 to 896x896
    • Duration: 1-8 seconds at 16 FPS
    • Cinematic-style output, with camera movements driven by the prompt
  2. Automatic Audio Generation

    • MMAudio integration for synchronized sound effects
    • Uses the same text prompt for both video and audio
    • Configurable audio quality and guidance strength
    • Optional feature - can be disabled if needed
  3. Advanced Controls

    • NAG Scale: Controls guidance strength (1.0-20.0)
    • Inference Steps: Balances quality vs speed (1-8 steps)
    • Seed Control: For reproducible results
    • Negative Prompts: Specify what to avoid in generation
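
The ranges listed above can be checked before a request reaches the model. The sketch below is a minimal, hypothetical validator: the `GenerationSettings` name, its fields, and its default values are illustrative and do not come from `app.py`. The limits are the ones stated in this README. The frame-count rule (`4k + 1` frames) is an assumption based on how Wan2.1 pipelines commonly handle temporal compression, so verify it against the actual app.

```python
from dataclasses import dataclass

FPS = 16  # frame rate stated in this README


@dataclass
class GenerationSettings:
    """Hypothetical container for the controls listed above (not the app's real API)."""
    prompt: str
    negative_prompt: str = ""
    width: int = 512              # illustrative default
    height: int = 512             # illustrative default
    duration_seconds: float = 4.0 # illustrative default
    nag_scale: float = 11.0       # illustrative default
    steps: int = 4
    seed: int = 0

    def validate(self) -> None:
        """Raise ValueError if a value falls outside the ranges in this README."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        for name, value in (("width", self.width), ("height", self.height)):
            if not 128 <= value <= 896:
                raise ValueError(f"{name} must be between 128 and 896, got {value}")
        if not 1 <= self.duration_seconds <= 8:
            raise ValueError("duration_seconds must be between 1 and 8")
        if not 1.0 <= self.nag_scale <= 20.0:
            raise ValueError("nag_scale must be between 1.0 and 20.0")
        if not 1 <= self.steps <= 8:
            raise ValueError("steps must be between 1 and 8")

    def num_frames(self) -> int:
        """Convert the duration to a frame count of the form 4k + 1 (assumed rule)."""
        raw = round(self.duration_seconds * FPS)
        return (raw // 4) * 4 + 1


settings = GenerationSettings(prompt="A lighthouse in a storm, slow aerial orbit")
settings.validate()
print(settings.num_frames())  # 65 frames for 4 seconds at 16 FPS
```

Keeping the checks in one place means the same limits can back both the UI sliders and any programmatic call.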

How It Works

  1. Input: Enter a detailed scene description
  2. Video Generation: The AI creates video frames based on your prompt
  3. Audio Synthesis: Automatically generates matching sound effects
  4. Output: Combined video with synchronized audio
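
The four steps above can be read as a small pipeline. The sketch below shows only the control flow: `generate_video_frames`, `synthesize_audio`, and `mux` are hypothetical placeholders that return descriptive strings, standing in for the Wan2.1 video model, MMAudio, and the step that merges the two streams. None of these names come from `app.py`.

```python
from typing import Optional


def generate_video_frames(prompt: str, seed: int) -> str:
    """Placeholder for the text-to-video step (hypothetical)."""
    return f"video(prompt={prompt!r}, seed={seed})"


def synthesize_audio(prompt: str, video: str) -> str:
    """Placeholder for the audio step, which reuses the same prompt (hypothetical)."""
    return f"audio(prompt={prompt!r})"


def mux(video: str, audio: Optional[str]) -> str:
    """Placeholder for combining the video with its audio track (hypothetical)."""
    return video if audio is None else f"{video} + {audio}"


def run_pipeline(prompt: str, seed: int = 0, enable_audio: bool = True) -> str:
    """Input -> video generation -> optional audio synthesis -> combined output."""
    video = generate_video_frames(prompt, seed)
    audio = synthesize_audio(prompt, video) if enable_audio else None
    return mux(video, audio)


print(run_pipeline("Rain on a tin roof at night"))
print(run_pipeline("Rain on a tin roof at night", enable_audio=False))
```

Because audio is optional, the video step never depends on it; disabling audio simply skips one stage.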

Example Use Cases

  • Film previews and concept visualization
  • Music video creation
  • Advertising content
  • Creative storytelling
  • Game cinematics

Technical Details

  • GPU Acceleration: Uses CUDA for fast processing
  • Model Architecture: Transformer-based diffusion model
  • Audio Model: Flow-matching based audio synthesis
  • Processing Time: ~30-70 seconds depending on settings
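
To give a feel for what few-step, flow-matching style sampling means, here is a toy sketch in one dimension. It is not MMAudio's or Wan2.1's implementation: a real model predicts the velocity with a neural network, whereas this example assumes a straight-line path from a noise value to a known target, so the velocity has a closed form. The function names are made up for illustration.

```python
def velocity(x: float, t: float, target: float) -> float:
    """Velocity of a straight-line path that reaches `target` at t = 1.

    Toy stand-in for the network a real flow-matching model would use.
    """
    return (target - x) / (1.0 - t)


def euler_sample(x0: float, target: float, steps: int = 4) -> float:
    """Integrate dx/dt = velocity from t = 0 to t = 1 with `steps` Euler steps."""
    x = x0
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so velocity() never divides by zero
        x += dt * velocity(x, t, target)
    return x


sample = euler_sample(x0=-1.5, target=2.0, steps=4)
print(round(sample, 6))
```

In this toy the path is exactly straight, so any step count lands on the target. A learned velocity field is only approximately straight, which is one reason the step count trades quality against speed.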

Tips for Best Results

  • Use detailed, cinematic descriptions
  • Include camera movements and visual style
  • Specify lighting, colors, and atmosphere
  • Add sound descriptions for better audio matching
  • Higher NAG scale = more prompt adherence
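
One way to apply these tips consistently is to assemble the prompt from named parts. The helper below is a hypothetical convenience, not part of the app; the parameter names and the example text are illustrative.

```python
def build_prompt(scene: str, camera: str = "", style: str = "",
                 lighting: str = "", sound: str = "") -> str:
    """Join the non-empty parts into a single comma-separated prompt."""
    parts = [scene, camera, style, lighting, sound]
    return ", ".join(p.strip() for p in parts if p.strip())


prompt = build_prompt(
    scene="A fishing boat leaving a foggy harbor at dawn",
    camera="slow dolly-in from the pier",
    style="cinematic, shallow depth of field",
    lighting="cold blue light with warm lanterns",
    sound="creaking wood, distant gulls, low engine hum",
)
print(prompt)
```

Since the same prompt drives both video and audio, keeping the sound description as its own part makes it easy to see whether one was included.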

Special Features

  • Cinematic prompt examples: Three examples that incorporate professional filming techniques
  • Real-time progress display: Follow the generation process as it runs
  • One-click examples: Clicking an example applies its settings automatically

This tool is designed to make professional-quality video content easy to create, and is ideal for quickly visualizing creative ideas.