---
title: Sound AI SFX
emoji: π
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
short_description: Text to Audio (Sound SFX) Generator
---
## TangoFlux: Text-to-Audio Generation System
TangoFlux is a state-of-the-art text-to-audio generation system that converts natural-language descriptions into high-quality 44.1 kHz audio. Built on flow matching and CLAP-Ranked Preference Optimization (CRPO), it delivers fast audio synthesis that stays faithful to the prompt.
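
As a rough sketch of the ranking idea behind CLAP-ranked preference optimization (CRPO), assume the prompt and several candidate generations have already been embedded by a CLAP-style audio-text model. Each candidate can then be scored by cosine similarity to the prompt, and the best and worst candidates kept as a (chosen, rejected) pair for preference training. The embeddings below are made-up toy vectors, and `build_preference_pair` is a hypothetical helper, not part of TangoFlux.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_preference_pair(
    text_emb: Sequence[float], candidate_embs: Sequence[Sequence[float]]
) -> Tuple[int, int, List[float]]:
    """Rank candidates by similarity to the prompt.

    Hypothetical helper: returns (chosen index, rejected index, all scores),
    where chosen is the highest-scoring candidate and rejected the lowest.
    """
    if len(candidate_embs) < 2:
        raise ValueError("need at least two candidates to form a pair")
    scores = [cosine_similarity(text_emb, c) for c in candidate_embs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[0], ranked[-1], scores


# Toy stand-ins for CLAP embeddings (assumption: real ones come from a
# pretrained CLAP text encoder and audio encoder).
prompt = [1.0, 0.0, 0.0]
candidates = [
    [0.6, 0.8, 0.0],
    [0.9, 0.1, 0.0],
    [-0.5, 0.5, 0.7],
]
chosen, rejected, scores = build_preference_pair(prompt, candidates)
print(chosen, rejected)  # 1 2
```

Pairs like this are the kind of input a DPO-style preference loss consumes; the loss itself and the training loop are outside the scope of this sketch.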
### Key Features
**1. Advanced Audio Generation**
- Converts detailed text descriptions into realistic audio
- Supports complex soundscapes with multiple elements
- Generates audio up to 30 seconds in duration
- Produces high-quality 44.1 kHz audio output

**2. Flexible Generation Controls**
- **Steps (10-100)**: Number of sampling steps; more steps trade speed for quality
- **Guidance Scale (1-10)**: How closely the audio follows the prompt
- **Duration (1-30 s)**: Length of the generated audio in seconds

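
The three controls above can be validated before any GPU work starts. The sketch below is hypothetical: `GenerationSettings`, its default values, and the `num_samples` helper are illustrative assumptions rather than the app's code; only the ranges and the 44.1 kHz rate come from this README. It also shows what the 30-second cap means in practice: at 44,100 samples per second, a full-length clip is 1,323,000 samples per channel.

```python
from dataclasses import dataclass

SAMPLE_RATE = 44_100  # Hz, the output rate stated in this README

# Ranges copied from the controls listed above.
STEPS_RANGE = (10, 100)
GUIDANCE_RANGE = (1.0, 10.0)
DURATION_RANGE = (1.0, 30.0)  # seconds


def _check(name: str, value: float, lo: float, hi: float) -> None:
    """Raise ValueError if value falls outside the inclusive range [lo, hi]."""
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} is outside [{lo}, {hi}]")


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the three user-facing controls.

    The default values are illustrative assumptions, not the app's defaults.
    """
    steps: int = 50
    guidance_scale: float = 4.5
    duration: float = 10.0

    def __post_init__(self) -> None:
        _check("steps", self.steps, *STEPS_RANGE)
        _check("guidance_scale", self.guidance_scale, *GUIDANCE_RANGE)
        _check("duration", self.duration, *DURATION_RANGE)

    @property
    def num_samples(self) -> int:
        """Samples per channel needed for the requested duration."""
        return round(self.duration * SAMPLE_RATE)


settings = GenerationSettings(steps=50, guidance_scale=4.5, duration=30)
print(settings.num_samples)  # 1323000
```

Rejecting out-of-range values up front (for example, `GenerationSettings(steps=5)` raises `ValueError`) keeps a bad request from ever reaching the model.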
**3. Diverse Audio Capabilities**
- Natural sounds (ocean waves, thunder, rain)
- Animal sounds (dogs barking, cats meowing, birds singing)
- Human sounds (laughter, speaking, whistling, snoring)
- Mechanical sounds (engines, vehicles, machinery)
- Complex soundscapes (multiple layered sounds)

**4. Technical Architecture**
- Flow matching for efficient generation
- CLAP-Ranked Preference Optimization (CRPO) to improve audio quality and prompt alignment
- GPU-accelerated inference with CUDA
- Transformer-based text encoding
- Runs on Hugging Face ZeroGPU hardware, with GPU allocation handled by the `@spaces.GPU` decorator
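
The flow-matching bullet and the Steps and Guidance Scale controls fit together roughly as follows. This is a toy sketch of an Euler sampler for a straight (rectified) flow with classifier-free guidance, under loud assumptions: the "velocity model" is a hand-written function pointing at a fixed target instead of TangoFlux's text-conditioned transformer, the latent is a two-element list rather than an audio latent, time runs from 0 (noise) to 1 (data), which is only one of several conventions, and the Guidance Scale control is assumed to be a standard classifier-free guidance weight. `toward` and `sample` are hypothetical names.

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def toward(target: Vector) -> VelocityFn:
    """Toy velocity field for the straight path x_t = (1 - t) * x0 + t * target.

    Stands in for a trained network (assumption); the real model conditions on text.
    """
    def v(x: Vector, t: float) -> Vector:
        return [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]
    return v


def sample(x0: Vector, v_cond: VelocityFn, v_uncond: VelocityFn,
           steps: int, guidance_scale: float) -> Vector:
    """Euler-integrate dx/dt = v from t=0 (noise) to t=1 (data) with CFG."""
    dt = 1.0 / steps
    x = list(x0)
    for i in range(steps):
        t = i * dt  # never reaches 1.0, so the toy field never divides by zero
        vc, vu = v_cond(x, t), v_uncond(x, t)
        # Classifier-free guidance: move past the unconditional prediction
        # in the direction of the conditional one.
        v = [u + guidance_scale * (c - u) for c, u in zip(vc, vu)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


noise = [0.3, -1.2]
prompt_target = [1.0, 2.0]  # where the "text-conditioned" field points
null_target = [0.0, 0.0]    # where the "unconditional" field points
out = sample(noise, toward(prompt_target), toward(null_target),
             steps=10, guidance_scale=1.0)
print([round(val, 6) for val in out])  # [1.0, 2.0]
```

Because the toy field is perfectly straight, even a few Euler steps land on the target; a trained network is only approximately straight, which is where more steps buy accuracy at the cost of speed. A guidance scale above 1 extrapolates past the conditional prediction: with `steps=4` and `guidance_scale=3.0`, this sketch ends at `[3.0, 6.0]`.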
### How It Works
1. **Text Input**: Describe the desired audio in natural language
2. **Parameter Adjustment**: Fine-tune generation settings
3. **AI Processing**: The model interprets text and generates corresponding audio
4. **Audio Output**: Download or play the generated WAV file
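
Step 4 produces a WAV file. As a standard-library-only sketch of that last step, assuming mono float samples in [-1, 1] (the model's actual output shape and channel count are not specified in this README), samples can be encoded as 16-bit PCM at 44.1 kHz. `to_wav_bytes` is a hypothetical helper, and the sine tone merely stands in for generated audio.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # Hz


def to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file in memory."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip so the 16-bit conversion cannot overflow
        frames += struct.pack("<h", round(s * 32767))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buf.getvalue()


# One second of a 440 Hz sine tone standing in for generated audio.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
data = to_wav_bytes(tone)
```

Writing the returned bytes to a `.wav` file gives a clip that any standard audio player can open.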
### Example Use Cases
- **Film & Video Production**: Create custom sound effects and ambiences
- **Game Development**: Generate dynamic environmental sounds
- **Podcast Production**: Add realistic background sounds
- **Music Production**: Create unique sound textures and effects
- **Educational Content**: Generate illustrative audio examples
- **Accessibility**: Convert text descriptions to audio experiences

The app includes more than 20 preset examples, ranging from simple single sounds to complex multi-layered soundscapes.
