title: Sound AI SFX
emoji: π
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
short_description: Text to Audio (Sound SFX) Generator
TangoFlux: Text-to-Audio Generation System
TangoFlux is a text-to-audio generation system that converts natural-language descriptions into audio. It is built on flow matching and CLAP-ranked preference optimization, and is designed for fast synthesis that stays faithful to the prompt.
Key Features
1. Advanced Audio Generation
- Converts detailed text descriptions into realistic audio
- Supports complex soundscapes with multiple elements
- Generates clips up to 30 seconds long
- Outputs audio at a 44.1 kHz sample rate
2. Flexible Generation Controls
- Steps (10-100): Sets the number of generation steps; more steps generally improve quality but take longer
- Guidance Scale (1-10): Sets how closely the audio follows the prompt; higher values follow it more strictly
- Duration (1-30s): Sets the length of generated audio
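The three controls above can be sketched as a small validated settings object. This is a minimal illustration, not the Space's actual code: the class name `GenerationSettings`, its field names, and the `num_samples` helper are hypothetical. Only the ranges and the 44.1 kHz rate come from the feature list.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 44_100  # output sample rate stated in the feature list


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the three controls (not a TangoFlux API)."""

    steps: int
    guidance_scale: float
    duration_s: float

    def __post_init__(self) -> None:
        # Reject values outside the ranges given in the feature list.
        if not 10 <= self.steps <= 100:
            raise ValueError(f"steps must be in 10-100, got {self.steps}")
        if not 1 <= self.guidance_scale <= 10:
            raise ValueError(f"guidance_scale must be in 1-10, got {self.guidance_scale}")
        if not 1 <= self.duration_s <= 30:
            raise ValueError(f"duration_s must be in 1-30, got {self.duration_s}")

    @property
    def num_samples(self) -> int:
        """Samples per channel needed for the requested duration."""
        return round(self.duration_s * SAMPLE_RATE_HZ)


settings = GenerationSettings(steps=50, guidance_scale=4.5, duration_s=10)
print(settings.num_samples)
```

A 10-second request at 44.1 kHz corresponds to 441,000 samples per channel, and the 30-second maximum to 1,323,000.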
3. Diverse Audio Capabilities
- Natural sounds (ocean waves, thunder, rain)
- Animal sounds (dogs barking, cats meowing, birds singing)
- Human sounds (laughter, speaking, whistling, snoring)
- Mechanical sounds (engines, vehicles, machinery)
- Complex soundscapes (multiple layered sounds)
4. Technical Architecture
- Uses flow matching for efficient generation
- CLAP-ranked preference optimization for quality
- GPU-accelerated inference with CUDA support
- Transformer-based text encoding
- Runs on Hugging Face ZeroGPU hardware through the @spaces.GPU decorator
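To give a feel for why the Steps control trades speed for quality, here is a generic sketch of how a flow-based sampler integrates a velocity field with fixed-size Euler steps. It is not TangoFlux's sampler: in a real model the velocity comes from a neural network conditioned on the text, whereas `toy_velocity` below is a hand-written stand-in, and `euler_sample`, `START`, and `TARGET` are names invented for this example.

```python
def euler_sample(velocity, x0, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = x0
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        x = x + dt * velocity(x, t)
    return x


START, TARGET = 0.0, 1.0


def toy_velocity(x, t):
    # Hand-written field whose exact trajectory is
    # x(t) = START + t**2 * (TARGET - START), so x(1) == TARGET.
    return 2.0 * t * (TARGET - START)


for steps in (10, 50, 100):
    error = abs(TARGET - euler_sample(toy_velocity, START, steps))
    print(f"steps={steps:3d}  error={error:.4f}")
```

For this toy field the endpoint error works out to 1/steps, so each extra step costs one more velocity evaluation and brings the result closer to the target.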
How It Works
- Text Input: Describe the desired audio in natural language
- Parameter Adjustment: Set the steps, guidance scale, and duration
- AI Processing: The model interprets text and generates corresponding audio
- Audio Output: Download or play the generated WAV file
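The flow above can be sketched end to end with the standard library alone. The model call is replaced by a placeholder: `placeholder_generate` ignores the prompt and returns a 440 Hz test tone, so this shows only the shape of the pipeline (prompt and settings in, WAV out). The function names, mono output, and 16-bit PCM encoding are assumptions made for the example; only the 44.1 kHz rate and the WAV format come from the text.

```python
import io
import math
import struct
import wave

SAMPLE_RATE_HZ = 44_100  # output sample rate stated in the feature list


def placeholder_generate(prompt: str, duration_s: float) -> list[float]:
    """Stand-in for the model: ignores the prompt, returns a 440 Hz tone in [-1, 1]."""
    n = round(duration_s * SAMPLE_RATE_HZ)
    return [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE_HZ) for i in range(n)]


def to_wav_bytes(samples: list[float]) -> bytes:
    """Encode mono float samples as a 16-bit PCM WAV file held in memory."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    pcm = struct.pack(f"<{len(samples)}h", *(int(s * 32767) for s in clipped))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)  # mono is a simplification for this sketch
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()


# Text input -> parameters -> (placeholder) generation -> WAV output
wav_bytes = to_wav_bytes(placeholder_generate("rain on a tin roof", duration_s=1.0))
print(len(wav_bytes), "bytes of WAV data")
```

Swapping `placeholder_generate` for a real model call would leave the encoding step unchanged, provided the model returns samples at the same rate.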
Example Use Cases
- Film & Video Production: Create custom sound effects and ambiences
- Game Development: Generate dynamic environmental sounds
- Podcast Production: Add realistic background sounds
- Music Production: Create unique sound textures and effects
- Educational Content: Generate illustrative audio examples
- Accessibility: Convert text descriptions to audio experiences
The demo ships with more than 20 preconfigured example prompts, ranging from single sounds to multi-layered soundscapes.