The-Imitation-Game / README.md
fantaxy's picture
Update README.md
72a850a verified

A newer version of the Gradio SDK is available: 5.44.1

Upgrade
metadata
title: Sound AI SFX
emoji: 🐠
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.35.0
app_file: app.py
pinned: false
short_description: SText to Audio(Sound SFX) Generator

TangoFlux: Text-to-Audio Generation System

TangoFlux is a state-of-the-art text-to-audio generation system that converts text descriptions into high-quality audio using advanced AI technology. Built on flow matching and CLAP-ranked preference optimization techniques, it delivers fast and faithful audio synthesis from natural language prompts.

Key Features

1. Advanced Audio Generation

  • Converts detailed text descriptions into realistic audio
  • Supports complex soundscapes with multiple elements
  • Generates audio up to 30 seconds in duration
  • Produces 44.1kHz high-quality audio output

2. Flexible Generation Controls

  • Steps (10-100): Controls generation quality vs speed
  • Guidance Scale (1-10): Adjusts how closely the audio follows the prompt
  • Duration (1-30s): Sets the length of generated audio

3. Diverse Audio Capabilities

  • Natural sounds (ocean waves, thunder, rain)
  • Animal sounds (dogs barking, cats meowing, birds singing)
  • Human sounds (laughter, speaking, whistling, snoring)
  • Mechanical sounds (engines, vehicles, machinery)
  • Complex soundscapes (multiple layered sounds)

4. Technical Architecture

  • Uses flow matching for efficient generation
  • CLAP-ranked preference optimization for quality
  • GPU-accelerated inference with CUDA support
  • Transformer-based text encoding
  • Optimized for fast generation with @spaces.GPU

How It Works

  1. Text Input: Describe the desired audio in natural language
  2. Parameter Adjustment: Fine-tune generation settings
  3. AI Processing: The model interprets text and generates corresponding audio
  4. Audio Output: Download or play the generated WAV file

Example Use Cases

  • Film & Video Production: Create custom sound effects and ambiences
  • Game Development: Generate dynamic environmental sounds
  • Podcast Production: Add realistic background sounds
  • Music Production: Create unique sound textures and effects
  • Educational Content: Generate illustrative audio examples
  • Accessibility: Convert text descriptions to audio experiences

The system includes 20+ pre-configured examples demonstrating various audio generation capabilities, from simple single sounds to complex multi-layered soundscapes.


TangoFlux: ν…μŠ€νŠΈ-투-μ˜€λ””μ˜€ 생성 μ‹œμŠ€ν…œ

TangoFluxλŠ” ν…μŠ€νŠΈ μ„€λͺ…을 κ³ ν’ˆμ§ˆ μ˜€λ””μ˜€λ‘œ λ³€ν™˜ν•˜λŠ” μ΅œμ²¨λ‹¨ ν…μŠ€νŠΈ-투-μ˜€λ””μ˜€ 생성 μ‹œμŠ€ν…œμž…λ‹ˆλ‹€. ν”Œλ‘œμš° λ§€μΉ­κ³Ό CLAP μˆœμœ„ 기반 μ„ ν˜Έλ„ μ΅œμ ν™” κΈ°μˆ μ„ 기반으둜 κ΅¬μΆ•λ˜μ–΄, μžμ—°μ–΄ ν”„λ‘¬ν”„νŠΈλ‘œλΆ€ν„° λΉ λ₯΄κ³  μ •ν™•ν•œ μ˜€λ””μ˜€ 합성을 μ œκ³΅ν•©λ‹ˆλ‹€.

μ£Όμš” κΈ°λŠ₯

1. κ³ κΈ‰ μ˜€λ””μ˜€ 생성

  • μƒμ„Έν•œ ν…μŠ€νŠΈ μ„€λͺ…을 ν˜„μ‹€μ μΈ μ˜€λ””μ˜€λ‘œ λ³€ν™˜
  • μ—¬λŸ¬ μš”μ†Œκ°€ ν¬ν•¨λœ λ³΅μž‘ν•œ μ‚¬μš΄λ“œμŠ€μΌ€μ΄ν”„ 지원
  • μ΅œλŒ€ 30초 길이의 μ˜€λ””μ˜€ 생성
  • 44.1kHz κ³ ν’ˆμ§ˆ μ˜€λ””μ˜€ 좜λ ₯

2. μœ μ—°ν•œ 생성 μ œμ–΄

  • Steps (10-100): 생성 ν’ˆμ§ˆ λŒ€ 속도 쑰절
  • Guidance Scale (1-10): ν”„λ‘¬ν”„νŠΈ μ€€μˆ˜λ„ μ‘°μ •
  • Duration (1-30초): 생성 μ˜€λ””μ˜€ 길이 μ„€μ •

3. λ‹€μ–‘ν•œ μ˜€λ””μ˜€ 생성 λŠ₯λ ₯

  • μžμ—°μŒ (νŒŒλ„, 천λ‘₯, λΉ„)
  • 동물 μ†Œλ¦¬ (개 μ§–λŠ” μ†Œλ¦¬, 고양이 울음, μƒˆ 지저귐)
  • 인간 μ†Œλ¦¬ (μ›ƒμŒ, λ§ν•˜κΈ°, 휘파람, 코골이)
  • κΈ°κ³„μŒ (μ—”μ§„, μ°¨λŸ‰, 기계λ₯˜)
  • 볡합 μ‚¬μš΄λ“œμŠ€μΌ€μ΄ν”„ (μ—¬λŸ¬ 측의 μ†Œλ¦¬ μ‘°ν•©)

4. 기술적 ꡬ쑰

  • 효율적인 생성을 μœ„ν•œ ν”Œλ‘œμš° λ§€μΉ­ μ‚¬μš©
  • ν’ˆμ§ˆ ν–₯상을 μœ„ν•œ CLAP μˆœμœ„ 기반 μ„ ν˜Έλ„ μ΅œμ ν™”
  • CUDA 지원 GPU 가속 μΆ”λ‘ 
  • 트랜슀포머 기반 ν…μŠ€νŠΈ 인코딩
  • @spaces.GPU둜 λΉ λ₯Έ 생성 μ΅œμ ν™”

μž‘λ™ 방식

  1. ν…μŠ€νŠΈ μž…λ ₯: μ›ν•˜λŠ” μ˜€λ””μ˜€λ₯Ό μžμ—°μ–΄λ‘œ μ„€λͺ…
  2. λ§€κ°œλ³€μˆ˜ μ‘°μ •: 생성 μ„€μ • λ―Έμ„Έ μ‘°μ •
  3. AI 처리: λͺ¨λΈμ΄ ν…μŠ€νŠΈλ₯Ό ν•΄μ„ν•˜κ³  ν•΄λ‹Ή μ˜€λ””μ˜€ 생성
  4. μ˜€λ””μ˜€ 좜λ ₯: μƒμ„±λœ WAV 파일 λ‹€μš΄λ‘œλ“œ λ˜λŠ” μž¬μƒ

ν™œμš© μ˜ˆμ‹œ

  • μ˜ν™” 및 λΉ„λ””μ˜€ μ œμž‘: λ§žμΆ€ν˜• μ‚¬μš΄λ“œ 효과 및 λΆ„μœ„κΈ°μŒ 생성
  • κ²Œμž„ 개발: 동적 ν™˜κ²½μŒ 생성
  • 팟캐슀트 μ œμž‘: ν˜„μ‹€μ μΈ 배경음 μΆ”κ°€
  • μŒμ•… μ œμž‘: λ…νŠΉν•œ μ‚¬μš΄λ“œ ν…μŠ€μ²˜μ™€ 효과 생성
  • ꡐ윑 μ½˜ν…μΈ : μ„€λͺ…μš© μ˜€λ””μ˜€ 예제 생성
  • μ ‘κ·Όμ„±: ν…μŠ€νŠΈ μ„€λͺ…을 μ˜€λ””μ˜€ κ²½ν—˜μœΌλ‘œ λ³€ν™˜

이 μ‹œμŠ€ν…œμ€ λ‹¨μˆœν•œ 단일 μ†Œλ¦¬λΆ€ν„° λ³΅μž‘ν•œ λ‹€μΈ΅ μ‚¬μš΄λ“œμŠ€μΌ€μ΄ν”„κΉŒμ§€ λ‹€μ–‘ν•œ μ˜€λ””μ˜€ 생성 κΈ°λŠ₯을 λ³΄μ—¬μ£ΌλŠ” 20개 μ΄μƒμ˜ 사전 κ΅¬μ„±λœ 예제λ₯Ό ν¬ν•¨ν•˜κ³  μžˆμŠ΅λ‹ˆλ‹€.