title: SingingSDS
emoji: 🎢
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
python_version: 3.11

SingingSDS: Role-Playing Singing Spoken Dialogue System

A role-playing singing dialogue system that converts speech input into character-based singing output.

Installation

Requirements

  • Python 3.11+
  • CUDA (optional, for GPU acceleration)
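
Before installing, it can help to confirm that the interpreter meets these requirements. The snippet below is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper, and it assumes PyTorch is the library used for GPU acceleration (as the Conda instructions suggest). If PyTorch is not installed yet, CUDA is reported as unavailable.

```python
import sys


def check_environment(min_version=(3, 11)):
    """Return (python_ok, cuda_available) for the current interpreter.

    Hypothetical helper for illustration; not shipped with SingingSDS.
    """
    # Compare only (major, minor) against the required minimum.
    python_ok = sys.version_info[:2] >= tuple(min_version)
    try:
        import torch  # present only after the dependencies are installed

        cuda_available = bool(torch.cuda.is_available())
    except ImportError:
        # Without PyTorch there is nothing to query, so report no CUDA.
        cuda_available = False
    return python_ok, cuda_available


if __name__ == "__main__":
    python_ok, cuda_available = check_environment()
    print(f"Python >= 3.11: {python_ok}")
    print(f"CUDA available: {cuda_available}")
```

CUDA is optional, so a `False` in the second field only means the system will run on CPU.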

Install Dependencies

Option 1: Using Conda (Recommended)

conda create -n singingsds python=3.11

conda activate singingsds
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt

Option 2: Using pip only

pip install -r requirements.txt

Option 3: Using pip with virtual environment

python -m venv singingsds_env

# On Windows:
singingsds_env\Scripts\activate
# On macOS/Linux:
source singingsds_env/bin/activate

pip install -r requirements.txt

Usage

Command Line Interface (CLI)

Example Usage

python cli.py --query_audio tests/audio/hello.wav --config_path config/cli/yaoyin_default.yaml --output_audio outputs/yaoyin_hello.wav

Parameter Description

  • --query_audio: Input audio file path (required)
  • --config_path: Configuration file path (default: config/cli/yaoyin_default.yaml)
  • --output_audio: Output audio file path (required)
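
To process several recordings, the CLI can be driven from a small script. The following is a sketch under stated assumptions: it uses only the three flags documented above, assumes it is run from the repository root (so that `cli.py` resolves), and the helper names `build_command` and `run_batch` are hypothetical. By default it only builds the commands (`dry_run=True`) so they can be inspected before anything is executed.

```python
import subprocess
import sys
from pathlib import Path

# Default taken from the parameter description above.
DEFAULT_CONFIG = "config/cli/yaoyin_default.yaml"


def build_command(query_audio, output_audio, config_path=DEFAULT_CONFIG):
    """Build the argument list for a single cli.py invocation."""
    return [
        sys.executable, "cli.py",
        "--query_audio", str(query_audio),
        "--config_path", str(config_path),
        "--output_audio", str(output_audio),
    ]


def run_batch(input_dir, output_dir, config_path=DEFAULT_CONFIG, dry_run=True):
    """Build (and optionally run) one command per .wav file in input_dir."""
    output_dir = Path(output_dir)
    commands = []
    for wav in sorted(Path(input_dir).glob("*.wav")):
        # Keep the input file name for the output file.
        cmd = build_command(wav, output_dir / wav.name, config_path)
        commands.append(cmd)
        if not dry_run:
            output_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises if cli.py exits non-zero
    return commands


if __name__ == "__main__":
    for cmd in run_batch("tests/audio", "outputs"):
        print(" ".join(cmd))
```

Passing `dry_run=False` runs each command in turn and stops at the first failure.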

Web Interface (Gradio)

Start the web interface:

python app.py

Then open the address printed in the terminal in your browser to use the graphical interface.

Configuration

Character Configuration

The system supports multiple preset characters:

  • Yaoyin (遥音): Default timbre is timbre2
  • Limei (丽梅): Default timbre is timbre1

Model Configuration

ASR Models

  • openai/whisper-large-v3-turbo
  • openai/whisper-large-v3
  • openai/whisper-medium
  • sanchit-gandhi/whisper-small-dv
  • facebook/wav2vec2-base-960h

LLM Models

  • google/gemma-2-2b
  • MiniMaxAI/MiniMax-M1-80k
  • meta-llama/Llama-3.2-3B-Instruct

SVS Models

  • espnet/mixdata_svs_visinger2_spkemb_lang_pretrained_avg (Bilingual)
  • espnet/aceopencpop_svs_visinger2_40singer_pretrain (Chinese)
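
When editing a configuration file by hand, a quick check against the lists above can catch a mistyped model ID early. This is an illustrative sketch only: `SUPPORTED_MODELS` and `validate_selection` are hypothetical names, the model IDs are copied from this README, and the actual YAML keys used by the project are not shown here, so the three stages are passed as plain arguments.

```python
# Model IDs copied from the lists in this README; the dictionary itself
# is a hypothetical structure, not something the project defines.
SUPPORTED_MODELS = {
    "asr": {
        "openai/whisper-large-v3-turbo",
        "openai/whisper-large-v3",
        "openai/whisper-medium",
        "sanchit-gandhi/whisper-small-dv",
        "facebook/wav2vec2-base-960h",
    },
    "llm": {
        "google/gemma-2-2b",
        "MiniMaxAI/MiniMax-M1-80k",
        "meta-llama/Llama-3.2-3B-Instruct",
    },
    "svs": {
        "espnet/mixdata_svs_visinger2_spkemb_lang_pretrained_avg",
        "espnet/aceopencpop_svs_visinger2_40singer_pretrain",
    },
}


def validate_selection(asr, llm, svs):
    """Return a list of problems; an empty list means every ID is listed."""
    chosen = {"asr": asr, "llm": llm, "svs": svs}
    return [
        f"unsupported {stage} model: {model_id}"
        for stage, model_id in chosen.items()
        if model_id not in SUPPORTED_MODELS[stage]
    ]


if __name__ == "__main__":
    problems = validate_selection(
        asr="openai/whisper-medium",
        llm="google/gemma-2-2b",
        svs="espnet/aceopencpop_svs_visinger2_40singer_pretrain",
    )
    print(problems or "all model IDs are listed")
```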

Project Structure

SingingSDS/
├── cli.py                 # Command line interface
├── interface.py           # Gradio interface
├── pipeline.py            # Core processing pipeline
├── app.py                 # Web application entry
├── requirements.txt       # Python dependencies
├── config/                # Configuration files
│   ├── cli/               # CLI-specific configuration
│   └── interface/         # Interface-specific configuration
├── modules/               # Core modules
│   ├── asr.py             # Speech recognition module
│   ├── llm.py             # Large language model module
│   ├── melody.py          # Melody control module
│   ├── svs/               # Singing voice synthesis modules
│   │   ├── base.py        # Base SVS class
│   │   ├── espnet.py      # ESPnet SVS implementation
│   │   ├── registry.py    # SVS model registry
│   │   └── __init__.py    # SVS module initialization
│   └── utils/             # Utility modules
│       ├── g2p.py         # Grapheme-to-phoneme conversion
│       ├── text_normalize.py # Text normalization
│       └── resources/     # Utility resources
├── characters/            # Character definitions
│   ├── base.py            # Base character class
│   ├── Limei.py           # Limei character definition
│   ├── Yaoyin.py          # Yaoyin character definition
│   └── __init__.py        # Character module initialization
├── evaluation/            # Evaluation modules
│   └── svs_eval.py        # SVS evaluation metrics
├── data/                  # Data directory
│   ├── kising/            # Kising dataset
│   └── touhou/            # Touhou dataset
├── resources/             # Project resources
├── data_handlers/         # Data handling utilities
├── assets/                # Static assets
└── tests/                 # Test files

Contributing

Issues and Pull Requests are welcome!

License