---
title: SingingSDS
emoji: 🎶
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
python_version: 3.11
---
# SingingSDS: Role-Playing Singing Spoken Dialogue System
A role-playing singing dialogue system that converts speech input into character-based singing output.
## Installation
### Requirements
- Python 3.11+
- CUDA (optional, for GPU acceleration)
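
The requirements above can be checked before installing anything else. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and it assumes GPU support is provided through PyTorch, which the Conda instructions install. It only reports what it finds and leaves `cuda_available` as `None` when PyTorch is not installed yet.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 11)  # minimum version taken from the Requirements list


def check_environment() -> dict:
    """Report whether the interpreter is new enough and whether CUDA is usable.

    Hypothetical helper for illustration; it is not shipped with SingingSDS.
    """
    report = {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": None,  # stays None when PyTorch is not installed
    }
    if report["torch_installed"]:
        # Imported lazily so the check also runs before dependencies exist.
        import torch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Since CUDA is optional, a `False` or `None` value for `cuda_available` only means the system would run on CPU.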
### Install Dependencies
#### Option 1: Using Conda (Recommended)
```bash
conda create -n singingsds python=3.11
conda activate singingsds
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```
#### Option 2: Using pip only
```bash
pip install -r requirements.txt
```
#### Option 3: Using pip with virtual environment
```bash
python -m venv singingsds_env
# On Windows:
singingsds_env\Scripts\activate
# On macOS/Linux:
source singingsds_env/bin/activate
pip install -r requirements.txt
```
## Usage
### Command Line Interface (CLI)
#### Example Usage
```bash
python cli.py --query_audio tests/audio/hello.wav --config_path config/cli/yaoyin_default.yaml --output_audio outputs/yaoyin_hello.wav
```
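
To process several recordings, the same command can be generated once per input file. The sketch below is an assumption-laden illustration: `build_command` and `run_batch` are hypothetical helpers, the `_sung.wav` output naming is invented for the example, and the script is assumed to run from the repository root so that `cli.py` resolves. Only the three flags shown in the example above are used.

```python
import subprocess
import sys
from pathlib import Path

DEFAULT_CONFIG = "config/cli/yaoyin_default.yaml"  # path from the example above


def build_command(query_audio: Path, output_dir: Path,
                  config_path: str = DEFAULT_CONFIG) -> list[str]:
    """Build the cli.py invocation for a single input file."""
    # Hypothetical naming scheme: <input stem>_sung.wav inside output_dir.
    output_audio = output_dir / f"{query_audio.stem}_sung.wav"
    return [
        sys.executable, "cli.py",
        "--query_audio", str(query_audio),
        "--config_path", config_path,
        "--output_audio", str(output_audio),
    ]


def run_batch(input_dir: str, output_dir: str, dry_run: bool = True) -> list[list[str]]:
    """Build one command per .wav file; execute them unless dry_run is set."""
    out = Path(output_dir)
    commands = [build_command(wav, out) for wav in sorted(Path(input_dir).glob("*.wav"))]
    if not dry_run:
        out.mkdir(parents=True, exist_ok=True)
        for command in commands:
            subprocess.run(command, check=True)  # stop at the first failure
    return commands
```

With `dry_run=True` (the default here) nothing is executed, which makes it easy to inspect the generated commands first.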
#### Parameter Description
- `--query_audio`: Input audio file path (required)
- `--config_path`: Configuration file path (default: `config/cli/yaoyin_default.yaml`)
- `--output_audio`: Output audio file path (required)
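
The parameter list above corresponds to an argument parser along the following lines. This is a sketch built with the standard `argparse` module; the actual parsing code in `cli.py` is not shown in this README and may differ, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the documented flags (not the real cli.py)."""
    parser = argparse.ArgumentParser(description="SingingSDS CLI (illustrative sketch)")
    parser.add_argument("--query_audio", required=True,
                        help="Input audio file path")
    parser.add_argument("--config_path", default="config/cli/yaoyin_default.yaml",
                        help="Configuration file path")
    parser.add_argument("--output_audio", required=True,
                        help="Output audio file path")
    return parser


# Omitting --config_path falls back to the documented default.
args = build_parser().parse_args([
    "--query_audio", "tests/audio/hello.wav",
    "--output_audio", "outputs/yaoyin_hello.wav",
])
print(args.config_path)  # config/cli/yaoyin_default.yaml
```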
### Web Interface (Gradio)
Start the web interface:
```bash
python app.py
```
Then open the address printed in the terminal in your browser to use the graphical interface.
## Configuration
### Character Configuration
The system supports multiple preset characters:
- **Yaoyin (遥音)**: Default timbre is `timbre2`
- **Limei (丽梅)**: Default timbre is `timbre1`
### Model Configuration
#### ASR Models
- `openai/whisper-large-v3-turbo`
- `openai/whisper-large-v3`
- `openai/whisper-medium`
- `sanchit-gandhi/whisper-small-dv`
- `facebook/wav2vec2-base-960h`
#### LLM Models
- `google/gemma-2-2b`
- `MiniMaxAI/MiniMax-M1-80k`
- `meta-llama/Llama-3.2-3B-Instruct`
#### SVS Models
- `espnet/mixdata_svs_visinger2_spkemb_lang_pretrained_avg` (Bilingual)
- `espnet/aceopencpop_svs_visinger2_40singer_pretrain` (Chinese)
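
Because each stage accepts only the models listed above, a configuration can be sanity-checked before any model is downloaded. The sketch below assumes a plain dictionary with the hypothetical stage keys `asr`, `llm`, and `svs`; the real schema of the YAML files under `config/` is not documented here and may use different keys.

```python
# Model identifiers copied from the lists above; the stage keys are assumptions.
SUPPORTED_MODELS = {
    "asr": [
        "openai/whisper-large-v3-turbo",
        "openai/whisper-large-v3",
        "openai/whisper-medium",
        "sanchit-gandhi/whisper-small-dv",
        "facebook/wav2vec2-base-960h",
    ],
    "llm": [
        "google/gemma-2-2b",
        "MiniMaxAI/MiniMax-M1-80k",
        "meta-llama/Llama-3.2-3B-Instruct",
    ],
    "svs": [
        "espnet/mixdata_svs_visinger2_spkemb_lang_pretrained_avg",
        "espnet/aceopencpop_svs_visinger2_40singer_pretrain",
    ],
}


def validate_selection(selection: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the selection is valid."""
    problems = []
    for stage, options in SUPPORTED_MODELS.items():
        chosen = selection.get(stage)
        if chosen is None:
            problems.append(f"missing model for stage '{stage}'")
        elif chosen not in options:
            problems.append(f"unsupported {stage} model: {chosen}")
    return problems


selection = {
    "asr": "openai/whisper-medium",
    "llm": "google/gemma-2-2b",
    "svs": "espnet/aceopencpop_svs_visinger2_40singer_pretrain",
}
print(validate_selection(selection))  # []
```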
## Project Structure
```
SingingSDS/
├── cli.py                     # Command line interface
├── interface.py               # Gradio interface
├── pipeline.py                # Core processing pipeline
├── app.py                     # Web application entry
├── requirements.txt           # Python dependencies
├── config/                    # Configuration files
│   ├── cli/                   # CLI-specific configuration
│   └── interface/             # Interface-specific configuration
├── modules/                   # Core modules
│   ├── asr.py                 # Speech recognition module
│   ├── llm.py                 # Large language model module
│   ├── melody.py              # Melody control module
│   ├── svs/                   # Singing voice synthesis modules
│   │   ├── base.py            # Base SVS class
│   │   ├── espnet.py          # ESPnet SVS implementation
│   │   ├── registry.py        # SVS model registry
│   │   └── __init__.py        # SVS module initialization
│   └── utils/                 # Utility modules
│       ├── g2p.py             # Grapheme-to-phoneme conversion
│       ├── text_normalize.py  # Text normalization
│       └── resources/         # Utility resources
├── characters/                # Character definitions
│   ├── base.py                # Base character class
│   ├── Limei.py               # Limei character definition
│   ├── Yaoyin.py              # Yaoyin character definition
│   └── __init__.py            # Character module initialization
├── evaluation/                # Evaluation modules
│   └── svs_eval.py            # SVS evaluation metrics
├── data/                      # Data directory
│   ├── kising/                # Kising dataset
│   └── touhou/                # Touhou dataset
├── resources/                 # Project resources
├── data_handlers/             # Data handling utilities
├── assets/                    # Static assets
└── tests/                     # Test files
```
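
The module layout suggests a three-stage flow: speech recognition, language-model response, then singing voice synthesis. The sketch below illustrates that flow with stub stages so it runs without any models. It is an assumption about how the pieces relate, not the interface of `pipeline.py`; the class name, field names, and return format are all hypothetical, and melody control (`modules/melody.py`) is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SingingDialoguePipeline:
    """Hypothetical three-stage pipeline: ASR -> LLM -> SVS."""
    transcribe: Callable[[str], str]       # audio path -> recognized text (role of modules/asr.py)
    respond: Callable[[str], str]          # recognized text -> in-character lyrics (role of modules/llm.py)
    synthesize: Callable[[str, str], str]  # (lyrics, output path) -> written path (role of modules/svs/)

    def run(self, query_audio: str, output_audio: str) -> dict:
        """Run the three stages in order and return the intermediate results."""
        text = self.transcribe(query_audio)
        lyrics = self.respond(text)
        written = self.synthesize(lyrics, output_audio)
        return {"asr_text": text, "lyrics": lyrics, "output_audio": written}


# Stub stages stand in for the real models so the flow can be traced end to end.
pipeline = SingingDialoguePipeline(
    transcribe=lambda path: "hello",
    respond=lambda text: f"la la {text}",
    synthesize=lambda lyrics, path: path,
)
result = pipeline.run("tests/audio/hello.wav", "outputs/yaoyin_hello.wav")
print(result["lyrics"])  # la la hello
```

Passing the stages in as callables keeps each one replaceable, which matches the idea of selecting ASR, LLM, and SVS models independently.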
## Contributing
Issues and Pull Requests are welcome!
## License