---
title: SingingSDS
emoji: 🎶
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
python_version: 3.11
---
# SingingSDS: Role-Playing Singing Spoken Dialogue System
A role-playing singing dialogue system that converts speech input into character-based singing output.
## Installation
### Requirements
- Python 3.11+
- CUDA (optional, for GPU acceleration)
### Install Dependencies
#### Option 1: Using Conda (Recommended)
```bash
conda create -n singingsds python=3.11
conda activate singingsds
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```
#### Option 2: Using pip only
```bash
pip install -r requirements.txt
```
#### Option 3: Using pip with virtual environment
```bash
python -m venv singingsds_env
# On Windows:
singingsds_env\Scripts\activate
# On macOS/Linux:
source singingsds_env/bin/activate
pip install -r requirements.txt
```
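After installing, it can be useful to confirm whether GPU acceleration will actually be used. The following is a minimal sketch, not part of the repository: `pick_device` is a hypothetical helper that falls back to CPU when PyTorch is missing or CUDA is unavailable.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    # find_spec avoids an ImportError when PyTorch is not installed yet.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Selected device: {pick_device()}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build is probably CPU-only or does not match the local CUDA driver.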
## Usage
### Command Line Interface (CLI)
#### Example Usage
```bash
python cli.py --query_audio tests/audio/hello.wav --config_path config/cli/yaoyin_default.yaml --output_audio outputs/yaoyin_hello.wav
```
#### Parameter Description
- `--query_audio`: Input audio file path (required)
- `--config_path`: Configuration file path (default: config/cli/yaoyin_default.yaml)
- `--output_audio`: Output audio file path (required)
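The parameters above map naturally onto an `argparse` parser. The sketch below is an illustration of that mapping only; it is not the actual contents of `cli.py`, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the three documented CLI parameters."""
    parser = argparse.ArgumentParser(description="SingingSDS CLI (illustrative sketch)")
    parser.add_argument("--query_audio", required=True, help="Input audio file path")
    parser.add_argument(
        "--config_path",
        default="config/cli/yaoyin_default.yaml",
        help="Configuration file path",
    )
    parser.add_argument("--output_audio", required=True, help="Output audio file path")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["--query_audio", "tests/audio/hello.wav", "--output_audio", "outputs/yaoyin_hello.wav"]
)
print(args.config_path)  # prints "config/cli/yaoyin_default.yaml"
```

Because `--config_path` has a default, the example command shown earlier would behave the same with that flag omitted.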
### Web Interface (Gradio)
Start the web interface:
```bash
python app.py
```
Then open the address printed in the terminal in your browser to use the graphical interface.
## Configuration
### Character Configuration
The system supports multiple preset characters:
- **Yaoyin (遥音)**: Default timbre is `timbre2`
- **Limei (丽梅)**: Default timbre is `timbre1`
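One way to picture the character presets is as a small lookup table. The sketch below uses the two default timbres listed above; the `CharacterPreset` class, the `PRESETS` table, and the `default_timbre` function are hypothetical and do not reflect how the `characters/` package is actually written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterPreset:
    """A preset character and the timbre it uses by default."""

    name: str
    default_timbre: str


# Timbre values come from the list above; the structure itself is an assumption.
PRESETS = {
    "Yaoyin": CharacterPreset(name="Yaoyin", default_timbre="timbre2"),
    "Limei": CharacterPreset(name="Limei", default_timbre="timbre1"),
}


def default_timbre(character: str) -> str:
    """Look up the default timbre, failing loudly on unknown characters."""
    try:
        return PRESETS[character].default_timbre
    except KeyError:
        raise ValueError(f"Unknown character: {character!r}") from None


print(default_timbre("Yaoyin"))  # prints "timbre2"
```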
### Model Configuration
#### ASR Models
- `openai/whisper-large-v3-turbo`
- `openai/whisper-large-v3`
- `openai/whisper-medium`
- `sanchit-gandhi/whisper-small-dv`
- `facebook/wav2vec2-base-960h`
#### LLM Models
- `google/gemma-2-2b`
- `MiniMaxAI/MiniMax-M1-80k`
- `meta-llama/Llama-3.2-3B-Instruct`
#### SVS Models
- `espnet/mixdata_svs_visinger2_spkemb_lang_pretrained_avg` (Bilingual)
- `espnet/aceopencpop_svs_visinger2_40singer_pretrain` (Chinese)
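When wiring these models into a configuration, it can help to check a selection against the lists above before loading anything large. This is a hedged sketch: the stage keys (`asr`, `llm`, `svs`) and the `validate_selection` function are hypothetical, and the repository may accept model IDs beyond those listed here.

```python
# Model IDs copied from the lists above, grouped by pipeline stage.
SUPPORTED_MODELS = {
    "asr": {
        "openai/whisper-large-v3-turbo",
        "openai/whisper-large-v3",
        "openai/whisper-medium",
        "sanchit-gandhi/whisper-small-dv",
        "facebook/wav2vec2-base-960h",
    },
    "llm": {
        "google/gemma-2-2b",
        "MiniMaxAI/MiniMax-M1-80k",
        "meta-llama/Llama-3.2-3B-Instruct",
    },
    "svs": {
        "espnet/mixdata_svs_visinger2_spkemb_lang_pretrained_avg",
        "espnet/aceopencpop_svs_visinger2_40singer_pretrain",
    },
}


def validate_selection(asr: str, llm: str, svs: str) -> dict[str, str]:
    """Return the selection as a dict, or raise if any ID is not listed."""
    selection = {"asr": asr, "llm": llm, "svs": svs}
    for stage, model_id in selection.items():
        if model_id not in SUPPORTED_MODELS[stage]:
            options = ", ".join(sorted(SUPPORTED_MODELS[stage]))
            raise ValueError(
                f"Unsupported {stage} model {model_id!r}; expected one of: {options}"
            )
    return selection


selection = validate_selection(
    asr="openai/whisper-medium",
    llm="google/gemma-2-2b",
    svs="espnet/aceopencpop_svs_visinger2_40singer_pretrain",
)
print(selection["svs"])
```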
## Project Structure
```
SingingSDS/
├── cli.py                      # Command line interface
├── interface.py                # Gradio interface
├── pipeline.py                 # Core processing pipeline
├── app.py                      # Web application entry
├── requirements.txt            # Python dependencies
├── config/                     # Configuration files
│   ├── cli/                    # CLI-specific configuration
│   └── interface/              # Interface-specific configuration
├── modules/                    # Core modules
│   ├── asr.py                  # Speech recognition module
│   ├── llm.py                  # Large language model module
│   ├── melody.py               # Melody control module
│   ├── svs/                    # Singing voice synthesis modules
│   │   ├── base.py             # Base SVS class
│   │   ├── espnet.py           # ESPnet SVS implementation
│   │   ├── registry.py         # SVS model registry
│   │   └── __init__.py         # SVS module initialization
│   └── utils/                  # Utility modules
│       ├── g2p.py              # Grapheme-to-phoneme conversion
│       ├── text_normalize.py   # Text normalization
│       └── resources/          # Utility resources
├── characters/                 # Character definitions
│   ├── base.py                 # Base character class
│   ├── Limei.py                # Limei character definition
│   ├── Yaoyin.py               # Yaoyin character definition
│   └── __init__.py             # Character module initialization
├── evaluation/                 # Evaluation modules
│   └── svs_eval.py             # SVS evaluation metrics
├── data/                       # Data directory
│   ├── kising/                 # Kising dataset
│   └── touhou/                 # Touhou dataset
├── resources/                  # Project resources
├── data_handlers/              # Data handling utilities
├── assets/                     # Static assets
└── tests/                      # Test files
```
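The `modules/svs/` layout (a base class, a backend implementation, and a registry) suggests a registry pattern. The sketch below shows one common way such a registry is written; it is an assumption for illustration, and every name in it (`BaseSVS`, `register_svs`, `get_svs`, `DummyESPnetSVS`) is hypothetical rather than taken from the repository.

```python
from typing import Callable, Dict, Type


class BaseSVS:
    """Stand-in for a base SVS class (hypothetical interface)."""

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    def synthesize(self, lyrics: str) -> str:
        raise NotImplementedError


# Maps a model-ID prefix to the backend class that handles it.
_SVS_REGISTRY: Dict[str, Type[BaseSVS]] = {}


def register_svs(prefix: str) -> Callable[[Type[BaseSVS]], Type[BaseSVS]]:
    """Class decorator that registers a backend under a model-ID prefix."""

    def decorator(cls: Type[BaseSVS]) -> Type[BaseSVS]:
        _SVS_REGISTRY[prefix] = cls
        return cls

    return decorator


@register_svs("espnet/")
class DummyESPnetSVS(BaseSVS):
    def synthesize(self, lyrics: str) -> str:
        # A real backend would return audio; this stub returns a description.
        return f"[{self.model_id}] would sing: {lyrics}"


def get_svs(model_id: str) -> BaseSVS:
    """Instantiate the first registered backend whose prefix matches."""
    for prefix, cls in _SVS_REGISTRY.items():
        if model_id.startswith(prefix):
            return cls(model_id)
    raise KeyError(f"No SVS backend registered for {model_id!r}")


svs = get_svs("espnet/aceopencpop_svs_visinger2_40singer_pretrain")
print(svs.synthesize("hello"))
```

With this shape, adding a new synthesis backend only requires a new decorated class; the lookup code stays unchanged.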
## Contributing
Issues and Pull Requests are welcome!
## License