---
title: SingingSDS
emoji: 🎶
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
---
# SingingSDS: Role-Playing Singing Spoken Dialogue System
A role-playing singing dialogue system that converts speech input into character-based singing output.
## Installation
### Requirements
- Python 3.11+
- CUDA (optional, for GPU acceleration)
### Install Dependencies
#### Option 1: Using Conda (Recommended)
```bash
conda create -n singingsds python=3.11
conda activate singingsds
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```
#### Option 2: Using pip only
```bash
pip install -r requirements.txt
```
#### Option 3: Using pip with virtual environment
```bash
python -m venv singingsds_env
# On Windows:
singingsds_env\Scripts\activate
# On macOS/Linux:
source singingsds_env/bin/activate
pip install -r requirements.txt
```
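To confirm that an environment meets the requirements listed above, a short check along these lines may help. This is a sketch and not part of the repository; the function name `check_environment` is hypothetical, and it only assumes that PyTorch, if installed, is importable as `torch`.

```python
import sys


def check_environment(min_python=(3, 11)):
    """Report whether the interpreter and the optional GPU stack look usable.

    Hypothetical helper for illustration; not part of SingingSDS.
    """
    report = {
        "python_version": ".".join(str(p) for p in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= min_python,
        "torch_installed": False,
        "cuda_available": False,
    }
    try:
        import torch  # optional: only present once PyTorch is installed
    except ImportError:
        return report
    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

If `cuda_available` is `False`, the system should still run on CPU, since CUDA is listed as optional.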
## Usage
### Command Line Interface (CLI)
#### Example Usage
```bash
python cli.py \
--query_audio tests/audio/hello.wav \
--config_path config/cli/yaoyin_default.yaml \
--output_audio outputs/yaoyin_hello.wav \
--eval_results_csv outputs/yaoyin_test.csv
```
#### Inference-Only Mode
Run minimal inference without evaluation.
```bash
python cli.py \
--query_audio tests/audio/hello.wav \
--config_path config/cli/yaoyin_default_infer_only.yaml \
--output_audio outputs/yaoyin_hello.wav
```
#### Parameter Description
- `--query_audio`: Input audio file path (required)
- `--config_path`: Configuration file path (default: config/cli/yaoyin_default.yaml)
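- `--output_audio`: Output audio file path (required)
- `--eval_results_csv`: CSV file path for evaluation results (used in the example above; omitted in inference-only mode)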
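For readers who want to see how these flags fit together, the following is a minimal `argparse` sketch that mirrors the documented parameters. It is not the actual contents of `cli.py`; in particular, treating `--eval_results_csv` as optional with a default of `None` is an assumption based on the inference-only example omitting it.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="SingingSDS CLI (sketch)")
    parser.add_argument("--query_audio", required=True,
                        help="Input audio file path")
    parser.add_argument("--config_path",
                        default="config/cli/yaoyin_default.yaml",
                        help="Configuration file path")
    parser.add_argument("--output_audio", required=True,
                        help="Output audio file path")
    # Assumption: optional, since the inference-only example leaves it out.
    parser.add_argument("--eval_results_csv", default=None,
                        help="CSV file path for evaluation results")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args([
    "--query_audio", "tests/audio/hello.wav",
    "--output_audio", "outputs/yaoyin_hello.wav",
])
print(args.config_path)
```

Omitting `--config_path` here falls back to the documented default configuration file.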
### Web Interface (Gradio)
Start the web interface:
```bash
python app.py
```
Then open the address printed in the terminal in your browser to use the graphical interface.
## Configuration
### Character Configuration
The system supports multiple preset characters:
- **Yaoyin (遥音)**: Default timbre is `timbre2`
- **Limei (丽梅)**: Default timbre is `timbre1`
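The character-to-timbre mapping above could be represented as a small lookup table. The sketch below is illustrative only: the `Character` dataclass, the `CHARACTERS` registry, and the `default_timbre` helper are hypothetical names and do not reflect how the `characters/` package is actually organized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    """Hypothetical record for a preset character."""
    name: str
    native_name: str
    default_timbre: str


# Values taken from the preset list above; the structure is an assumption.
CHARACTERS = {
    "yaoyin": Character("Yaoyin", "遥音", "timbre2"),
    "limei": Character("Limei", "丽梅", "timbre1"),
}


def default_timbre(character_key):
    """Return the default timbre for a character, case-insensitively."""
    try:
        return CHARACTERS[character_key.lower()].default_timbre
    except KeyError:
        known = ", ".join(sorted(CHARACTERS))
        raise ValueError(
            f"Unknown character {character_key!r}; expected one of: {known}"
        ) from None


print(default_timbre("Yaoyin"))
```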
### Model Configuration
#### ASR Models
- `openai/whisper-large-v3-turbo`
- `openai/whisper-large-v3`
- `openai/whisper-medium`
- `openai/whisper-small`
- `funasr/paraformer-zh`
#### LLM Models
- `gemini-2.5-flash`
- `google/gemma-2-2b`
- `meta-llama/Llama-3.2-3B-Instruct`
- `meta-llama/Llama-3.1-8B-Instruct`
- `Qwen/Qwen3-8B`
- `Qwen/Qwen3-30B-A3B`
- `MiniMaxAI/MiniMax-Text-01`
#### SVS Models
- `espnet/mixdata_svs_visinger2_spkemb_lang_pretrained_avg` (Bilingual)
- `espnet/aceopencpop_svs_visinger2_40singer_pretrain` (Chinese)
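When assembling a configuration by hand, it can be useful to check the chosen model identifiers against the lists above before launching a run. The following is a sketch under that assumption; `SUPPORTED_MODELS` and `validate_selection` are hypothetical names, and the real configuration loader may accept other values.

```python
# Model identifiers copied from the lists above, grouped by pipeline stage.
SUPPORTED_MODELS = {
    "asr": (
        "openai/whisper-large-v3-turbo",
        "openai/whisper-large-v3",
        "openai/whisper-medium",
        "openai/whisper-small",
        "funasr/paraformer-zh",
    ),
    "llm": (
        "gemini-2.5-flash",
        "google/gemma-2-2b",
        "meta-llama/Llama-3.2-3B-Instruct",
        "meta-llama/Llama-3.1-8B-Instruct",
        "Qwen/Qwen3-8B",
        "Qwen/Qwen3-30B-A3B",
        "MiniMaxAI/MiniMax-Text-01",
    ),
    "svs": (
        "espnet/mixdata_svs_visinger2_spkemb_lang_pretrained_avg",
        "espnet/aceopencpop_svs_visinger2_40singer_pretrain",
    ),
}


def validate_selection(asr, llm, svs):
    """Return the selection as a dict, or raise ValueError on an unlisted model."""
    selection = {"asr": asr, "llm": llm, "svs": svs}
    for stage, model_id in selection.items():
        if model_id not in SUPPORTED_MODELS[stage]:
            raise ValueError(f"Unsupported {stage} model: {model_id!r}")
    return selection


selection = validate_selection(
    asr="openai/whisper-small",
    llm="Qwen/Qwen3-8B",
    svs="espnet/aceopencpop_svs_visinger2_40singer_pretrain",
)
print(selection)
```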
## Project Structure
```
SingingSDS/
├── app.py, cli.py          # Entry points (demo app & CLI)
├── pipeline.py             # Main orchestration pipeline
├── interface.py            # Gradio interface
├── characters/             # Virtual character definitions
├── modules/                # Core modules
│   ├── asr/                # ASR models (Whisper, Paraformer)
│   ├── llm/                # LLMs (Gemini, LLaMA, etc.)
│   ├── svs/                # Singing voice synthesis (ESPnet)
│   └── utils/              # G2P, text normalization, resources
├── config/                 # YAML configuration files
├── data/                   # Dataset metadata and length info
├── data_handlers/          # Parsers for KiSing, Touhou, etc.
├── evaluation/             # Evaluation metrics
├── resources/              # Singer embeddings, phoneme dicts, MIDI
├── assets/                 # Character visuals
├── tests/                  # Unit tests and sample audios
└── README.md, requirements.txt
```
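The layout above suggests a three-stage flow: speech recognition, language-model response, then singing voice synthesis. The sketch below illustrates that chaining with stub stages. It is an assumption about the overall shape, not the actual code in `pipeline.py`; the `DialoguePipeline` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DialoguePipeline:
    """Hypothetical ASR -> LLM -> SVS chain with injectable stages."""
    transcribe: Callable[[bytes], str]   # ASR stage: audio -> text
    respond: Callable[[str], str]        # LLM stage: text -> reply lyrics
    synthesize: Callable[[str], bytes]   # SVS stage: lyrics -> audio

    def run(self, query_audio: bytes) -> bytes:
        text = self.transcribe(query_audio)
        reply = self.respond(text)
        return self.synthesize(reply)


# Stub stages stand in for real models so the flow can be exercised offline.
pipeline = DialoguePipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    respond=lambda text: f"la la {text}",
    synthesize=lambda lyrics: lyrics.encode("utf-8"),
)
result = pipeline.run(b"hello")
print(result)
```

Passing the stages in as callables keeps each one swappable, which matches the way the model lists above offer several choices per stage.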
## Contributing
Issues and Pull Requests are welcome!
## License