---
title: SingingSDS
emoji: 🎶
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
python_version: 3.11
---
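
# SingingSDS: Role-Playing Singing Spoken Dialogue System

A role-playing singing dialogue system: you speak to a character, and it answers by singing in that character's voice. It combines speech recognition (ASR), a large language model (LLM), and singing voice synthesis (SVS) to turn speech input into character-based singing output.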

## Installation

### Requirements

- Python 3.11+
- CUDA (optional, for GPU acceleration)

### Install Dependencies

#### Option 1: Using Conda (Recommended)

```bash
# Create and activate an isolated Python 3.11 environment
conda create -n singingsds python=3.11
conda activate singingsds
# PyTorch built against CUDA 11.8; pick the pytorch-cuda version that matches your driver
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```

#### Option 2: Using pip only

```bash
pip install -r requirements.txt
```

#### Option 3: Using pip with a virtual environment

```bash
# Requires a Python 3.11+ interpreter
python -m venv singingsds_env
# On Windows:
singingsds_env\Scripts\activate
# On macOS/Linux:
source singingsds_env/bin/activate
pip install -r requirements.txt
```
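
To confirm that an environment meets the requirements above, a quick check like the following can help. This is a minimal sketch and not part of the repository: `check_environment` is a hypothetical helper that tests only the Python 3.11+ requirement and the optional CUDA support, and it treats PyTorch as optional so it also runs before the dependencies are installed.

```python
import importlib
import importlib.util
import sys


def check_environment(min_version=(3, 11)):
    """Report whether this interpreter meets the README's requirements.

    Hypothetical helper (not shipped with SingingSDS). PyTorch is treated
    as optional so the check also works before installing requirements.txt.
    """
    report = {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= tuple(min_version),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        torch = importlib.import_module("torch")
        # CUDA is optional: False here just means models run on the CPU.
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Saved as a script and run inside the activated environment, it prints one line per check; `cuda_available: False` is expected on CPU-only machines.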

## Usage

### Command Line Interface (CLI)

#### Example Usage

```bash
python cli.py --query_audio tests/audio/hello.wav --config_path config/cli/yaoyin_default.yaml --output_audio outputs/yaoyin_hello.wav
```

#### Parameter Description

- `--query_audio`: Path to the input (spoken query) audio file (required)
- `--config_path`: Path to the configuration file (default: `config/cli/yaoyin_default.yaml`)
- `--output_audio`: Path where the synthesized singing audio is written (required)
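
To process several recordings, `cli.py` can be invoked once per file. The sketch below is a hypothetical wrapper, not part of the repository: it uses only the three documented flags, and the helper names (`build_cli_command`, `run_batch`) and the output naming scheme (each output keeps its input's file name) are assumptions. With `dry_run=True` it only returns the commands it would run.

```python
import subprocess
import sys
from pathlib import Path

DEFAULT_CONFIG = "config/cli/yaoyin_default.yaml"


def build_cli_command(query_audio, output_audio, config_path=DEFAULT_CONFIG):
    """Build the argv list for one cli.py call using only the documented flags."""
    return [
        sys.executable, "cli.py",
        "--query_audio", str(query_audio),
        "--config_path", str(config_path),
        "--output_audio", str(output_audio),
    ]


def run_batch(input_dir, output_dir, config_path=DEFAULT_CONFIG, dry_run=False):
    """Run cli.py on every .wav file in input_dir (hypothetical helper).

    Outputs keep the input's file name, e.g. hello.wav -> output_dir/hello.wav.
    Returns the list of commands; with dry_run=True nothing is executed.
    """
    output_dir = Path(output_dir)
    commands = []
    for wav in sorted(Path(input_dir).glob("*.wav")):
        cmd = build_cli_command(wav, output_dir / wav.name, config_path)
        commands.append(cmd)
        if not dry_run:
            output_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # stop at the first failing file
    return commands


if __name__ == "__main__":
    for cmd in run_batch("tests/audio", "outputs", dry_run=True):
        print(" ".join(cmd))
```

Passing the argument list directly to `subprocess.run` (rather than a shell string) avoids quoting problems with paths that contain spaces.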

### Web Interface (Gradio)

Start the web interface:

```bash
python app.py
```

Then open the local URL printed in the terminal in your browser to use the graphical interface.

## Configuration

### Character Configuration

The system ships with two preset characters:

- **Yaoyin (遥音)**: default timbre `timbre2`
- **Limei (丽梅)**: default timbre `timbre1`
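
A character's default timbre acts as a fallback that an explicit choice can override. The sketch below illustrates that idea with a hypothetical `CharacterPreset` dataclass and `resolve_timbre` lookup; it is not the API of `characters/base.py`, whose actual fields are not documented here. Only the character names and default timbres come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CharacterPreset:
    """Hypothetical stand-in for a character definition."""
    name: str
    default_timbre: str


# Names and default timbres taken from the preset list in this README.
PRESETS = {
    "Yaoyin": CharacterPreset("Yaoyin", "timbre2"),
    "Limei": CharacterPreset("Limei", "timbre1"),
}


def resolve_timbre(character: str, timbre: Optional[str] = None) -> str:
    """Return the requested timbre, or the character's default when none is given."""
    if character not in PRESETS:
        known = ", ".join(sorted(PRESETS))
        raise ValueError(f"Unknown character {character!r}; expected one of: {known}")
    # None (or an empty string) falls back to the preset's default.
    return timbre or PRESETS[character].default_timbre
```

For example, `resolve_timbre("Yaoyin")` gives `"timbre2"`, while `resolve_timbre("Limei", "timbre2")` keeps the explicitly requested `"timbre2"`.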

### Model Configuration

#### ASR Models

- `openai/whisper-large-v3-turbo`
- `openai/whisper-large-v3`
- `openai/whisper-medium`
- `sanchit-gandhi/whisper-small-dv`
- `facebook/wav2vec2-base-960h`

#### LLM Models

- `google/gemma-2-2b`
- `MiniMaxAI/MiniMax-M1-80k`
- `meta-llama/Llama-3.2-3B-Instruct`

#### SVS Models

- `espnet/mixdata_svs_visinger2_spkemb_lang_pretrained_avg` (bilingual)
- `espnet/aceopencpop_svs_visinger2_40singer_pretrain` (Chinese)

## Project Structure

```
SingingSDS/
├── cli.py                    # Command line interface
├── interface.py              # Gradio interface
├── pipeline.py               # Core processing pipeline
├── app.py                    # Web application entry
├── requirements.txt          # Python dependencies
├── config/                   # Configuration files
│   ├── cli/                  # CLI-specific configuration
│   └── interface/            # Interface-specific configuration
├── modules/                  # Core modules
│   ├── asr.py                # Speech recognition module
│   ├── llm.py                # Large language model module
│   ├── melody.py             # Melody control module
│   ├── svs/                  # Singing voice synthesis modules
│   │   ├── base.py           # Base SVS class
│   │   ├── espnet.py         # ESPnet SVS implementation
│   │   ├── registry.py       # SVS model registry
│   │   └── __init__.py       # SVS module initialization
│   └── utils/                # Utility modules
│       ├── g2p.py            # Grapheme-to-phoneme conversion
│       ├── text_normalize.py # Text normalization
│       └── resources/        # Utility resources
├── characters/               # Character definitions
│   ├── base.py               # Base character class
│   ├── Limei.py              # Limei character definition
│   ├── Yaoyin.py             # Yaoyin character definition
│   └── __init__.py           # Character module initialization
├── evaluation/               # Evaluation modules
│   └── svs_eval.py           # SVS evaluation metrics
├── data/                     # Data directory
│   ├── kising/               # Kising dataset
│   └── touhou/               # Touhou dataset
├── resources/                # Project resources
├── data_handlers/            # Data handling utilities
├── assets/                   # Static assets
└── tests/                    # Test files
```
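
Going by the module names above, a request presumably flows from speech recognition (`asr.py`) through the language model (`llm.py`) and melody control (`melody.py`) to singing voice synthesis (`svs/`). The sketch below models that flow with stub stages so the data handoff between steps is visible. The stage signatures, the `PipelineResult` type, the note representation, and all stub behavior are assumptions, not the interface of `pipeline.py`.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Note = Tuple[str, int]  # (syllable or word, MIDI pitch); an assumed representation

# Hypothetical stage signatures (stand-ins for asr.py, llm.py, melody.py, svs/).
ASRStage = Callable[[bytes], str]            # audio -> transcript
LLMStage = Callable[[str, str], str]         # (transcript, character) -> lyrics
MelodyStage = Callable[[str], List[Note]]    # lyrics -> notes
SVSStage = Callable[[List[Note], str], bytes]  # (notes, timbre) -> audio


@dataclass
class PipelineResult:
    transcript: str
    lyrics: str
    notes: List[Note]
    audio: bytes


def run_pipeline(audio: bytes, character: str, timbre: str,
                 asr: ASRStage, llm: LLMStage,
                 melody: MelodyStage, svs: SVSStage) -> PipelineResult:
    """Chain the four stages, keeping intermediate outputs for inspection."""
    transcript = asr(audio)
    lyrics = llm(transcript, character)
    notes = melody(lyrics)
    sung = svs(notes, timbre)
    return PipelineResult(transcript, lyrics, notes, sung)


# Stub stages so the sketch runs without any models.
def fake_asr(audio: bytes) -> str:
    return "hello"


def fake_llm(transcript: str, character: str) -> str:
    return f"{character} sings back {transcript}"


def fake_melody(lyrics: str) -> List[Note]:
    # Assign a rising C-major scale (C4 = MIDI 60) word by word; real melody control differs.
    scale = [60, 62, 64, 65, 67, 69, 71, 72]
    return [(word, scale[i % len(scale)]) for i, word in enumerate(lyrics.split())]


def fake_svs(notes: List[Note], timbre: str) -> bytes:
    return f"{timbre}:{len(notes)} notes".encode()


if __name__ == "__main__":
    result = run_pipeline(b"", "Yaoyin", "timbre2",
                          fake_asr, fake_llm, fake_melody, fake_svs)
    print(result.lyrics, result.notes, result.audio)
```

Passing each stage in as a callable keeps the chaining logic independent of any particular model, so swapping one ASR, LLM, or SVS model for another from the lists in Model Configuration would not change `run_pipeline` itself.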

## Contributing

Issues and pull requests are welcome!

## License