# ReVoice-2025: Speech Enhancement Hackathon (Baseline)

This repository provides a **baseline** solution for the **ReVoice-2025** hackathon. It is based on the **Miipher** speech restoration model, adapted for the competition. We aimed to keep the code clean, fast, and easy to use.

## Quick Start

### 1. Environment Setup

Python 3.10.11 is recommended. The commands below clone the repository, create a virtual environment, install the dependencies, and add `src` to `PYTHONPATH`. Because that path is relative, run all later commands from the repository root.

```bash
git clone https://github.com/mtuciru/ReVoice-2025
cd ReVoice-2025

python3 -m venv venv
source venv/bin/activate

pip install -r requirements.txt
pip install --no-dependencies git+https://github.com/Wataru-Nakata/ssl-vocoders.git

export PYTHONPATH=./src
```
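
Because Python 3.10.11 is the recommended version, a quick check after activating the virtual environment can catch a mismatched interpreter early. The helper below is a minimal sketch: the name `check_python_version` is hypothetical and not part of the repository, and it only checks that the interpreter belongs to the 3.10 series.

```bash
# Hypothetical helper: warn if the active python3 is not a 3.10.x release,
# the series the README recommends (3.10.11 specifically).
check_python_version() {
  local v
  v=$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')
  case "$v" in
    3.10.*) echo "Python $v" ;;
    *) echo "warning: Python $v detected, 3.10.11 is recommended" >&2
       return 1 ;;
  esac
}

# Usage (inside the activated venv): check_python_version
```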

### 2. Downloading Pre-trained Weights

This script automatically downloads the Miipher and HiFiGAN weights to the `./models` folder.

```bash
python3 scripts/download_weights.py
```
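
After the download, you can confirm that both checkpoints are in place. This is a sketch: the helper name is hypothetical, and the file names `miipher.ckpt` and `hifigan.ckpt` are taken from the inference command in step 7, so adjust them if `download_weights.py` saves the weights under different names.

```bash
# Hypothetical helper: report whether each checkpoint used by the
# inference example exists and is non-empty. Returns 1 if any is missing.
check_weights() {
  local models_dir="${1:-./models}" f missing=0
  for f in miipher.ckpt hifigan.ckpt; do
    if [ -s "$models_dir/$f" ]; then      # -s: file exists and has size > 0
      echo "ok: $models_dir/$f"
    else
      echo "missing or empty: $models_dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_weights ./models
```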

### 3. Dataset Preparation

Training requires a prepared dataset: clean audio, its noisy counterpart, and phonemes.
The `prepare_dataset.py` script takes your folder of clean audio, adds noise according to the degrader config, and generates phonemes. If no transcript is available for a file, it first transcribes the audio with GigaAM.

**Important**: Before running, edit `examples/configs/degrader_config.yaml` and set the path to your noise files (e.g. the `noise_dir` parameter, if your config uses it).

```bash
python3 scripts/prepare_dataset.py \
  --input_dir /path/to/clean_audio \
  --output_dir /path/to/processed_dataset \
  --degrader_config examples/configs/degrader_config.yaml
```
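
Since a wrong noise path is easy to miss, a small pre-flight check on the degrader config may help. This is a rough sketch: `check_noise_dir` is hypothetical, and it parses the YAML line by line rather than with a real YAML parser, so it only handles a plain single-line `noise_dir: /some/path` entry (surrounding quotes and a trailing comment are tolerated).

```bash
# Hypothetical helper: check that the noise_dir entry in the degrader config
# points to an existing folder. Crude line-based parsing, not a YAML parser:
# it assumes a single-line "noise_dir: /some/path" entry.
check_noise_dir() {
  local cfg="${1:-examples/configs/degrader_config.yaml}" dir
  # Take the first "noise_dir:" line, drop the key, any trailing comment,
  # trailing whitespace, and surrounding quotes.
  dir=$(sed -n 's/^[[:space:]]*noise_dir:[[:space:]]*//p' "$cfg" | head -n 1 |
        sed -e 's/[[:space:]]#.*$//' -e 's/[[:space:]]*$//' | tr -d "\"'")
  if [ -z "$dir" ]; then
    echo "no noise_dir entry found in $cfg" >&2
    return 1
  elif [ -d "$dir" ]; then
    echo "noise_dir ok: $dir"
  else
    echo "noise_dir does not exist: $dir" >&2
    return 1
  fi
}

# Usage: check_noise_dir examples/configs/degrader_config.yaml
```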

### 4. Training Configuration

All training settings are in `examples/configs/config.yaml`.
The main parameters to check are:

* `data.train_dataset_path`: Path to the folder you created in step 3.
* `data.val_dataset_path`: Path to the validation set.
* `train.trainer.devices`: Number of GPUs, or the IDs of specific GPUs, to use (default `1`).
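
Assuming the dotted names above map to nested YAML keys (a common convention, not confirmed here), the relevant part of `config.yaml` would look roughly like the fragment below. All other keys are omitted and the paths are placeholders. If `devices` is passed straight to the PyTorch Lightning `Trainer`, it accepts either a GPU count or a list of GPU IDs.

```yaml
data:
  train_dataset_path: /path/to/processed_dataset   # output folder from step 3
  val_dataset_path: /path/to/validation_dataset    # placeholder path
train:
  trainer:
    devices: 1   # GPU count; a list such as [0, 1] would select specific GPU IDs
```

The standard `CUDA_VISIBLE_DEVICES` environment variable is another way to limit which GPUs the process can see, for example `CUDA_VISIBLE_DEVICES=0 python3 examples/train.py`.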

### 5. Starting Training

```bash
python3 examples/train.py
```

### 6. Monitoring (TensorBoard)

Monitor training progress and metrics with TensorBoard, which serves its dashboard at http://localhost:6006 by default:

```bash
tensorboard --logdir logs/
```

### 7. Inference (Speech Restoration)

To restore speech from noisy files, use the `scripts/run_miipher.py` script. It takes a folder of input files and a folder in which to save the results.

```bash
python3 scripts/run_miipher.py \
  --input_dir /path/to/noisy_audio \
  --output_dir /path/to/restored_audio \
  --lang_code rus \
  --miipher_ckpt ./models/miipher.ckpt \
  --vocoder_ckpt ./models/hifigan.ckpt
```

Arguments:

* `--input_dir`: Folder with noisy files (`.wav`, `.mp3`, `.flac`).
* `--output_dir`: Folder where restored files will be saved.
* `--lang_code`: Language code for phonemization (default `rus`). The script first looks for matching text transcripts (`.txt`); if none are found, it transcribes the audio with ASR (GigaAM).
* `--miipher_ckpt`: Path to the Miipher checkpoint downloaded in step 2.
* `--vocoder_ckpt`: Path to the HiFiGAN vocoder checkpoint downloaded in step 2.
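
Before running inference, it can be useful to know which inputs will fall back to ASR. The sketch below assumes, without confirmation from the repository, that `run_miipher.py` looks for a transcript with the same base name in the same folder (for example `sample.txt` next to `sample.wav`); `list_untranscribed` is a hypothetical helper.

```bash
# Hypothetical helper: print the audio files in a folder that have no
# same-named .txt transcript beside them (e.g. sample.wav without sample.txt).
# Per the description above, such files would go through ASR (GigaAM).
list_untranscribed() {
  local dir="$1" f
  for f in "$dir"/*.wav "$dir"/*.mp3 "$dir"/*.flac; do
    [ -e "$f" ] || continue                 # skip patterns that matched nothing
    [ -f "${f%.*}.txt" ] || basename "$f"   # no transcript: print the file name
  done
}

# Usage: list_untranscribed /path/to/noisy_audio
```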

### 8. Quality Evaluation (Metrics)

To compute metrics (SI-SNR, STOI, MelLoss), use `eval.py`. The script compares a folder of restored files (hypotheses) against a folder of clean reference files (references).

```bash
python3 eval.py \
  --hyp_dir /path/to/restored_audio \
  --ref_dir /path/to/clean_reference_audio \
  --output_csv metrics_results.csv
```

Arguments:

* `--hyp_dir`: Folder with your restored files.
* `--ref_dir`: Folder with the clean original files (file names must match those in `--hyp_dir`).
* `--output_csv`: Path for the results table (default `metrics_results.csv`).
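
Since `eval.py` requires matching file names in `--hyp_dir` and `--ref_dir`, a quick comparison before evaluation can reveal missing or misnamed files. The helper below is a hypothetical sketch built on the standard `comm` utility; it assumes both folders are flat and that no file name contains a newline.

```bash
# Hypothetical helper: list names present in only one of the two folders.
# Unindented lines exist only in hyp_dir; tab-indented lines only in ref_dir.
# Returns 1 if any mismatch is found.
unmatched_files() {
  local hyp_list ref_list mismatches
  hyp_list=$(mktemp)
  ref_list=$(mktemp)
  ls -1 "$1" | sort > "$hyp_list"
  ls -1 "$2" | sort > "$ref_list"
  mismatches=$(comm -3 "$hyp_list" "$ref_list")   # -3 hides names found in both
  rm -f "$hyp_list" "$ref_list"
  if [ -n "$mismatches" ]; then
    printf '%s\n' "$mismatches"
    return 1
  fi
}

# Usage: unmatched_files /path/to/restored_audio /path/to/clean_reference_audio
```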

---

## Project Structure

* `examples/train.py`: Main script for starting training.
* `examples/configs/config.yaml`: Configuration for hyperparameters, paths, and the model.
* `scripts/run_miipher.py`: Script for running inference on a folder.
* `eval.py`: Script for calculating metrics on a folder.
* `scripts/prepare_dataset.py`: Script for dataset generation (augmentation + phonemization).
* `scripts/download_weights.py`: Downloads the pre-trained weights.
* `src/miipher/lightning_module.py`: Training logic in PyTorch Lightning (training step, validation, metrics).
* `src/miipher/dataset`: Data loading logic (Dataset, DataModule).
* `src/miipher/metrics/eval_metrics.py`: Implementation of the SI-SNR, STOI, and MelLoss metrics.
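
For reference, the standard SI-SNR definition is given below, where $s$ is the clean reference and $\hat{s}$ the restored signal, both first shifted to zero mean. The implementation in `eval_metrics.py` may differ in details, such as a small constant added for numerical stability.

$$
s_{\text{target}} = \frac{\langle \hat{s}, s \rangle}{\lVert s \rVert^{2}}\, s, \qquad
e_{\text{noise}} = \hat{s} - s_{\text{target}}, \qquad
\text{SI-SNR} = 10 \log_{10} \frac{\lVert s_{\text{target}} \rVert^{2}}{\lVert e_{\text{noise}} \rVert^{2}}
$$

The score is in dB and higher is better. Because $s_{\text{target}}$ is the projection of $\hat{s}$ onto $s$, multiplying $\hat{s}$ by any nonzero constant scales $s_{\text{target}}$ and $e_{\text{noise}}$ equally and leaves the score unchanged.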