	# ReVoice-2025 — Speech Enhancement Hackathon (Baseline)

	This repository provides a baseline solution for the ReVoice-2025 hackathon. It is based on the Miipher model and adapted for the competition. We tried to keep the code clean, fast, and convenient to use.

	## 🚀 Quick Start

	### 1. Environment Setup

	Python 3.10.11 is recommended.

	```bash
	git clone https://github.com/mtuciru/ReVoice-2025
	cd ReVoice-2025

	python3 -m venv venv
	source venv/bin/activate

	pip install -r requirements.txt
	pip install --no-dependencies git+https://github.com/Wataru-Nakata/ssl-vocoders.git

	export PYTHONPATH=./src
	```

	### 2. Downloading Pre-trained Weights

	The script will automatically download Miipher and HiFiGAN weights to the `./models` folder.

	```bash
	python3 scripts/download_weights.py
	```

	### 3. Dataset Preparation

	Training the model requires a prepared dataset (clean + noisy audio + phonemes).
	The script takes your folder with clean audio, adds noise (using the degrader config), and generates phonemes (using GigaAM for transcription if no text is present).

	Important: before running the script, edit `examples/configs/degrader_config.yaml` and set the paths to your noise files (for example the `noise_dir` parameter, if you use it).

	```bash
	python3 scripts/prepare_dataset.py \
	--input_dir /path/to/clean_audio \
	--output_dir /path/to/processed_dataset \
	--degrader_config examples/configs/degrader_config.yaml
	```
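
As a rough illustration of the noise-adding step, the sketch below mixes a noise signal into clean speech at a chosen signal-to-noise ratio. It shows what additive degradation typically looks like and is not the degrader used by `scripts/prepare_dataset.py`, which may apply other degradations as well. The function name `mix_at_snr` and its parameters are hypothetical.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db` (dB), then add it.

    Hypothetical helper for illustration only; both inputs are equal-length
    sequences of float samples.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(x * x for x in noise) / len(noise)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("signals must not be silent")
    # Target noise power is clean_power / 10^(snr_db / 10).
    scale = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


# Toy signals standing in for real audio samples.
clean = [math.sin(0.1 * i) for i in range(400)]
noise = [math.cos(0.37 * i) for i in range(400)]
noisy = mix_at_snr(clean, noise, snr_db=10.0)
```

Lower `snr_db` values give noisier training examples. A real pipeline would also have to crop or loop the noise so that it matches the length of the clean file.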

	### 4. Training Configuration

	All training settings are located in `examples/configs/config.yaml`.
	Main parameters to check:
	* `data.train_dataset_path`: Path to the folder you created in step 3.
	* `data.val_dataset_path`: Path to the validation set.
	* `train.trainer.devices`: Number of GPUs or the IDs of the GPUs to use (default `1`).
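
The dotted names above suggest a nested configuration, which is an assumption inferred from the key names and not a confirmed layout of `config.yaml`. Under that assumption, a small pre-flight check such as the sketch below can catch missing dataset paths before training starts. The helpers `get_by_dotted_key` and `missing_paths` are hypothetical, and the dictionary stands in for the parsed YAML file.

```python
from pathlib import Path


def get_by_dotted_key(config, dotted_key):
    """Walk a nested dict using a key such as 'data.train_dataset_path'."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def missing_paths(config, dotted_keys):
    """Return the dotted keys whose configured path does not exist on disk."""
    return [
        key
        for key in dotted_keys
        if not Path(get_by_dotted_key(config, key)).exists()
    ]


# Stand-in for the parsed config; the values are placeholders.
config = {
    "data": {
        "train_dataset_path": "/path/to/processed_dataset",
        "val_dataset_path": "/path/to/validation_dataset",
    },
    "train": {"trainer": {"devices": 1}},
}

problems = missing_paths(
    config, ["data.train_dataset_path", "data.val_dataset_path"]
)
```

If `problems` is not empty, fix the listed keys in the config before running step 5.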

	### 5. Starting Training

	```bash
	python3 examples/train.py
	```

	### 6. Monitoring (TensorBoard)

	Monitor training progress and metrics:

	```bash
	tensorboard --logdir logs/
	```

	### 7. Inference (Speech Restoration)

	To restore speech from noisy files, use the `scripts/run_miipher.py` script. It takes a folder of input files and a folder where the restored files are saved.

	```bash
	python3 scripts/run_miipher.py \
	--input_dir /path/to/noisy_audio \
	--output_dir /path/to/restored_audio \
	--lang_code rus \
	--miipher_ckpt ./models/miipher.ckpt \
	--vocoder_ckpt ./models/hifigan.ckpt
	```

	Arguments:
	* `--input_dir`: Folder with noisy files (`.wav`, `.mp3`, `.flac`).
	* `--output_dir`: Folder where restored files will be saved.
	* `--lang_code`: Language code for phonemization (default `rus`). The script looks for text transcripts (`.txt`) for the input files; if none are found, it falls back to ASR (GigaAM).
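
The input handling described above can be sketched as follows. This is not the code of `run_miipher.py`: it assumes that a transcript sits next to its audio file and shares its base name (`sample.wav` and `sample.txt`), which the source does not specify. The name `collect_inputs` is hypothetical.

```python
from pathlib import Path

# Extensions listed for --input_dir above.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def collect_inputs(input_dir):
    """Return sorted (audio_path, transcript_path_or_None) pairs for a folder.

    Assumption: a transcript has the same base name as its audio file and
    lives in the same folder.
    """
    pairs = []
    for audio in sorted(Path(input_dir).iterdir()):
        if not audio.is_file() or audio.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        transcript = audio.with_suffix(".txt")
        pairs.append((audio, transcript if transcript.is_file() else None))
    return pairs
```

Files paired with `None` are the ones that would need ASR before phonemization.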

	### 8. Quality Evaluation (Metrics)

	To calculate metrics (SI-SNR, STOI, MelLoss), use `eval.py`. The script compares the folder with restored files (hypotheses) and the folder with clean reference files (references).

	```bash
	python3 eval.py \
	--hyp_dir /path/to/restored_audio \
	--ref_dir /path/to/clean_reference_audio \
	--output_csv metrics_results.csv
	```

	Arguments:
	* `--hyp_dir`: Folder with your restored files.
	* `--ref_dir`: Folder with clean original files (files must have matching names).
	* `--output_csv`: Path to save the results table (default `metrics_results.csv`).
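
Because files are matched by name, it can help to check the two folders before running the evaluation. The sketch below is a hypothetical helper, not part of `eval.py`. It pairs files that have the same name in both folders and lists restored files that have no reference.

```python
from pathlib import Path


def pair_by_name(hyp_dir, ref_dir):
    """Pair files with identical names in both folders.

    Returns (pairs, unmatched), where `pairs` is a list of
    (hypothesis_path, reference_path) and `unmatched` lists the names of
    hypothesis files that have no reference.
    """
    ref_dir = Path(ref_dir)
    ref_names = {p.name for p in ref_dir.iterdir() if p.is_file()}
    pairs, unmatched = [], []
    for hyp in sorted(Path(hyp_dir).iterdir()):
        if not hyp.is_file():
            continue
        if hyp.name in ref_names:
            pairs.append((hyp, ref_dir / hyp.name))
        else:
            unmatched.append(hyp.name)
    return pairs, unmatched
```

A non-empty `unmatched` list usually means that the inference step renamed files or changed their extension.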

	---

	## 📂 Project Structure

	* `examples/train.py` — Main script for starting training.
	* `examples/configs/config.yaml` — Configuration for hyperparameters, paths, and the model.
	* `scripts/run_miipher.py` — Script for running inference on a folder.
	* `eval.py` — Script for calculating metrics on a folder.
	* `scripts/prepare_dataset.py` — Script for dataset generation (augmentation + phonemization).
	* `scripts/download_weights.py` — Weight downloader.
	* `src/miipher/lightning_module.py` — Training logic (PyTorch Lightning), training step, validation, metrics.
	* `src/miipher/dataset` — Data loading logic (Dataset, DataModule).
	* `src/miipher/metrics/eval_metrics.py` — Implementation of SI-SNR, STOI, MelLoss metrics.
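
For readers unfamiliar with SI-SNR, the sketch below follows the commonly used definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the remainder. It is a plain-Python illustration under that definition and may differ in detail from the implementation in `src/miipher/metrics/eval_metrics.py`. The function name `si_snr` and the `eps` parameter are assumptions.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length mono signals."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    # Remove the mean so that a DC offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    target = [dot / ref_energy * r for r in ref]
    # Whatever is left after the projection counts as noise.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because of the projection, multiplying the estimate by a constant gain leaves the score practically unchanged, so the metric does not reward or penalize loudness differences between restored and reference files.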