# ReVoice-2025: Speech Enhancement Hackathon (Baseline)

This repository provides a **baseline** solution for the **ReVoice-2025** hackathon. It is built on the **Miipher** speech restoration model, adapted for the competition. We aimed to keep the code clean, fast, and easy to use.

## 🚀 Quick Start

### 1. Environment Setup

Python 3.10.11 is recommended.

```bash
git clone https://github.com/mtuciru/ReVoice-2025
cd ReVoice-2025

python3 -m venv venv
source venv/bin/activate

pip install -r requirements.txt
pip install --no-dependencies git+https://github.com/Wataru-Nakata/ssl-vocoders.git

export PYTHONPATH=./src 
```
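Since Python 3.10.11 is recommended, checking the interpreter before creating the virtual environment can catch a version mismatch early. The following is a minimal sketch, not part of the repository: the function names `python_minor_ok` and `check_python` are hypothetical, and the check only compares the major.minor version (3.10), not the patch release.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not shipped with ReVoice-2025).

# python_minor_ok VERSION -> returns 0 if VERSION is 3.10 or 3.10.x.
python_minor_ok() {
  case "$1" in
    3.10|3.10.*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_python [INTERPRETER] -> prints the detected version; returns 1 on a
# version mismatch and 2 if the interpreter could not be run at all.
check_python() {
  local py="${1:-python3}"
  local ver
  ver="$("$py" -c 'import platform; print(platform.python_version())')" || return 2
  if python_minor_ok "$ver"; then
    echo "OK: $py is Python $ver"
  else
    echo "WARNING: $py is Python $ver; Python 3.10.11 is recommended" >&2
    return 1
  fi
}
```

For example, run `check_python python3` before `python3 -m venv venv`, or `check_python venv/bin/python` after activation.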

### 2. Downloading Pre-trained Weights

The script will automatically download Miipher and HiFiGAN weights to the `./models` folder.

```bash
python3 scripts/download_weights.py
```
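Once the download finishes, a small check can confirm that the checkpoints are in place. This sketch assumes the file names `miipher.ckpt` and `hifigan.ckpt`, taken from the inference command in step 7; `check_weights` is a hypothetical helper, so adjust the names if the download script saves the files differently.

```bash
# Hypothetical helper: verify that both checkpoints exist and are non-empty.
# ASSUMPTION: file names match the ones used in the run_miipher.py example.
check_weights() {
  local dir="${1:-./models}"
  local missing=0
  local f
  for f in miipher.ckpt hifigan.ckpt; do
    # -s is true only for an existing file with size > 0
    if [ -s "$dir/$f" ]; then
      echo "found: $dir/$f"
    else
      echo "missing or empty: $dir/$f" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only if nothing was missing
  [ "$missing" -eq 0 ]
}
```

Running `check_weights ./models` returns a non-zero status if either file is absent, which makes it easy to chain before training or inference.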

### 3. Dataset Preparation

Training the model requires a prepared dataset (clean audio, noisy audio, and phonemes).
The `prepare_dataset.py` script takes your folder of clean audio, adds noise according to the degrader config, and generates phonemes (transcribing the audio with GigaAM if no text is available).

**Important**: Before running, edit `examples/configs/degrader_config.yaml` and set the path to your noise files (the `noise_dir` parameter and any related settings you use).

```bash
python3 scripts/prepare_dataset.py \
  --input_dir /path/to/clean_audio \
  --output_dir /path/to/processed_dataset \
  --degrader_config examples/configs/degrader_config.yaml
```
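Because phonemes come from existing text when it is available and from GigaAM otherwise, it can help to see in advance which clean files have no transcript. This is a minimal sketch under an explicit assumption: a transcript is a `.txt` file with the same base name in the same folder as the audio. `prepare_dataset.py` may use a different convention, and `files_without_transcript` is a hypothetical helper.

```bash
# Hypothetical helper: list audio files (.wav/.mp3/.flac, searched recursively)
# that have no sibling .txt transcript. ASSUMPTION: transcript = same base name + .txt.
# Output is sorted so repeated runs are easy to compare.
files_without_transcript() {
  local dir="$1"
  find "$dir" -type f \( -iname '*.wav' -o -iname '*.mp3' -o -iname '*.flac' \) -print |
    sort |
    while IFS= read -r audio; do
      # "${audio%.*}" strips the extension, e.g. clip01.wav -> clip01
      [ -f "${audio%.*}.txt" ] || printf '%s\n' "$audio"
    done
}
```

Under that assumption, the files listed are the ones that would need ASR transcription during preparation.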

### 4. Training Configuration

All training settings are located in `examples/configs/config.yaml`.
Main parameters to check:
*   `data.train_dataset_path`: Path to the folder you created in step 3.
*   `data.val_dataset_path`: Path to the validation set.
*   `train.trainer.devices`: Number of GPUs or list of GPU IDs to use (default `1`).

### 5. Starting Training

```bash
python3 examples/train.py
```

### 6. Monitoring (TensorBoard)

Monitor training progress and metrics:

```bash
tensorboard --logdir logs/
```

### 7. Inference (Speech Restoration)

To restore speech from noisy files, use `scripts/run_miipher.py`. It takes an input folder of noisy files and an output folder for the restored results.

```bash
python3 scripts/run_miipher.py \
  --input_dir /path/to/noisy_audio \
  --output_dir /path/to/restored_audio \
  --lang_code rus \
  --miipher_ckpt ./models/miipher.ckpt \
  --vocoder_ckpt ./models/hifigan.ckpt
```

Arguments:
*   `--input_dir`: Folder with noisy files (`.wav`, `.mp3`, `.flac`).
*   `--output_dir`: Folder where restored files will be saved.
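*   `--lang_code`: Language code for phonemization (default `rus`). The script looks for text transcripts (`.txt`) and falls back to ASR (GigaAM) when none are found.
*   `--miipher_ckpt`: Path to the Miipher checkpoint (`./models/miipher.ckpt` in the example above).
*   `--vocoder_ckpt`: Path to the HiFiGAN vocoder checkpoint (`./models/hifigan.ckpt` in the example above).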
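Before a long inference run, you can list files in the input folder that are not in one of the supported formats (`.wav`, `.mp3`, `.flac`). This is a hypothetical pre-flight helper (`unsupported_inputs`), not part of the repository. It checks only the top level of the folder, ignores `.txt` transcripts, and makes no assumption about what `run_miipher.py` does with other files.

```bash
# Hypothetical helper: print files in DIR (top level only) whose extension is
# not .wav, .mp3 or .flac. .txt files are treated as optional transcripts.
unsupported_inputs() {
  local dir="$1"
  local f name
  for f in "$dir"/*; do
    # Skip subdirectories and the unexpanded pattern left when DIR is empty
    [ -f "$f" ] || continue
    name="${f##*/}"
    case "$name" in
      *.wav|*.WAV|*.mp3|*.MP3|*.flac|*.FLAC) ;;  # supported audio
      *.txt) ;;                                   # optional transcripts
      *) printf '%s\n' "$f" ;;
    esac
  done
}
```

An empty result from `unsupported_inputs /path/to/noisy_audio` means every file has a listed audio extension or is a transcript.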

### 8. Quality Evaluation (Metrics)

To compute metrics (SI-SNR, STOI, MelLoss), use `eval.py`. It compares each restored file (hypothesis) against the clean file with the same name (reference).

```bash
python3 eval.py \
  --hyp_dir /path/to/restored_audio \
  --ref_dir /path/to/clean_reference_audio \
  --output_csv metrics_results.csv
```

Arguments:
*   `--hyp_dir`: Folder with your restored files.
*   `--ref_dir`: Folder with clean original files (files must have matching names).
*   `--output_csv`: Path to save the results table (default `metrics_results.csv`).
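Since `eval.py` pairs files by name, a mismatch between the two folders is easy to miss. The sketch below uses a hypothetical helper, `check_matching_names`, that reports names present in only one folder and returns a non-zero status if any are found. It assumes names must match exactly, extension included, and compares only the top level of each folder.

```bash
# Hypothetical helper: report files that exist in only one of the two folders.
# ASSUMPTION: pairing is by exact file name (extension included), top level only.
check_matching_names() {
  local hyp_dir="$1" ref_dir="$2"
  local status=0 f name
  # Every hypothesis needs a reference with the same name...
  for f in "$hyp_dir"/*; do
    [ -f "$f" ] || continue
    name="${f##*/}"
    if [ ! -f "$ref_dir/$name" ]; then
      echo "no reference for: $name"
      status=1
    fi
  done
  # ...and every reference needs a hypothesis.
  for f in "$ref_dir"/*; do
    [ -f "$f" ] || continue
    name="${f##*/}"
    if [ ! -f "$hyp_dir/$name" ]; then
      echo "no hypothesis for: $name"
      status=1
    fi
  done
  return "$status"
}
```

For example, run `check_matching_names /path/to/restored_audio /path/to/clean_reference_audio` before calling `eval.py`.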

---

## 📂 Project Structure

*   `examples/train.py`: Main script for starting training.
*   `examples/configs/config.yaml`: Configuration for hyperparameters, paths, and the model.
*   `scripts/run_miipher.py`: Script for running inference on a folder.
*   `eval.py`: Script for calculating metrics on a folder.
*   `scripts/prepare_dataset.py`: Script for dataset generation (augmentation + phonemization).
*   `scripts/download_weights.py`: Weight downloader.
*   `src/miipher/lightning_module.py`: Training logic (PyTorch Lightning): training step, validation, metrics.
*   `src/miipher/dataset`: Data loading logic (Dataset, DataModule).
*   `src/miipher/metrics/eval_metrics.py`: Implementation of the SI-SNR, STOI, and MelLoss metrics.