NikiPshg committed
Commit 42430bd · verified · 1 Parent(s): a55246b

Upload README.md

Files changed (1)
  1. README.md +115 -0
README.md ADDED
@@ -0,0 +1,115 @@
+ # ReVoice-2025: Speech Enhancement Hackathon (Baseline)
+
+ This repository provides a **baseline** (basic solution) for the **ReVoice-2025** hackathon. The project is based on the **Miipher** model and adapted for the competition. We tried to make the code as clean, fast, and convenient as possible.
+
+ ## 🚀 Quick Start
+
+ ### 1. Environment Setup
+
+ Python 3.10.11 is recommended.
+
+ ```bash
+ git clone https://github.com/mtuciru/ReVoice-2025
+ cd ReVoice-2025
+
+ python3 -m venv venv
+ source venv/bin/activate
+
+ pip install -r requirements.txt
+ pip install --no-dependencies git+https://github.com/Wataru-Nakata/ssl-vocoders.git
+
+ export PYTHONPATH=./src
+ ```
+
+ ### 2. Downloading Pre-trained Weights
+
+ The script automatically downloads the Miipher and HiFiGAN weights to the `./models` folder.
+
+ ```bash
+ python3 scripts/download_weights.py
+ ```
+
+ ### 3. Dataset Preparation
+
+ Training the model requires a prepared dataset (clean audio, noisy audio, and phonemes).
+ The script takes your folder of clean audio, adds noise (using the degrader config), and generates phonemes (using GigaAM for transcription if no text is present).
+
+ **Important**: Before running, edit `examples/configs/degrader_config.yaml` and set the paths to your noise files (the `noise_dir` parameter and any similar settings you use).
+
+ ```bash
+ python3 scripts/prepare_dataset.py \
+ --input_dir /path/to/clean_audio \
+ --output_dir /path/to/processed_dataset \
+ --degrader_config examples/configs/degrader_config.yaml
+ ```
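The noise-adding step can be pictured with a small example. The sketch below is not the repository's degrader; it only illustrates one common way to add noise to a clean signal at a chosen signal-to-noise ratio. The function name `mix_at_snr`, the toy signals, and the plain-list representation are assumptions made for illustration.

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Return clean + scaled noise so that the mix has the requested SNR in dB.

    Hypothetical helper for illustration; signals are equal-length lists of floats.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if clean_power == 0.0 or noise_power == 0.0:
        raise ValueError("both signals must contain energy")
    # SNR_dB = 10 * log10(P_clean / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


# Toy signals: a 440 Hz tone at 16 kHz and a deterministic pseudo-noise sequence.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [((t * 7919) % 200 - 100) / 100 for t in range(1600)]
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```

A lower `snr_db` gives a harder example, so a range of values is one way to vary the difficulty of a training set.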
+
+ ### 4. Training Configuration
+
+ All training settings are located in `examples/configs/config.yaml`.
+ Main parameters to check:
+ * `data.train_dataset_path`: Path to the folder you created in step 3.
+ * `data.val_dataset_path`: Path to the validation set.
+ * `train.trainer.devices`: Number or IDs of GPUs to use (default `1`).
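The dotted names above describe nested keys in the config. As a minimal sketch, assuming the YAML file has already been loaded into a plain dictionary, a pre-flight check could look like this. The helper `get_dotted` is hypothetical and the values are placeholders; only the key names come from the list above.

```python
def get_dotted(config, dotted_key):
    """Look up a nested value such as 'data.train_dataset_path' in a dict."""
    node = config
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"missing config key: {dotted_key}")
        node = node[part]
    return node


# Placeholder values standing in for the loaded config.yaml.
config = {
    "data": {
        "train_dataset_path": "/path/to/processed_dataset",
        "val_dataset_path": "/path/to/val_dataset",
    },
    "train": {"trainer": {"devices": 1}},
}

for key in (
    "data.train_dataset_path",
    "data.val_dataset_path",
    "train.trainer.devices",
):
    print(key, "=", get_dotted(config, key))
```

Running a check like this before training surfaces a missing or misspelled key immediately instead of partway through startup.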
+
+ ### 5. Starting Training
+
+ ```bash
+ python3 examples/train.py
+ ```
+
+ ### 6. Monitoring (TensorBoard)
+
+ Monitor training progress and metrics:
+
+ ```bash
+ tensorboard --logdir logs/
+ ```
+
+ ### 7. Inference (Speech Restoration)
+
+ To restore speech from noisy files, use the `run_miipher.py` script. It takes a folder of input files and a folder in which to save the results.
+
+ ```bash
+ python3 scripts/run_miipher.py \
+ --input_dir /path/to/noisy_audio \
+ --output_dir /path/to/restored_audio \
+ --lang_code rus \
+ --miipher_ckpt ./models/miipher.ckpt \
+ --vocoder_ckpt ./models/hifigan.ckpt
+ ```
+
+ Arguments:
+ * `--input_dir`: Folder with noisy files (`.wav`, `.mp3`, `.flac`).
+ * `--output_dir`: Folder where restored files will be saved.
+ * `--lang_code`: Language code for phonemization (default `rus`). If text transcripts (`.txt`) exist, the script tries to use them; otherwise it falls back to ASR (GigaAM).
+ * `--miipher_ckpt`: Path to the Miipher checkpoint (`./models/miipher.ckpt` in the command above).
+ * `--vocoder_ckpt`: Path to the HiFiGAN vocoder checkpoint (`./models/hifigan.ckpt` in the command above).
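How the script locates transcripts is not spelled out here. As a rough sketch, assuming each transcript sits next to its audio file and shares its base name, collecting the inputs could look like this. `find_inputs` and `AUDIO_EXTENSIONS` are hypothetical names, not taken from `run_miipher.py`.

```python
from pathlib import Path

# Extensions listed for --input_dir above.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def find_inputs(input_dir):
    """Map each audio file to a same-named .txt transcript, or None if absent.

    Assumption: transcripts live in the same folder and share the file stem.
    """
    pairs = {}
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS:
            transcript = path.with_suffix(".txt")
            pairs[path] = transcript if transcript.is_file() else None
    return pairs
```

Under this assumption, files mapped to `None` are the ones that would need ASR transcription before phonemization.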
+
+ ### 8. Quality Evaluation (Metrics)
+
+ To calculate metrics (SI-SNR, STOI, MelLoss), use `eval.py`. The script compares a folder of restored files (hypotheses) with a folder of clean reference files (references).
+
+ ```bash
+ python3 eval.py \
+ --hyp_dir /path/to/restored_audio \
+ --ref_dir /path/to/clean_reference_audio \
+ --output_csv metrics_results.csv
+ ```
+
+ Arguments:
+ * `--hyp_dir`: Folder with your restored files.
+ * `--ref_dir`: Folder with clean original files (file names must match those in `--hyp_dir`).
+ * `--output_csv`: Path to save the results table (default `metrics_results.csv`).
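For intuition about what SI-SNR measures, here is a minimal sketch of its usual definition in pure Python. It is not the code from `src/miipher/metrics/eval_metrics.py`, which may differ in details such as the epsilon value or mean removal; the function name `si_snr` and the list-based signals are assumptions for illustration.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB for two equal-length signals."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    target = [dot / ref_energy * r for r in ref]
    # Whatever is left after removing the target counts as distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual) + eps
    return 10 * math.log10(target_energy / residual_energy + eps)
```

Because the estimate is projected onto the reference first, rescaling the restored audio leaves the score essentially unchanged, so the metric reflects distortion rather than loudness.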
+
+ ---
+
+ ## 📂 Project Structure
+
+ * `examples/train.py`: Main script for starting training.
+ * `examples/configs/config.yaml`: Configuration for hyperparameters, paths, and the model.
+ * `run_miipher.py`: Script for running inference on a folder.
+ * `eval.py`: Script for calculating metrics on a folder.
+ * `scripts/prepare_dataset.py`: Script for dataset generation (augmentation + phonemization).
+ * `scripts/download_weights.py`: Weight downloader.
+ * `src/miipher/lightning_module.py`: Training logic (PyTorch Lightning), including the training step, validation, and metrics.
+ * `src/miipher/dataset`: Data loading logic (Dataset, DataModule).
+ * `src/miipher/metrics/eval_metrics.py`: Implementation of the SI-SNR, STOI, and MelLoss metrics.
+