Update README.md
The open-source code is available **[here](https://github.com/niuzb/wav2vec2_ssd)**.

# Wav2Vec2 for Speaker Diarization

This repository provides scripts for speaker diarization with a Wav2Vec2-based model, covering data preparation, training, evaluation, and inference.

## 1. Data Preparation

The first step is to prepare the training and testing data. This involves generating reference labels from RTTM files and splitting the audio into chunks.

### 1.1. Generate Training Data

Run the following script to generate training data. You need to provide the directory containing the RTTM files and a list of WAV files.

```bash
bash prepare_training_data/gen_training_data_for_ssd.sh
```

This script will:
- Read the WAV file list from `wav_list_file.txt`.
- Process the corresponding RTTM files from the specified directory.
- Generate training data in JSON format, which will be saved in `prepare_training_data/dir_ref_out/`.
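
To make the RTTM step concrete, here is a minimal sketch of how reference speaker-change times could be derived from RTTM text. It assumes the standard RTTM `SPEAKER` record layout (onset in field 4, duration in field 5, speaker name in field 8). The names `Segment`, `parse_rttm`, and `change_points` are hypothetical and are not taken from this repository, and the JSON schema the script actually writes is not shown here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float
    end: float
    speaker: str


def parse_rttm(text: str) -> list[Segment]:
    """Parse SPEAKER records from RTTM text into segments sorted by start time."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Standard RTTM: type file channel onset duration <NA> <NA> speaker ...
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        onset, duration = float(fields[3]), float(fields[4])
        segments.append(Segment(onset, onset + duration, fields[7]))
    return sorted(segments, key=lambda s: s.start)


def change_points(segments: list[Segment]) -> list[float]:
    """Return the start time of every segment whose speaker differs from the previous one."""
    return [
        cur.start
        for prev, cur in zip(segments, segments[1:])
        if cur.speaker != prev.speaker
    ]


# Made-up example data, for illustration only.
example = """\
SPEAKER test 1 0.00 6.77 <NA> <NA> spk_a <NA> <NA>
SPEAKER test 1 6.77 10.00 <NA> <NA> spk_b <NA> <NA>
SPEAKER test 1 16.77 3.00 <NA> <NA> spk_b <NA> <NA>
SPEAKER test 1 19.77 5.00 <NA> <NA> spk_a <NA> <NA>
"""
print(change_points(parse_rttm(example)))  # [6.77, 19.77]
```

Consecutive segments from the same speaker produce no change point, which is why the third record above is skipped.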

### 1.2. Generate Test Data

Similarly, run the following script to generate test data.

```bash
bash prepare_training_data/gen_test_data_for_ssd.sh
```

This script will generate test data and save it in `prepare_training_data/dir_ref_out_for_test/`.

## 2. Training

Once the data is prepared, you can train the speaker diarization model.

```bash
bash run_ssd_train.sh
```

This script will:
- Load the pre-trained `zhniu/wav2vec2_ssd` model.
- Use the generated training and validation data.
- Train the model for a specified number of epochs.
- Save the fine-tuned model to the `./experiments/new_model` directory.

## 3. Evaluation

After training, you can evaluate the model's performance on the test set. The evaluation script calculates precision, recall, and F1-score for speaker change detection (SCD).

```bash
python src/run_ssd_metrics.py \
    --model_checkpoint ./experiments/new_model \
    --file_list prepare_training_data/wav_list_file_for_test.txt \
    --rttm_dir /path/to/your/rttm/test/dir \
    --matchrange 0.5
```

The script will output the following metrics:
- **Precision**: The proportion of correctly predicted speaker changes among all predicted changes.
- **Recall**: The proportion of correctly predicted speaker changes among all actual changes.
- **F1-Score**: The harmonic mean of precision and recall.
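
As an illustration of how these metrics can be computed for change points, the sketch below counts a predicted change as correct when it lies within a tolerance of a reference change. It assumes that `--matchrange` is such a tolerance in seconds and that each reference change can be matched at most once. Both are assumptions, and `scd_metrics` is a hypothetical helper, not the implementation in `src/run_ssd_metrics.py`.

```python
def scd_metrics(predicted, reference, match_range=0.5):
    """Precision, recall, and F1 for change points matched within +/- match_range seconds."""
    unmatched = sorted(reference)
    true_positives = 0
    for p in sorted(predicted):
        # Find the closest reference change point that is still unmatched.
        best = min(unmatched, key=lambda r: abs(r - p), default=None)
        if best is not None and abs(best - p) <= match_range:
            true_positives += 1
            unmatched.remove(best)  # each reference point may be matched once
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


# Made-up change points (seconds), for illustration only.
predicted = [6.8, 34.6, 60.0, 104.2]
reference = [6.5, 34.9, 49.4, 104.0, 113.4]
precision, recall, f1 = scd_metrics(predicted, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# precision=0.750 recall=0.600 f1=0.667
```

Here three of the four predictions fall within 0.5 s of a reference change, so precision is 3/4 and recall is 3/5.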

## 4. Inference

To run inference on a single audio file and detect speaker changes, use the `run_ssd_eval.py` script.

```bash
python src/run_ssd_eval.py --file examples/test.wav
```

The output will show the timestamps of the detected speaker changes:

```
python src/run_ssd_eval.py --file examples/test.wav
2025-08-21 03:32:35,644 - INFO - Using device: cuda
2025-08-21 03:32:35,650 - INFO - Loading model and processor...
2025-08-21 03:32:38,377 - INFO - Processing audio file...
2025-08-21 03:32:40,168 - INFO - Extracting speaker change information...
2025-08-21 03:32:40,168 - INFO - Smoothing parameters: min_interval=0.2s, window_size=0.4s, smooth_strength=1.0

--- Speaker Change Points ---
Change No.     Time (s)
------------------------------
Change 1       6.772
Change 2       34.587
Change 3       49.373
Change 4       104.197
Change 5       113.366
```

---
license: apache-2.0
---