---
license: apache-2.0
---

The open-source code is available **[here](https://github.com/niuzb/wav2vec2_ssd)**.

# Wav2Vec2 for Speaker Diarization

This repository provides a complete pipeline for speaker diarization using a Wav2Vec2-based model. It includes scripts for data preparation, training, evaluation, and inference.

## 1. Data Preparation

The first step is to prepare the training and test data. This involves generating reference labels from RTTM files and splitting the audio into chunks.

### 1.1. Generate Training Data

Run the following script to generate the training data. You need to provide the directory containing the RTTM files and a list of WAV files.

```bash
bash prepare_training_data/gen_training_data_for_ssd.sh
```

This script will:
- Read the WAV file list from `wav_list_file.txt`.
- Process the corresponding RTTM files from the specified directory.
- Generate training data in JSON format, saved to `prepare_training_data/dir_ref_out/`.
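
The reference labels are derived from RTTM segment annotations. As an illustrative sketch only (the repo's actual label schema is defined by its scripts and may differ), assuming standard RTTM fields, a change point falls wherever two consecutive segments carry different speaker IDs:

```python
# Hypothetical illustration: derive speaker-change reference points from RTTM
# lines. Standard RTTM fields (assumed here):
# SPEAKER <file-id> <channel> <onset> <duration> <NA> <NA> <speaker-id> <NA> <NA>

def rttm_change_points(rttm_lines):
    """Return sorted times (s) where the active speaker changes."""
    segments = []
    for line in rttm_lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        onset, duration, speaker = float(fields[3]), float(fields[4]), fields[7]
        segments.append((onset, onset + duration, speaker))
    segments.sort()
    changes = []
    for (s0, e0, spk0), (s1, e1, spk1) in zip(segments, segments[1:]):
        if spk1 != spk0:  # boundary between two different speakers
            changes.append(round(s1, 3))
    return changes

lines = [
    "SPEAKER rec1 1 0.00 6.77 <NA> <NA> spkA <NA> <NA>",
    "SPEAKER rec1 1 6.77 27.82 <NA> <NA> spkB <NA> <NA>",
    "SPEAKER rec1 1 34.59 14.78 <NA> <NA> spkA <NA> <NA>",
]
print(rttm_change_points(lines))  # [6.77, 34.59]
```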

### 1.2. Generate Test Data

Similarly, run the following script to generate the test data.

```bash
bash prepare_training_data/gen_test_data_for_ssd.sh
```

This script generates test data and saves it to `prepare_training_data/dir_ref_out_for_test/`.

## 2. Training

Once the data is prepared, you can train the speaker diarization model.

```bash
bash run_ssd_train.sh
```

This script will:
- Load the pre-trained `zhniu/wav2vec2_ssd` model.
- Use the generated training and validation data.
- Train the model for the specified number of epochs.
- Save the fine-tuned model to the `./experiments/new_model` directory.
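
Under the hood, speaker change detection is naturally framed as frame-level classification over the Wav2Vec2 feature sequence. As a rough sketch (the ~20 ms frame stride and the label-widening tolerance are assumptions, not values taken from the repo's training code), reference change times can be mapped to per-frame binary targets like this:

```python
# Hypothetical sketch: map change timestamps to frame-level binary targets.
# Wav2Vec2's feature encoder emits roughly one frame per 20 ms; the exact
# stride and any label widening used in training here are assumptions.

FRAME_STRIDE_S = 0.02  # ~50 frames per second (assumed)

def frame_targets(change_times, audio_len_s, tolerance_s=0.1):
    """1 for frames within tolerance_s of a change point, else 0."""
    n_frames = round(audio_len_s / FRAME_STRIDE_S)
    targets = [0] * n_frames
    for t in change_times:
        lo = max(0, round((t - tolerance_s) / FRAME_STRIDE_S))
        hi = min(n_frames - 1, round((t + tolerance_s) / FRAME_STRIDE_S))
        for i in range(lo, hi + 1):
            targets[i] = 1
    return targets

y = frame_targets([1.0], audio_len_s=2.0, tolerance_s=0.1)
print(len(y), sum(y))  # 100 11
```

Widening each change point into a short positive window compensates for the inherent ambiguity of boundary placement in the reference annotations.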

## 3. Evaluation

After training, you can evaluate the model's performance on the test set. The evaluation script reports precision, recall, and F1-score for speaker change detection (SCD).

```bash
python src/run_ssd_metrics.py \
    --model_checkpoint ./experiments/new_model \
    --file_list prepare_training_data/wav_list_file_for_test.txt \
    --rttm_dir /path/to/your/rttm/test/dir \
    --matchrange 0.5
```

The script outputs the following metrics:
- **Precision**: the proportion of correctly predicted speaker changes among all predicted changes.
- **Recall**: the proportion of correctly predicted speaker changes among all actual changes.
- **F1-Score**: the harmonic mean of precision and recall.
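
`--matchrange` sets the tolerance (in seconds) within which a predicted change counts as matching a reference change. A minimal sketch of such scoring, assuming greedy one-to-one matching (the exact matching rule in `run_ssd_metrics.py` may differ):

```python
# Hypothetical sketch of tolerance-based scoring for speaker change detection.
# Each prediction matches at most one reference point within `matchrange`
# seconds; the repo's script may use a different matching strategy.

def scd_scores(predicted, reference, matchrange=0.5):
    unmatched = list(reference)
    tp = 0
    for p in sorted(predicted):
        # greedily match the closest still-unmatched reference point
        best = min(unmatched, key=lambda r: abs(r - p), default=None)
        if best is not None and abs(best - p) <= matchrange:
            tp += 1
            unmatched.remove(best)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f1 = scd_scores([6.7, 35.0, 80.0], [6.772, 34.587, 49.373], matchrange=0.5)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.667 0.667
```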

## 4. Inference

To run inference on a single audio file and detect speaker changes, use the `run_ssd_eval.py` script.

```bash
python src/run_ssd_eval.py --file examples/test.wav
```

The output shows the timestamps of the detected speaker changes:

```
python src/run_ssd_eval.py --file examples/test.wav
2025-08-21 03:32:35,644 - INFO - Using device: cuda
2025-08-21 03:32:35,650 - INFO - Loading model and processor...
2025-08-21 03:32:38,377 - INFO - Processing audio file...
2025-08-21 03:32:40,168 - INFO - Extracting speaker change information...
2025-08-21 03:32:40,168 - INFO - Smoothing parameters: min_interval=0.2s, window_size=0.4s, smooth_strength=1.0

--- Speaker Change Points ---
Change No.    Time (s)
------------------------------
Change 1      6.772
Change 2      34.587
Change 3      49.373
Change 4      104.197
Change 5      113.366
```
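
The log line above lists smoothing parameters. As an illustrative sketch only (the script's actual smoothing with `window_size` and `smooth_strength` is not reproduced here), `min_interval` can be read as collapsing candidate change points that lie closer together than the given gap:

```python
# Hypothetical sketch of the `min_interval` post-processing step: merge
# candidate change points closer than min_interval seconds, keeping the
# earliest of each cluster. run_ssd_eval.py's real smoothing may differ.

def enforce_min_interval(change_times, min_interval=0.2):
    kept = []
    for t in sorted(change_times):
        if not kept or t - kept[-1] >= min_interval:
            kept.append(t)
    return kept

print(enforce_min_interval([6.70, 6.772, 34.587, 34.70], min_interval=0.2))
# [6.7, 34.587]
```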