zhniu/wav2vec2_ssd

Safetensors

wav2vec2

zhniu committed on Aug 22, 2025

Commit

a63a304

verified ·

1 Parent(s): 26fdc29

Update README.md

README.md +94 -3

@@ -1,3 +1,94 @@
----
-license: apache-2.0
----

+The open-source code is available **[here](https://github.com/niuzb/wav2vec2_ssd)**.
+# Wav2Vec2 for Speaker Diarization
+This repository provides a comprehensive solution for speaker diarization using a Wav2Vec2-based model. It includes scripts for data preparation, training, evaluation, and inference.
+## 1. Data Preparation
+The first step is to prepare the training and testing data. This involves generating reference labels from RTTM files and splitting the audio into chunks.
+### 1.1. Generate Training Data
+Run the following script to generate training data. You need to provide the directory containing the RTTM files and a list of WAV files.
+```bash
+bash prepare_training_data/gen_training_data_for_ssd.sh
+```
+This script will:
+- Read the WAV file list from `wav_list_file.txt`.
+- Process the corresponding RTTM files from the specified directory.
+- Generate training data in JSON format, which will be saved in `prepare_training_data/dir_ref_out/`.
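As a rough illustration of how reference labels might be derived from RTTM files, the sketch below reads `SPEAKER` records in the standard RTTM layout (turn onset in the fourth field, speaker name in the eighth) and reports the onsets where the speaker differs from the previous turn. The function name `rttm_change_points` is hypothetical, overlapping speech is ignored, and the repository's own scripts may compute labels differently.

```python
def rttm_change_points(rttm_lines):
    """Return turn onsets (seconds) where the speaker differs from the previous turn.

    Hypothetical helper for illustration only; it is not taken from the repository.
    """
    turns = []
    for line in rttm_lines:
        fields = line.split()
        # Standard RTTM SPEAKER record:
        # type, file id, channel, onset, duration, <NA>, <NA>, speaker, <NA>, <NA>
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        turns.append((float(fields[3]), fields[7]))
    turns.sort()  # order turns by onset time

    changes = []
    for (_, prev_speaker), (onset, speaker) in zip(turns, turns[1:]):
        if speaker != prev_speaker:
            changes.append(onset)
    return changes


# Made-up example records, not data from the repository.
example = [
    "SPEAKER rec1 1 0.00 6.50 <NA> <NA> spkA <NA> <NA>",
    "SPEAKER rec1 1 6.50 3.00 <NA> <NA> spkB <NA> <NA>",
    "SPEAKER rec1 1 9.50 2.00 <NA> <NA> spkB <NA> <NA>",
    "SPEAKER rec1 1 11.50 4.00 <NA> <NA> spkA <NA> <NA>",
]
print(rttm_change_points(example))  # [6.5, 11.5]
```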
+### 1.2. Generate Test Data
+Similarly, run the following script to generate test data.
+```bash
+bash prepare_training_data/gen_test_data_for_ssd.sh
+```
+This script will generate test data and save it in `prepare_training_data/dir_ref_out_for_test/`.
+## 2. Training
+Once the data is prepared, you can train the speaker diarization model.
+```bash
+bash run_ssd_train.sh
+```
+This script will:
+- Load the pre-trained `zhniu/wav2vec2_ssd` model.
+- Use the generated training and validation data.
+- Train the model for a specified number of epochs.
+- Save the fine-tuned model to the `./experiments/new_model` directory.
+## 3. Evaluation
+After training, you can evaluate the model's performance on the test set. The evaluation script calculates precision, recall, and F1-score for speaker change detection (SCD).
+```bash
+python src/run_ssd_metrics.py \
+    --model_checkpoint ./experiments/new_model \
+    --file_list prepare_training_data/wav_list_file_for_test.txt \
+    --rttm_dir /path/to/your/rttm/test/dir \
+    --matchrange 0.5
+```
+The script will output the following metrics:
+- **Precision**: The proportion of correctly predicted speaker changes among all predicted changes.
+- **Recall**: The proportion of correctly predicted speaker changes among all actual changes.
+- **F1-Score**: The harmonic mean of precision and recall.
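The three metrics above can be sketched with a tolerance-based matcher: a predicted change counts as correct if it lies within a tolerance window of a reference change that has not been matched yet. The function name `score_change_points` and the greedy one-to-one matching are assumptions made for illustration; the README does not confirm that `--matchrange` in `src/run_ssd_metrics.py` is applied this way.

```python
def score_change_points(predicted, reference, tolerance=0.5):
    """Return (precision, recall, f1) for change-point times given in seconds.

    Illustrative sketch: each predicted change is greedily matched to the
    closest still-unmatched reference change within `tolerance` seconds.
    """
    unmatched = sorted(reference)
    hits = 0
    for t in sorted(predicted):
        # Closest reference change that has not been claimed yet, if any.
        best = min(unmatched, key=lambda r: abs(r - t), default=None)
        if best is not None and abs(best - t) <= tolerance:
            unmatched.remove(best)
            hits += 1

    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Made-up times: two of three predictions fall within 0.5 s of a reference change.
p, r, f1 = score_change_points([1.0, 5.2, 9.0], [1.3, 5.0, 12.0, 20.0])
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Greedy matching is simple but not guaranteed to find the best possible assignment when changes are densely packed; an optimal bipartite matching would be the stricter alternative.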
+## 4. Inference
+To detect speaker changes in a single audio file, run the `run_ssd_eval.py` script.
+```bash
+python src/run_ssd_eval.py --file examples/test.wav
+```
+The output will show the timestamps of the detected speaker changes:
+```
+python src/run_ssd_eval.py --file examples/test.wav
+2025-08-21 03:32:35,644 - INFO - Using device: cuda
+2025-08-21 03:32:35,650 - INFO - Loading model and processor...
+2025-08-21 03:32:38,377 - INFO - Processing audio file...
+2025-08-21 03:32:40,168 - INFO - Extracting speaker change information...
+2025-08-21 03:32:40,168 - INFO - Smoothing parameters: min_interval=0.2s, window_size=0.4s, smooth_strength=1.0
+--- Speaker Change Points ---
+Change No.     Time (s)
+------------------------------
+Change 1             6.772
+Change 2             34.587
+Change 3             49.373
+Change 4             104.197
+Change 5             113.366
+```
+---
+license: apache-2.0
+---