---
tags:
  - pyannote
  - pyannote-audio
  - pyannote-audio-pipeline
  - speaker-diarization
license: mit
language:
- en
---
# Configuration
This model card outlines the setup of a speaker diarization pipeline fine-tuned on synthetic medical audio data.

Before starting, please ensure the requirements are met:

1. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `3.1` with `pip install pyannote.audio`
2. Accept [`pyannote/segmentation-3.0`](https://hf.co/pyannote/segmentation-3.0) user conditions
3. Accept [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) user conditions
4. Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens)
5. Download the `pytorch_model.bin` and `config.yaml` files from this repository into a local directory
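For step 5, `huggingface_hub.hf_hub_download` is the usual route; as a standard-library sketch of the equivalent, the snippet below constructs Hugging Face's raw-file URLs and fetches both files. The repo id `your-org/your-finetuned-model` is a placeholder to replace with this repository's id.

```python
import os
import urllib.request

REPO_ID = "your-org/your-finetuned-model"  # placeholder: substitute this repository's id
LOCAL_DIR = "models/pyannote_sd_normal"
FILES = ["pytorch_model.bin", "config.yaml"]

def hub_url(repo_id, filename):
    # Hugging Face serves raw files under /resolve/<revision>/<filename>
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"

def download_files(token=None):
    os.makedirs(LOCAL_DIR, exist_ok=True)
    for name in FILES:
        request = urllib.request.Request(hub_url(REPO_ID, name))
        if token:  # gated or private repos need the access token from step 4
            request.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(request) as resp, \
                open(os.path.join(LOCAL_DIR, name), "wb") as out:
            out.write(resp.read())
```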

## Usage

### Load trained segmentation model
```python
import torch
from pyannote.audio import Model

# Load the base segmentation architecture (pass your access token, or True to use a cached login)
model = Model.from_pretrained("pyannote/segmentation-3.0", use_auth_token=True)

# Path to the directory containing the downloaded files
model_path = "models/pyannote_sd_normal"

# Load the fine-tuned weights from pytorch_model.bin (map_location keeps this working on CPU-only machines)
model.load_state_dict(torch.load(model_path + "/pytorch_model.bin", map_location="cpu"))
```
### Load fine-tuned speaker diarization pipeline
```python
from pyannote.audio import Pipeline
from pyannote.audio.pipelines import SpeakerDiarization

# Initialize the pretrained pipeline (pass your access token, or True to use a cached login)
pretrained_pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=True)

finetuned_pipeline = SpeakerDiarization(
    segmentation=model,
    embedding=pretrained_pipeline.embedding,
    embedding_exclude_overlap=pretrained_pipeline.embedding_exclude_overlap,
    clustering=pretrained_pipeline.klustering,  # note: "klustering" is the attribute's actual spelling in pyannote
)

# Load fine-tuned params into the pipeline
finetuned_pipeline.load_params(model_path + "/config.yaml")
```
### GPU usage
```python
if torch.cuda.is_available():
    gpu = torch.device("cuda")
    finetuned_pipeline.to(gpu)
    print("gpu: ", torch.cuda.get_device_name(gpu))
```

### Visualise diarization output
```python
# Run the fine-tuned pipeline on an audio file
diarization = finetuned_pipeline("path/to/audio.wav")

# In a Jupyter notebook, the returned Annotation renders as a timeline
diarization
```
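The result can also be persisted: pyannote's `Annotation` objects write themselves out with `diarization.write_rttm(file)`. As a minimal sketch of what that RTTM format contains, assuming turns are given as plain `(start, end, speaker)` tuples:

```python
def to_rttm(turns, uri="audio"):
    """Format (start, end, speaker) turns as RTTM lines.

    RTTM fields: type, file id, channel, onset, duration,
    then "<NA>" placeholders around the speaker label.
    """
    return "\n".join(
        f"SPEAKER {uri} 1 {start:.3f} {end - start:.3f} <NA> <NA> {speaker} <NA> <NA>"
        for start, end, speaker in turns
    )

print(to_rttm([(0.0, 1.5, "SPEAKER_00")]))
```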

### View speaker turns, speaker ID, and time
```python
for speech_turn, track, speaker in diarization.itertracks(yield_label=True):
    print(f"{speech_turn.start:4.1f} {speech_turn.end:4.1f} {speaker}")
```
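The same iteration can feed simple summary statistics, for example the total speaking time per speaker. A sketch over plain `(start, end, speaker)` tuples:

```python
from collections import defaultdict

def talk_time(turns):
    """Sum speaking time (in seconds) per speaker from (start, end, speaker) turns."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)

print(talk_time([(0.0, 2.0, "SPEAKER_00"), (2.0, 3.5, "SPEAKER_01"), (4.0, 5.0, "SPEAKER_00")]))
# → {'SPEAKER_00': 3.0, 'SPEAKER_01': 1.5}
```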

## Citations

```bibtex
@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
```

```bibtex
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
```