Update README.md

README.md CHANGED
@@ -36,13 +36,16 @@ This pipeline ingests mono audio sampled at 16kHz and outputs speaker diarization

- stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels.
- audio files sampled at a different rate are resampled to 16kHz automatically upon loading.
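This preprocessing can be pictured with a short sketch (a hand-rolled NumPy/SciPy illustration, not the pipeline's actual loader; `to_mono_16k` is a hypothetical helper name):

```python
import numpy as np
from scipy.signal import resample_poly

TARGET_RATE = 16_000

def to_mono_16k(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Downmix (channels, samples) audio to mono and resample to 16 kHz."""
    if audio.ndim == 2:
        # downmix multi-channel audio by averaging the channels
        audio = audio.mean(axis=0)
    if sample_rate != TARGET_RATE:
        # polyphase resampling to the 16 kHz the models expect
        audio = resample_poly(audio, TARGET_RATE, sample_rate)
    return audio

# one second of 44.1 kHz stereo noise becomes one second of 16 kHz mono
stereo = np.random.randn(2, 44_100)
mono = to_mono_16k(stereo, 44_100)
print(mono.ndim, mono.shape[0])  # 1 16000
```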
The main improvements brought by 4.0 over previous version 3.1 are:

- much better speaker counting and assignment
- much easier [offline use](#offline-use) (i.e. without internet connection)

## Setup

1. Accept user conditions
2. Create access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).
3. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `4.x.x` with `pip install pyannote.audio`

## Quick start
@@ -93,6 +96,7 @@ Processing is fully automated:

The second column is the [pyannoteAI premium](https://huggingface.co/pyannoteAI/speaker-diarization-precision) speaker diarization pipeline, as of April 2025. To test it:

1. Create a free [`pyannoteAI`](https://dashboard.pyannote.ai) account and get 150h of free credits.
2. Create an API key on the [`pyannoteAI` dashboard](https://dashboard.pyannote.ai).
3. Enjoy the [`pyannoteAI`](https://www.pyannote.ai) precision speaker diarization pipeline by changing one single line of code!
@@ -101,7 +105,7 @@ from pyannote.audio import Pipeline
 pipeline = Pipeline.from_pretrained(
-    'pyannote/speaker-diarization-4.0', token="{huggingface-token}")
+    'pyannoteAI/speaker-diarization-precision', token="{pyannoteAI-api-key}")
 diarization = pipeline("audio.wav")
 ```
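Diarization results like the one returned above are commonly serialized in the standard RTTM format. As a side illustration only (RTTM handling is not covered in this README, and `speaking_time` is a hypothetical helper), per-speaker talk time can be aggregated from RTTM lines with plain Python:

```python
from collections import defaultdict

def speaking_time(rttm_lines):
    """Sum speaking time (seconds) per speaker from RTTM 'SPEAKER' records.

    RTTM fields: type file channel onset duration <NA> <NA> speaker <NA> <NA>
    """
    totals = defaultdict(float)
    for line in rttm_lines:
        fields = line.split()
        if fields and fields[0] == "SPEAKER":
            totals[fields[7]] += float(fields[4])  # field 4 is the duration
    return dict(totals)

rttm = [
    "SPEAKER audio 1 0.50 1.25 <NA> <NA> SPEAKER_00 <NA> <NA>",
    "SPEAKER audio 1 2.00 0.75 <NA> <NA> SPEAKER_01 <NA> <NA>",
    "SPEAKER audio 1 3.10 0.25 <NA> <NA> SPEAKER_00 <NA> <NA>",
]
print(speaking_time(rttm))  # {'SPEAKER_00': 1.5, 'SPEAKER_01': 0.75}
```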
## Processing on GPU
@@ -200,14 +204,15 @@ diarization = pipeline("audio.wav")
}
```

3. Speaker clustering

```bibtex
@article{Landini2022,
  author={Landini, Federico and Profant, J{\'a}n and Diez, Mireia and Burget, Luk{\'a}{\v{s}}},
  title={{Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks}},
  year={2022},
  journal={Computer Speech \& Language},
}
```