DroolingPanda/speaker-diarization-community-1

@@ -36,13 +36,16 @@ This pipeline ingests mono audio sampled at 16kHz and outputs speaker diarizatio
 - stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels.
 - audio files sampled at a different rate are resampled to 16kHz automatically upon loading.
-The main improvement over previous versions is that we made it much easier [to use it offline](#offline-use) (i.e. without internet connection).
 ## Setup
 1. Accept user conditions
 2. Create access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).
-3. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `4.x` with `pip install pyannote.audio`
 ## Quick start
@@ -93,6 +96,7 @@ Processing is fully automated:
 The second column is [pyannoteAI premium](https://huggingface.co/pyannoteAI/speaker-diarization-precision) speaker diarization pipeline, as of April 2025. To test it:
   1. Create an API key on [`pyannoteAI` dashboard](https://dashboard.pyannote.ai).
   2. Enjoy [`pyannoteAI`](https://www.pyannote.ai) precision speaker diarization pipeline by changing one single line of code!
@@ -101,7 +105,7 @@ from pyannote.audio import Pipeline
 pipeline = Pipeline.from_pretrained(
 -     'pyannote/speaker-diarization-4.0', token="{huggingface-token}")
 +     'pyannoteAI/speaker-diarization-precision', token="{pyannoteAI-api-key}")
-diarization = pipeline("udio.wav")
 ```
 ## Processing on GPU
@@ -200,14 +204,15 @@ diarization = pipeline("audio.wav")
 }
 ```
-3. Speaker diarization pipeline
 ```bibtex
-@inproceedings{Bredin23,
-  author={Hervé Bredin},
-  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
-  year=2023,
-  booktitle={Proc. INTERSPEECH 2023},
 }
 ```

 - stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels.
 - audio files sampled at a different rate are resampled to 16kHz automatically upon loading.
+The main improvements brought by 4.0 over the previous version (3.1) are:
+- much better speaker counting and assignment
+- much easier [offline use](#offline-use) (i.e. without internet connection)
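
As a rough illustration of the downmixing rule stated above (multi-channel audio is averaged into mono), here is a minimal pure-Python sketch. `downmix_to_mono` is a hypothetical helper written for this example and is not part of `pyannote.audio`; the pipeline performs this step, as well as the resampling to 16kHz (not shown here), internally when it loads a file.

```python
def downmix_to_mono(channels):
    """Average equal-length channels sample by sample.

    `channels` is a list of per-channel sample lists, e.g. [left, right].
    Hypothetical helper for illustration only, not pyannote.audio's loader.
    """
    if not channels:
        raise ValueError("expected at least one channel")
    num_samples = len(channels[0])
    if any(len(channel) != num_samples for channel in channels):
        raise ValueError("all channels must have the same number of samples")
    # zip(*channels) groups the samples that share the same time index
    return [sum(samples) / len(channels) for samples in zip(*channels)]


# Two channels of three samples each (values chosen arbitrarily)
stereo = [[1.0, 0.5, -0.5], [0.0, 0.5, 0.25]]
mono = downmix_to_mono(stereo)
```

A mono input passes through unchanged, since averaging a single channel returns the same samples.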
 ## Setup
 1. Accept user conditions
 2. Create access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).
+3. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `4.x.x` with `pip install pyannote.audio`
 ## Quick start
 The second column is [pyannoteAI premium](https://huggingface.co/pyannoteAI/speaker-diarization-precision) speaker diarization pipeline, as of April 2025. To test it:
+  1. Create a free [`pyannoteAI`](https://dashboard.pyannote.ai) account and get 150h of free credits.
   1. Create an API key on [`pyannoteAI` dashboard](https://dashboard.pyannote.ai).
   2. Enjoy [`pyannoteAI`](https://www.pyannote.ai) precision speaker diarization pipeline by changing one single line of code!
 pipeline = Pipeline.from_pretrained(
 -     'pyannote/speaker-diarization-4.0', token="{huggingface-token}")
 +     'pyannoteAI/speaker-diarization-precision', token="{pyannoteAI-api-key}")
+diarization = pipeline("audio.wav")
 ```
 ## Processing on GPU
 }
 ```
+3. Speaker clustering
 ```bibtex
+@article{Landini2022,
+  author={Landini, Federico and Profant, J{\'a}n and Diez, Mireia and Burget, Luk{\'a}{\v{s}}},
+  title={{Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks}},
+  year={2022},
+  journal={Computer Speech \& Language},
 }
 ```