Update README.md
README.md
CHANGED
@@ -38,14 +38,14 @@ This pipeline ingests mono audio sampled at 16kHz and outputs speaker diarization
 
 The main improvements brought by 4.0 over previous version 3.1 are
 
-- much better speaker counting and assignment
+- much [better](#benchmark) speaker counting and assignment
 - much easier [offline use](#offline-use) (i.e. without internet connection)
 
 ## Setup
 
 1. Accept user conditions
 2. Create access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).
-3. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio)
+3. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `>=4.0.0` with `pip install pyannote.audio`
 
 ## Quick start
 
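The updated step 3 pins `pyannote.audio` to `>=4.0.0`. For readers following the setup steps above, here is a minimal sketch of loading the pipeline once the install and token steps are done; the `Pipeline.from_pretrained` call and model string are the ones used later in this diff, and the placeholder token is illustrative.

```python
# Minimal sketch: load the v4.0 pipeline after `pip install pyannote.audio`.
# Assumes the user conditions were accepted and the hf.co access token from step 2 is at hand.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-4.0",
    token="{huggingface-token}",  # replace with your hf.co access token
)
```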
@@ -65,6 +65,8 @@ for turn, _, speaker in diarization.itertracks(yield_label=True):
 
 ## Benchmark
 
+Out of the box, <img src="https://avatars.githubusercontent.com/u/7559051" width="20" style="vertical-align:text-bottom;" /> `pyannote.audio` speaker diarization pipeline v4.0 is expected to be much better than v3.1.
+
 We report [diarization error rates](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %) on large collection of academic benchmarks.
 
 Processing is fully automated:
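The hunk header above quotes the Quick start loop that precedes the Benchmark section. As a sketch of how that loop consumes the pipeline output, continuing from the setup sketch after the previous hunk (only the `itertracks(yield_label=True)` line and the `diarization = pipeline("audio.wav")` call appear in this diff; the print formatting is illustrative):

```python
# `pipeline` is the object loaded in the setup sketch after the previous hunk.
diarization = pipeline("audio.wav")

# Iterate over speaker turns; each turn carries start/end times in seconds.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```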
@@ -78,34 +80,42 @@
 - no forgiveness collar
 - evaluation of overlapped speech
 
+| Benchmark (last updated in 2025-08) | <a href="https://hf.co/pyannote/speaker-diarization-3.1"><img src="https://avatars.githubusercontent.com/u/7559051" width="32" /><br/>v3.1</a> | <a href="https://hf.co/pyannote/speaker-diarization-4.0"><img src="https://avatars.githubusercontent.com/u/7559051" width="32" /><br/> v4.0</a> | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /><br/>API</a> | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /><br/>labs</a> |
+| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | -------------------------------------------------| ------------------------------------------------ | --- |
+| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 12.2 | 11.7 | 11.8 | 11.4 |
+| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 24.5 | 20.3 | 16.3 | 15.2 |
+| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.8 | 17.0 | 13.2 | 12.9 |
+| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 22.7 | 19.9 | 15.8 | 15.6 |
+| [AVA-AVD](https://arxiv.org/abs/2111.14448) | 49.7 | 44.6 | 40.7 | 37.1 |
+| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 28.5 | 26.7 | 17.6 | 16.6 |
+| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 21.4 | 20.2 | 15.7 | 14.7 |
+| [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 51.2 | 46.8 | 44.7 | 39.0 |
+| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 25.4 | 22.8 | 17.9 | 17.3 |
+| [RAMC](https://www.openslr.org/123/) | 22.2 | 20.8 | 10.6 | 10.5 |
+| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 7.9 | 8.9 | 7.3 | 7.4 |
+| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.2 | 9.0 | 8.5 |
+
+__[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %, the lower, the better)__
+
+| Benchmark (last updated in 2025-08) | <img src="https://avatars.githubusercontent.com/u/7559051" width="32" /> | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /></a> | Speed up
+| -------------- | ----------- | ----------- | ------ |
+| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM), ~1h files | 31s per hour of audio | 14s per hour of audio | 2.2x faster
+| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)), ~5min files | 37s per hour of audio | 14s per hour of audio | 2.6x faster
 
-
-| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------ |
-| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 12.2 | 12.1
-| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 24.5 | 19.8
-| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM)| 18.8 | 15.8
-| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM)| 22.7 | 18.3
-| [AVA-AVD](https://arxiv.org/abs/2111.14448)| 49.7 | 45.3
-| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 28.4 | 20.1
-| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 21.4 | 17.2
-| [Earnings21](https://github.com/revdotcom/speech-datasets) | 9.4 | 9.0
-| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 25.4 | 19.7
-| [RAMC](https://www.openslr.org/123/) | 22.2 | 11.1
-| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 7.9 | 7.6
-| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 9.9
+__Processing speed on a NVIDIA H100 80GB HBM3__
 
-The second column is [pyannoteAI premium](https://huggingface.co/pyannoteAI/speaker-diarization-precision) speaker diarization pipeline, as of April 2025. To test it:
 
-
-
-
+<img src="https://avatars.githubusercontent.com/u/162698670" width="20" style="vertical-align:text-bottom;" /> `pyannoteAI` premium models are even better (and also 2x faster). <img src="https://avatars.githubusercontent.com/u/162698670" width="20" style="vertical-align:text-bottom;" /> `labs` model is currently in private beta.
+
+1. Create pyannoteAI API key at [`dashboard.pyannote.ai`](https://dashboard.pyannote.ai)
+2. Enjoy 150 hours of free credits by changing one single line of code!
 
 ```diff
 from pyannote.audio import Pipeline
 pipeline = Pipeline.from_pretrained(
 - 'pyannote/speaker-diarization-4.0', token="{huggingface-token}")
 + 'pyannoteAI/speaker-diarization-precision', token="{pyannoteAI-api-key}")
-diarization = pipeline("audio.wav")
+diarization = pipeline("audio.wav") # runs on pyannoteAI servers
 ```
 
 ## Processing on GPU
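The last context line of the diff is the "## Processing on GPU" heading; that section's body is not part of this change. A minimal sketch of the GPU pattern documented for recent pyannote.audio releases, stated here as an assumption rather than a quote from that section:

```python
import torch

# `pipeline` as loaded in the setup sketch above; move it to a CUDA device
# before inference so the whole diarization run happens on the GPU.
pipeline.to(torch.device("cuda"))
diarization = pipeline("audio.wav")  # inference now runs on the GPU
```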
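The first hunk also advertises much easier offline use and links to an offline-use section that lies outside this diff. Purely as an illustration of the idea (the path below is hypothetical and the README's offline-use section remains the authoritative reference), a pipeline downloaded ahead of time can be loaded from local files:

```python
# Illustrative only: download the pipeline once (e.g. with
# huggingface_hub.snapshot_download("pyannote/speaker-diarization-4.0")),
# then load it from the local copy so inference needs no internet connection.
from pyannote.audio import Pipeline

LOCAL_PIPELINE = "/path/to/speaker-diarization-4.0/config.yaml"  # hypothetical path

pipeline = Pipeline.from_pretrained(LOCAL_PIPELINE)
diarization = pipeline("audio.wav")
```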