Hervé BREDIN · commit 256e037 (parent d8c7e22) · doc: update README
- overlapped-speech-detection
- automatic-speech-recognition
license: cc-by-4.0
extra_gated_prompt: "The collected information will help acquire a better knowledge of the pyannote user base and help maintainers improve it further. Though this pipeline uses the CC-BY-4.0 license and will always remain open-source, we will occasionally email you about premium pipelines and paid services around pyannote."
extra_gated_fields:
  Company/university: text
  Website: text
---

# `Community-1` speaker diarization

This pipeline ingests mono audio sampled at 16kHz and outputs speaker diarization as an [`Annotation`](http://pyannote.github.io/pyannote-core/structure.html#annotation) instance:

- stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels.
- audio files sampled at a different rate are automatically resampled to 16kHz upon loading.
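The channel-averaging behaviour can be sketched in plain Python (a toy illustration of sample-wise averaging, not the pipeline's actual loading code):

```python
def downmix_to_mono(channels):
    """Average any number of equal-length channels into one mono signal."""
    return [sum(samples) / len(samples) for samples in zip(*channels)]

left = [0.2, 0.4, -0.1]
right = [0.0, 0.2, 0.3]
mono = downmix_to_mono([left, right])  # one averaged sample per time step
```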

The main improvements brought by `Community-1` are:

- [improved](#benchmark) speaker assignment and counting
- simpler reconciliation with transcription timestamps, thanks to the new [*exclusive*](#exclusive-speaker-diarization) speaker diarization
- easy [offline use](#offline-use) (i.e. without internet connection)
- (optional) [hosting](https://hf.co/pyannote/speaker-diarization-community-1-cloud) on the pyannoteAI cloud

## Setup

1. `pip install pyannote.audio`
2. Accept user conditions
3. Create an access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens)

## Quick start

```python
# download the pipeline from Hugging Face
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-community-1",
    token="{huggingface-token}")

# run the pipeline locally on your computer
output = pipeline("audio.wav")

# print the predicted speaker diarization
for turn, _, speaker in output.speaker_diarization.itertracks(yield_label=True):
    print(f"{speaker} speaks between t={turn.start:.3f}s and t={turn.end:.3f}s")
```
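`itertracks(yield_label=True)` yields `(segment, track, label)` triples, so the output can be post-processed like any Python iterable. As a toy illustration (hard-coded turns standing in for an actual pipeline output), total speaking time per speaker can be tallied as:

```python
from collections import defaultdict

# hypothetical (start, end, speaker) turns, as printed by the loop above
turns = [(0.0, 1.5, "SPEAKER_00"), (1.5, 3.0, "SPEAKER_01"), (3.5, 4.0, "SPEAKER_00")]

speaking_time = defaultdict(float)
for start, end, speaker in turns:
    speaking_time[speaker] += end - start

print(dict(speaking_time))  # {'SPEAKER_00': 2.0, 'SPEAKER_01': 1.5}
```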

## Benchmark

Out of the box, `Community-1` is much better than `speaker-diarization-3.1`.

We report [diarization error rates](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %) on a large collection of academic benchmarks (fully automatic processing, with no forgiveness collar and no skipping of overlapping speech).

| Benchmark (last updated in 2025-09) | <a href="https://hf.co/pyannote/speaker-diarization-3.1">3.1</a> | <a href="https://hf.co/pyannote/speaker-diarization-community-1">Community-1</a> | <a href="https://docs.pyannote.ai">Precision-2</a> |
| --- | --- | --- | --- |
| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 12.2 | 11.7 | 11.8 |
| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 24.5 | 20.3 | 16.3 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.8 | 17.0 | 13.2 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 22.7 | 19.9 | 15.8 |
| [AVA-AVD](https://arxiv.org/abs/2111.14448) | 49.7 | 44.6 | 40.7 |
| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 28.5 | 26.7 | 17.6 |
| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 21.4 | 20.2 | 15.7 |
| [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 51.2 | 46.8 | 44.7 |
| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 25.4 | 22.8 | 17.9 |
| [RAMC](https://www.openslr.org/123/) | 22.2 | 20.8 | 10.6 |
| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase 2) | 7.9 | 8.9 | 7.3 |
| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.2 | 9.0 |
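For reference, the diarization error rate sums three error types over the whole file. A minimal sketch of the standard definition, with made-up toy durations (the numbers above were computed with `pyannote.metrics`, not this snippet):

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total_speech):
    """DER (%) = (false alarm + missed detection + speaker confusion) / total reference speech."""
    return 100 * (false_alarm + missed_detection + confusion) / total_speech

# toy durations, in seconds
der = diarization_error_rate(false_alarm=3.0, missed_detection=5.0, confusion=4.0, total_speech=60.0)
print(f"{der:.1f}%")  # 20.0%
```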

The `Precision-2` model is even better and can be tested in two steps:

1. Create an API key on the [pyannoteAI dashboard](https://dashboard.pyannote.ai) (free credits included)
2. Change one line of code

```diff
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
-    'pyannote/speaker-diarization-community-1', token="{huggingface-token}")
+    'pyannote/speaker-diarization-precision-2', token="{pyannoteAI-api-key}")
diarization = pipeline("audio.wav")  # runs on pyannoteAI servers
```

Pre-loading audio files in memory may result in faster processing:

```python
import torchaudio

waveform, sample_rate = torchaudio.load("audio.wav")
output = pipeline({"waveform": waveform, "sample_rate": sample_rate})
```

## Monitoring progress

Hooks are available to monitor the progress of the pipeline:

```python
from pyannote.audio.pipelines.utils.hook import ProgressHook

with ProgressHook() as hook:
    output = pipeline("audio.wav", hook=hook)
```

## Controlling the number of speakers

In case the number of speakers is known in advance, one can use the `num_speakers` option:

```python
output = pipeline("audio.wav", num_speakers=2)
```

One can also provide lower and/or upper bounds on the number of speakers using the `min_speakers` and `max_speakers` options:

```python
output = pipeline("audio.wav", min_speakers=2, max_speakers=5)
```

## Exclusive speaker diarization

On top of the regular speaker diarization, the `Community-1` pretrained pipeline returns a new *exclusive* speaker diarization, available as `output.exclusive_speaker_diarization`.

This feature is [backported from our latest commercial model](https://www.pyannote.ai/blog/precision-2) and simplifies the reconciliation between fine-grained speaker diarization timestamps and (sometimes not so precise) transcription timestamps.
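As a toy sketch of why this helps: with non-overlapping speaker turns, each transcribed word can be assigned to the single speaker active at its midpoint. Everything below is hard-coded for illustration; `assign_words` is a hypothetical helper, not part of the pyannote API:

```python
def assign_words(words, turns):
    """Assign each (word, start, end) to the turn covering its midpoint."""
    assigned = []
    for word, start, end in words:
        midpoint = (start + end) / 2
        speaker = next(
            (spk for turn_start, turn_end, spk in turns
             if turn_start <= midpoint < turn_end),
            None)
        assigned.append((word, speaker))
    return assigned

# exclusive (non-overlapping) speaker turns: (start, end, speaker)
turns = [(0.0, 2.0, "SPEAKER_00"), (2.0, 4.5, "SPEAKER_01")]
# transcription output: (word, start, end)
words = [("hello", 0.1, 0.4), ("world", 1.8, 2.2), ("bye", 4.0, 4.4)]

assigned_words = assign_words(words, turns)
print(assigned_words)
# [('hello', 'SPEAKER_00'), ('world', 'SPEAKER_01'), ('bye', 'SPEAKER_01')]
```

Because the turns never overlap, each midpoint matches at most one turn, so the assignment is unambiguous.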

## Offline use

1. In the terminal, copy the pipeline on disk:

```bash
mkdir /path/to/directory

# when prompted for a password, use an access token with write permissions.
# generate one from your settings: https://huggingface.co/settings/tokens
git clone https://hf.co/pyannote/speaker-diarization-community-1 /path/to/directory/pyannote-speaker-diarization-community-1
```

2. In Python, use the pipeline without internet connection:

```python
# load pipeline from disk (works without internet connection)
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained('/path/to/directory/pyannote-speaker-diarization-community-1')

# run the pipeline locally on your computer
output = pipeline("audio.wav")
```

## Citations