Update README.md
README.md
CHANGED
@@ -38,14 +38,14 @@ This pipeline ingests mono audio sampled at 16kHz and outputs speaker diarization
 
 The main improvements brought by 4.0 over previous version 3.1 are
 
-- much better speaker counting and assignment
+- much [better](#benchmark) speaker counting and assignment
 - much easier [offline use](#offline-use) (i.e. without internet connection)
 
 ## Setup
 
 1. Accept user conditions
 2. Create access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).
-3. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio)
+3. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `>=4.0.0` with `pip install pyannote.audio`
 
 ## Quick start
 
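The updated step 3 pins `pyannote.audio` to `>=4.0.0`. For readers following the setup steps above, here is a minimal sketch of loading the pipeline once the install and token steps are done; the `Pipeline.from_pretrained` call and model string are the ones used later in this diff, and the placeholder token is illustrative.

```python
# Minimal sketch: load the v4.0 pipeline after `pip install pyannote.audio`.
# Assumes the user conditions were accepted and the hf.co access token from step 2 is at hand.
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-4.0",
    token="{huggingface-token}",  # replace with your hf.co access token
)
```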
@@ -65,6 +65,8 @@ for turn, _, speaker in diarization.itertracks(yield_label=True):
 
 ## Benchmark
 
+Out of the box, <img src="https://avatars.githubusercontent.com/u/7559051" width="20" style="vertical-align:text-bottom;" /> `pyannote.audio` speaker diarization pipeline v4.0 is expected to be much better than v3.1.
+
 We report [diarization error rates](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %) on large collection of academic benchmarks.
 
 Processing is fully automated:
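The hunk header above quotes the Quick start loop that precedes the Benchmark section. As a sketch of how that loop consumes the pipeline output, continuing from the setup sketch after the previous hunk (only the `itertracks(yield_label=True)` line and the `diarization = pipeline("audio.wav")` call appear in this diff; the print formatting is illustrative):

```python
# `pipeline` is the object loaded in the setup sketch after the previous hunk.
diarization = pipeline("audio.wav")

# Iterate over speaker turns; each turn carries start/end times in seconds.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```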
@@ -78,34 +80,42 @@
 - no forgiveness collar
 - evaluation of overlapped speech
 
+| Benchmark (last updated in 2025-08) | <a href="https://hf.co/pyannote/speaker-diarization-3.1"><img src="https://avatars.githubusercontent.com/u/7559051" width="32" /><br/>v3.1</a> | <a href="https://hf.co/pyannote/speaker-diarization-4.0"><img src="https://avatars.githubusercontent.com/u/7559051" width="32" /><br/> v4.0</a> | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /><br/>API</a> | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /><br/>labs</a> |
+| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | -------------------------------------------------| ------------------------------------------------ | --- |
+| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 12.2 | 11.7 | 11.8 | 11.4 |
+| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 24.5 | 20.3 | 16.3 | 15.2 |
+| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.8 | 17.0 | 13.2 | 12.9 |
+| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 22.7 | 19.9 | 15.8 | 15.6 |
+| [AVA-AVD](https://arxiv.org/abs/2111.14448) | 49.7 | 44.6 | 40.7 | 37.1 |
+| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 28.5 | 26.7 | 17.6 | 16.6 |
+| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 21.4 | 20.2 | 15.7 | 14.7 |
+| [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 51.2 | 46.8 | 44.7 | 39.0 |
+| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 25.4 | 22.8 | 17.9 | 17.3 |
+| [RAMC](https://www.openslr.org/123/) | 22.2 | 20.8 | 10.6 | 10.5 |
+| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 7.9 | 8.9 | 7.3 | 7.4 |
+| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.2 | 9.0 | 8.5 |
+
+__[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %, the lower, the better)__
+
+| Benchmark (last updated in 2025-08) | <img src="https://avatars.githubusercontent.com/u/7559051" width="32" /> | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /></a> | Speed up
+| -------------- | ----------- | ----------- | ------ |
+| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM), ~1h files | 31s per hour of audio | 14s per hour of audio | 2.2x faster
+| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)), ~5min files | 37s per hour of audio | 14s per hour of audio | 2.6x faster
 
-
-| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------ |
-| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 12.2 | 12.1
-| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 24.5 | 19.8
-| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM)| 18.8 | 15.8
-| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM)| 22.7 | 18.3
-| [AVA-AVD](https://arxiv.org/abs/2111.14448)| 49.7 | 45.3
-| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 28.4 | 20.1
-| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 21.4 | 17.2
-| [Earnings21](https://github.com/revdotcom/speech-datasets) | 9.4 | 9.0
-| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 25.4 | 19.7
-| [RAMC](https://www.openslr.org/123/) | 22.2 | 11.1
-| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 7.9 | 7.6
-| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 9.9
+__Processing speed on a NVIDIA H100 80GB HBM3__
 
-The second column is [pyannoteAI premium](https://huggingface.co/pyannoteAI/speaker-diarization-precision) speaker diarization pipeline, as of April 2025. To test it:
 
-
-
-
+<img src="https://avatars.githubusercontent.com/u/162698670" width="20" style="vertical-align:text-bottom;" /> `pyannoteAI` premium models are even better (and also 2x faster). <img src="https://avatars.githubusercontent.com/u/162698670" width="20" style="vertical-align:text-bottom;" /> `labs` model is currently in private beta.
+
+1. Create pyannoteAI API key at [`dashboard.pyannote.ai`](https://dashboard.pyannote.ai)
+2. Enjoy 150 hours of free credits by changing one single line of code!
 
 ```diff
 from pyannote.audio import Pipeline
 pipeline = Pipeline.from_pretrained(
 - 'pyannote/speaker-diarization-4.0', token="{huggingface-token}")
 + 'pyannoteAI/speaker-diarization-precision', token="{pyannoteAI-api-key}")
-diarization = pipeline("audio.wav")
+diarization = pipeline("audio.wav") # runs on pyannoteAI servers
 ```
 
 ## Processing on GPU
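The last context line of the diff is the "## Processing on GPU" heading; that section's body is not part of this change. A minimal sketch of the GPU pattern documented for recent pyannote.audio releases, stated here as an assumption rather than a quote from that section:

```python
import torch

# `pipeline` as loaded in the setup sketch above; move it to a CUDA device
# before inference so the whole diarization run happens on the GPU.
pipeline.to(torch.device("cuda"))
diarization = pipeline("audio.wav")  # inference now runs on the GPU
```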
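The first hunk also advertises much easier offline use and links to an offline-use section that lies outside this diff. Purely as an illustration of the idea (the path below is hypothetical and the README's offline-use section remains the authoritative reference), a pipeline downloaded ahead of time can be loaded from local files:

```python
# Illustrative only: download the pipeline once (e.g. with
# huggingface_hub.snapshot_download("pyannote/speaker-diarization-4.0")),
# then load it from the local copy so inference needs no internet connection.
from pyannote.audio import Pipeline

LOCAL_PIPELINE = "/path/to/speaker-diarization-4.0/config.yaml"  # hypothetical path

pipeline = Pipeline.from_pretrained(LOCAL_PIPELINE)
diarization = pipeline("audio.wav")
```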