hbredin committed on Commit 669dc57 · verified · Parent(s): 191d276

Update README.md

Files changed (1): README.md +31 -21

README.md CHANGED
@@ -38,14 +38,14 @@ This pipeline ingests mono audio sampled at 16kHz and outputs speaker diarizatio
 
 The main improvements brought by 4.0 over previous version 3.1 are
 
-- much better speaker counting and assignment
+- much [better](#benchmark) speaker counting and assignment
 - much easier [offline use](#offline-use) (i.e. without internet connection)
 
 ## Setup
 
 1. Accept user conditions
 2. Create access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).
-3. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `4.x.x` with `pip install pyannote.audio`
+3. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `>=4.0.0` with `pip install pyannote.audio`
 
 ## Quick start
 
@@ -65,6 +65,8 @@ for turn, _, speaker in diarization.itertracks(yield_label=True):
 
 ## Benchmark
 
+Out of the box, <img src="https://avatars.githubusercontent.com/u/7559051" width="20" style="vertical-align:text-bottom;" /> `pyannote.audio` speaker diarization pipeline v4.0 is expected to be much better than v3.1.
+
 We report [diarization error rates](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %) on large collection of academic benchmarks.
 
 Processing is fully automated:
@@ -78,34 +80,42 @@ Processing is fully automated:
 - no forgiveness collar
 - evaluation of overlapped speech
 
+| Benchmark (last updated in 2025-08) | <a href="https://hf.co/pyannote/speaker-diarization-3.1"><img src="https://avatars.githubusercontent.com/u/7559051" width="32" /><br/>v3.1</a> | <a href="https://hf.co/pyannote/speaker-diarization-4.0"><img src="https://avatars.githubusercontent.com/u/7559051" width="32" /><br/>v4.0</a> | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /><br/>API</a> | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /><br/>labs</a> |
+| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | -------------------------------------------------| ------------------------------------------------ | --- |
+| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 12.2 | 11.7 | 11.8 | 11.4 |
+| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 24.5 | 20.3 | 16.3 | 15.2 |
+| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.8 | 17.0 | 13.2 | 12.9 |
+| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 22.7 | 19.9 | 15.8 | 15.6 |
+| [AVA-AVD](https://arxiv.org/abs/2111.14448) | 49.7 | 44.6 | 40.7 | 37.1 |
+| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 28.5 | 26.7 | 17.6 | 16.6 |
+| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 21.4 | 20.2 | 15.7 | 14.7 |
+| [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 51.2 | 46.8 | 44.7 | 39.0 |
+| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 25.4 | 22.8 | 17.9 | 17.3 |
+| [RAMC](https://www.openslr.org/123/) | 22.2 | 20.8 | 10.6 | 10.5 |
+| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 7.9 | 8.9 | 7.3 | 7.4 |
+| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.2 | 9.0 | 8.5 |
+
+__[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %, the lower, the better)__
+
+| Benchmark (last updated in 2025-08) | <img src="https://avatars.githubusercontent.com/u/7559051" width="32" /> | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /></a> | Speed up
+| -------------- | ----------- | ----------- | ------ |
+| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM), ~1h files | 31s per hour of audio | 14s per hour of audio | 2.2x faster
+| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)), ~5min files | 37s per hour of audio | 14s per hour of audio | 2.6x faster
 
-| Benchmark | <a href="https://hf.co/pyannote/speaker-diarization-4.0"><img src="https://avatars.githubusercontent.com/u/7559051" width="32" /><br/>OSS</a> | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /><br/>API</a>
-| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | ------------------------------------------------------ |
-| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 12.2 | 12.1
-| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 24.5 | 19.8
-| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM)| 18.8 | 15.8
-| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM)| 22.7 | 18.3
-| [AVA-AVD](https://arxiv.org/abs/2111.14448)| 49.7 | 45.3
-| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 28.4 | 20.1
-| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 21.4 | 17.2
-| [Earnings21](https://github.com/revdotcom/speech-datasets) | 9.4 | 9.0
-| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 25.4 | 19.7
-| [RAMC](https://www.openslr.org/123/) | 22.2 | 11.1
-| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 7.9 | 7.6
-| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 9.9
+__Processing speed on a NVIDIA H100 80GB HBM3__
 
-The second column is [pyannoteAI premium](https://huggingface.co/pyannoteAI/speaker-diarization-precision) speaker diarization pipeline, as of April 2025. To test it:
 
-1. Create a free [`pyannoteAI`](https://dashboard.pyannote.ai) account and get 150h of free credits.
-1. Create an API key on [`pyannoteAI` dashboard](https://dashboard.pyannote.ai).
-2. Enjoy [`pyannoteAI`](https://www.pyannote.ai) precision speaker diarization pipeline by changing one single line of code!
+<img src="https://avatars.githubusercontent.com/u/162698670" width="20" style="vertical-align:text-bottom;" /> `pyannoteAI` premium models are even better (and also 2x faster). <img src="https://avatars.githubusercontent.com/u/162698670" width="20" style="vertical-align:text-bottom;" /> `labs` model is currently in private beta.
+
+1. Create pyannoteAI API key at [`dashboard.pyannote.ai`](https://dashboard.pyannote.ai)
+2. Enjoy 150 hours of free credits by changing one single line of code!
 
 ```diff
 from pyannote.audio import Pipeline
 pipeline = Pipeline.from_pretrained(
 - 'pyannote/speaker-diarization-4.0', token="{huggingface-token}")
 + 'pyannoteAI/speaker-diarization-precision', token="{pyannoteAI-api-key}")
-diarization = pipeline("audio.wav")
+diarization = pipeline("audio.wav") # runs on pyannoteAI servers
 ```
 
 ## Processing on GPU
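For context, the unchanged lines around these hunks (the `Pipeline.from_pretrained` call in the ```diff``` block and the `itertracks(yield_label=True)` loop in the second hunk header) imply a quick-start flow roughly like the sketch below. This is a reviewer's sketch, not part of the commit: it assumes `pyannote.audio >= 4.0` is installed, `"audio.wav"` is a placeholder file, and `format_turn` is a hypothetical helper added here for illustration.

```python
def format_turn(start: float, end: float, speaker: str) -> str:
    """Render one diarization turn as 'start=0.2s stop=1.5s SPEAKER_00'."""
    return f"start={start:.1f}s stop={end:.1f}s {speaker}"


if __name__ == "__main__":
    # Heavy work stays behind the main guard: requires pyannote.audio >= 4.0,
    # a Hugging Face access token, and a local audio file.
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-4.0", token="{huggingface-token}")
    diarization = pipeline("audio.wav")
    # itertracks(yield_label=True) yields (segment, track, speaker) triples
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        print(format_turn(turn.start, turn.end, speaker))
```

Swapping the checkpoint name and token for `'pyannoteAI/speaker-diarization-precision'` and a pyannoteAI API key, as the diff above shows, is the only change needed to run the premium pipeline.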