Update README.md
README.md
Out of the box, <img src="https://avatars.githubusercontent.com/u/7559051" width="20" style="vertical-align:text-bottom;" /> `pyannote.audio` speaker diarization pipeline v4.0 is expected to be much better than v3.1.
We report [diarization error rates](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %) on a large collection of academic benchmarks (fully automatic processing, no forgiveness collar, and no skipping of overlapped speech).
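For readers new to the metric, here is a minimal sketch of how such a DER figure is computed; the durations below are made up for illustration and are not taken from any benchmark:

```python
# Diarization error rate (DER): the sum of false alarm, missed detection,
# and speaker confusion durations, divided by the total reference speech
# duration. All durations are in seconds; these numbers are hypothetical.

def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total_speech: float) -> float:
    """DER in %, with no forgiveness collar and overlapped speech evaluated."""
    return 100.0 * (false_alarm + missed + confusion) / total_speech

# A hypothetical one-hour file containing 2400 s of reference speech:
der = diarization_error_rate(false_alarm=24.0, missed=48.0,
                             confusion=120.0, total_speech=2400.0)
print(f"DER = {der:.1f}%")  # → DER = 8.0%
```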
| Benchmark (last updated in 2025-08) | <a href="https://hf.co/pyannote/speaker-diarization-3.1"><img src="https://avatars.githubusercontent.com/u/7559051" width="32" /><br/>v3.1</a> | <a href="https://hf.co/pyannote/speaker-diarization-4.0"><img src="https://avatars.githubusercontent.com/u/7559051" width="32" /><br/> v4.0</a> | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /><br/>API</a> | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /><br/>labs</a> |
| --------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------ | -------------------------------------------------| ------------------------------------------------ | --- |
| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 7.9 | 8.9 | 7.3 | 7.4 |
| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.2 | 9.0 | 8.5 |
We also report processing speed on an NVIDIA H100 80GB HBM3:
| Benchmark (last updated in 2025-08) | <img src="https://avatars.githubusercontent.com/u/7559051" width="32" /><br/>pyannote.audio | <a href="https://docs.pyannote.ai"><img src="https://avatars.githubusercontent.com/u/162698670" width="32" /><br/>pyannoteAI</a> | Speed up
| -------------- | ----------- | ----------- | ------ |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM), ~1h files | 31s per hour of audio | 14s per hour of audio | 2.2x faster |
| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)), ~5min files | 37s per hour of audio | 14s per hour of audio | 2.6x faster |
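The speed-up column is just the ratio of the two per-hour processing times; a quick sanity check on the table's own numbers:

```python
# Speed-up = baseline processing time / premium processing time,
# using the per-hour-of-audio timings from the table above.
timings = [("AMI (IHM)", 31, 14), ("DIHARD 3 (full)", 37, 14)]
for name, baseline_s, premium_s in timings:
    print(f"{name}: {baseline_s / premium_s:.1f}x faster")
# AMI (IHM): 2.2x faster
# DIHARD 3 (full): 2.6x faster
```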
<img src="https://avatars.githubusercontent.com/u/162698670" width="20" style="vertical-align:text-bottom;" /> `pyannoteAI` premium models are even better (and also 2x faster). The <img src="https://avatars.githubusercontent.com/u/162698670" width="20" style="vertical-align:text-bottom;" /> `labs` model is currently in private beta.
1. Create a pyannoteAI API key at [`dashboard.pyannote.ai`](https://dashboard.pyannote.ai)
| 212 |
}
|
| 213 |
```
|
| 214 |
|
| 215 |
+
## Acknowledgment
Training and tuning made possible thanks to [GENCI](https://www.genci.fr/) on the [**Jean Zay**](http://www.idris.fr/eng/jean-zay/) supercomputer.