next-release #1
by hbredin · opened

README.md +92 -29
pyannoteAI.png +0 -0
README.md
CHANGED
@@ -7,32 +7,95 @@ sdk: static
pinned: false
---

## 🎙️ Simply detect, segment, label, and separate speakers in any language

[🌐 `pyannoteAI` playground](https://dashboard.pyannote.ai/) // [📚 `pyannoteAI` documentation](https://docs.pyannote.ai/) // [🔹 `pyannote` open-source toolkit](https://github.com/pyannote/pyannote-audio) // [🤗 `pyannote` pretrained models](https://huggingface.co/pyannote)

### 🤔 What is speaker diarization?

**Speaker diarization** is the process of automatically partitioning the audio recording of a conversation into segments and labeling each segment by speaker, answering the question **"who spoke when?"**. As the **foundational layer of conversational AI**, speaker diarization provides high-level insights into human-human and human-machine conversations, and unlocks a wide range of downstream applications: meeting transcription, call center analytics, voice agents, and video dubbing.

### ▶️ Getting started

Install the latest [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) release with either `uv` (recommended) or `pip`:

```bash
$ uv add pyannote.audio
$ pip install pyannote.audio
```

Enjoy state-of-the-art speaker diarization:

```python
# download the pretrained pipeline from Hugging Face
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-community-1", token="HUGGINGFACE_TOKEN")

# perform speaker diarization locally
output = pipeline("/path/to/audio.wav")

# print speaker turns
for turn, speaker in output.speaker_diarization:
    print(f"{speaker} speaks between t={turn.start}s and t={turn.end}s")
```

Read the [`community-1` model card](https://hf.co/pyannote/speaker-diarization-community-1) to make the most of it.
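Downstream tools often consume diarization output in the standard RTTM format. Here is a minimal, illustrative sketch of that serialization, assuming the speaker turns have already been collected as plain `(start, end, speaker)` tuples (the helper name and sample turns are hypothetical, not part of the `pyannote.audio` API):

```python
def to_rttm(turns, uri="audio"):
    """Serialize (start, end, speaker) tuples to RTTM SPEAKER records.

    RTTM stores onset and duration in seconds, with <NA> placeholders
    for unused fields.
    """
    lines = []
    for start, end, speaker in turns:
        lines.append(
            f"SPEAKER {uri} 1 {start:.3f} {end - start:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>"
        )
    return "\n".join(lines)

# hypothetical turns for illustration
turns = [(0.0, 4.2, "SPEAKER_00"), (4.2, 9.8, "SPEAKER_01")]
print(to_rttm(turns, uri="meeting"))
```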

### 🏆 State-of-the-art models

The [`pyannoteAI`](https://www.pyannote.ai/) research team trains cutting-edge speaker diarization models, thanks to the [**Jean Zay**](http://www.idris.fr/eng/jean-zay/) 🇫🇷 supercomputer managed by [**GENCI**](https://www.genci.fr/) 🙏. They come in two flavors:

* [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) open models, available on [Hugging Face](https://hf.co/pyannote) and used by 140k+ developers around the world;
* premium models, available on the [`pyannoteAI` cloud](https://dashboard.pyannote.ai) (and on-premise for enterprise customers), that provide state-of-the-art speaker diarization as well as additional enterprise features.

| Benchmark (last updated: 2025-09) | <a href="https://hf.co/pyannote/speaker-diarization-3.1">`legacy` (3.1)</a> | <a href="https://hf.co/pyannote/speaker-diarization-community-1">`community-1`</a> | <a href="https://docs.pyannote.ai">`precision-2`</a> |
| --- | --- | --- | --- |
| [AISHELL-4](https://arxiv.org/abs/2104.03603) | 12.2 | 11.7 | 11.4 🏆 |
| [AliMeeting](https://www.openslr.org/119/) (channel 1) | 24.5 | 20.3 | 15.2 🏆 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.8 | 17.0 | 12.9 🏆 |
| [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 22.7 | 19.9 | 15.6 🏆 |
| [AVA-AVD](https://arxiv.org/abs/2111.14448) | 49.7 | 44.6 | 37.1 🏆 |
| [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 28.5 | 26.7 | 16.6 🏆 |
| [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 21.4 | 20.2 | 14.7 🏆 |
| [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 51.2 | 46.8 | 39.0 🏆 |
| [MSDWild](https://github.com/X-LANCE/MSDWILD) | 25.4 | 22.8 | 17.3 🏆 |
| [RAMC](https://www.openslr.org/123/) | 22.2 | 20.8 | 10.5 🏆 |
| [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase 2) | 7.9 | 8.9 | 7.4 🏆 |
| [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.2 | 8.5 🏆 |

__[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %, the lower, the better)__
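The metric reported in the table combines three error types over the reference speech duration. A minimal sketch of the standard formula, with hypothetical durations in seconds:

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total_speech):
    """DER = (false alarm + missed detection + speaker confusion) / total reference speech."""
    return (false_alarm + missed_detection + confusion) / total_speech

# hypothetical error durations (in seconds) for 60s of reference speech
der = diarization_error_rate(false_alarm=1.5, missed_detection=3.0, confusion=1.5, total_speech=60.0)
print(f"{100 * der:.1f}%")  # 10.0%
```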

### ⏩ Going further, better, and faster

The [`precision-2`](https://www.pyannote.ai/blog/precision-2) premium pipeline further improves accuracy and processing speed, and brings additional features.

| Features | <a href="https://hf.co/pyannote/speaker-diarization-community-1">`community-1`</a> | <a href="https://docs.pyannote.ai">`precision-2`</a> |
| --- | --- | --- |
| Set exact/min/max number of speakers | ✅ | ✅ |
| Exclusive speaker diarization (for transcription) | ✅ | ✅ |
| Segmentation confidence scores | ❌ | ✅ |
| Speaker confidence scores | ❌ | ✅ |
| Voiceprinting | ❌ | ✅ |
| Speaker identification | ❌ | ✅ |
| Time to process 1h of audio (on H100) | 37s | 14s |
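"Exclusive" speaker diarization means resolving overlapping turns so that exactly one speaker is active at any instant, which is what word-level transcription alignment needs. The actual pipeline behavior is not documented here; the following is only a toy illustration of the concept, using a hypothetical first-come-first-served policy on `(start, end, speaker)` tuples:

```python
def exclusive_diarization(turns):
    """Resolve overlapping speaker turns into a single-speaker timeline.

    Toy policy: the turn that started first keeps the floor; later
    overlapping turns are trimmed to begin where the previous one ends,
    and turns fully covered by an earlier turn are dropped.
    """
    result = []
    for start, end, speaker in sorted(turns):
        if result and start < result[-1][1]:
            start = result[-1][1]  # trim overlap with the previous turn
        if start < end:
            result.append((start, end, speaker))
    return result

overlapping = [(0.0, 5.0, "A"), (4.0, 8.0, "B")]
print(exclusive_diarization(overlapping))  # [(0.0, 5.0, 'A'), (5.0, 8.0, 'B')]
```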

Create a [`pyannoteAI`](https://dashboard.pyannote.ai) account, change one line of code, and enjoy free cloud credits to try [`precision-2`](https://pyannote.ai/blog/precision-2) premium diarization:

```python
# perform premium speaker diarization on the pyannoteAI cloud
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-precision-2", token="PYANNOTEAI_API_KEY")
better_output = pipeline("/path/to/audio.wav")
```

### 👋 Join the community

[Discord](https://discord.gg/4cjCJcZv) // [X](https://x.com/pyannoteAI) // [LinkedIn](https://www.linkedin.com/company/pyannoteai/) // [Huggingface](https://hf.co/pyannote) // [Github](https://github.com/pyannote)
pyannoteAI.png
DELETED
Binary file (3.44 kB)