Files changed (2)
  1. README.md +92 -29
  2. pyannoteAI.png +0 -0
README.md CHANGED
@@ -7,32 +7,95 @@ sdk: static
  pinned: false
  ---
 
- [**pyannote.audio**](https://github.com/pyannote/pyannote-audio) is an open-source toolkit for speaker diarization.
-
- Pretrained pipelines reach state-of-the-art performance on most academic benchmarks.
- Training is made possible thanks to [GENCI](https://www.genci.fr/) on the [**Jean Zay**](http://www.idris.fr/eng/jean-zay/) supercomputer.
-
- [**pyannoteAI**](https://www.pyannote.ai) provides even better and faster enterprise options, which can be tried for free on our [**playground**](https://dashboard.pyannote.ai).
-
-
- | Benchmark | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [pyannoteAI](https://www.pyannote.ai) |
- | ---------------------- | ------ | ------ | --------- |
- | [AISHELL-4](https://arxiv.org/abs/2104.03603) | 14.1 | 12.2 | 11.9 |
- | [AliMeeting](https://www.openslr.org/119/) (channel 1) | 27.4 | 24.4 | 16.6 |
- | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.9 | 18.8 | 13.2 |
- | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 27.1 | 22.4 | 15.8 |
- | [AVA-AVD](https://arxiv.org/abs/2111.14448) | 66.3 | 50.0 | 39.9 |
- | [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 31.6 | 28.4 | 17.8 |
- | [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 26.9 | 21.7 | 15.7 |
- | [Earnings21](https://github.com/revdotcom/speech-datasets) | 17.0 | 9.4 | 9.1 |
- | [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 61.5 | 51.2 | 42.8 |
- | [MSDWild](https://github.com/X-LANCE/MSDWILD) | 32.8 | 25.3 | 17.7 |
- | [RAMC](https://www.openslr.org/123/) | 22.5 | 22.2 | 10.6 |
- | [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 8.2 | 7.8 | 7.3 |
- | [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.3 | 8.9 |
- [Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %)
-
- Using high-end NVIDIA hardware,
- * [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) takes around 1m30s to process 1h of audio
- * [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) takes around 1m20s to process 1h of audio
- * On-premise [pyannoteAI](https://www.pyannote.ai) takes less than 20s to process 1h of audio
+ ![Identify who speaks when with pyannote](https://github.com/pyannote/.github/raw/main/profile/banner.jpg)
+
+
+ ## πŸ’š Simply detect, segment, label, and separate speakers in any language
+
+ [🎈 `pyannoteAI` playground](https://dashboard.pyannote.ai/) // [πŸ“š `pyannoteAI` documentation](https://docs.pyannote.ai/) // [🎹 `pyannote` open-source toolkit](https://github.com/pyannote/pyannote-audio) // [πŸ€— `pyannote` pretrained models](https://huggingface.co/pyannote) // ![Github stars](https://img.shields.io/github/stars/pyannote/pyannote-audio?color=g) ![PyPI Downloads](https://static.pepy.tech/personalized-badge/pyannote-audio?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=downloads)
+
+
+ ### 🎀 What is speaker diarization?
+
+ ![Diarization](https://github.com/pyannote/.github/raw/main/profile/diarization.jpg)
+
+ **Speaker diarization** is the process of automatically partitioning the audio recording of a conversation into segments and labeling each one by speaker, answering the question **"who spoke when?"**. As the **foundational layer of conversational AI**, speaker diarization provides high-level insights into human-human and human-machine conversations, and unlocks a wide range of downstream applications: meeting transcription, call center analytics, voice agents, and video dubbing.
+
+ ### ▢️ Getting started
+
+ Install the latest release of [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) ![Latest release](https://img.shields.io/pypi/v/pyannote-audio?color=059669) with either `uv` (recommended) or `pip`:
+
+ ```bash
+ # with uv (recommended)
+ $ uv add pyannote.audio
+ # or with pip
+ $ pip install pyannote.audio
+ ```
+
+ Enjoy state-of-the-art speaker diarization:
+
+ ```python
+ # download the pretrained pipeline from Hugging Face
+ from pyannote.audio import Pipeline
+ pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-community-1', token="HUGGINGFACE_TOKEN")
+
+ # perform speaker diarization locally
+ output = pipeline('/path/to/audio.wav')
+
+ # print each speaker turn
+ for turn, speaker in output.speaker_diarization:
+     print(f"{speaker} speaks between t={turn.start}s and t={turn.end}s")
+ ```
+
+ Read the [`community-1` model card](https://hf.co/pyannote/speaker-diarization-community-1) to make the most of it.
+
+
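The per-turn loop above is easy to build on. As a minimal, self-contained sketch (using hypothetical `(start, end, speaker)` triples in seconds as a stand-in for the pipeline's output), total speaking time per speaker can be accumulated like this:

```python
from collections import defaultdict

# hypothetical diarization output: (start, end, speaker) triples in seconds,
# standing in for the turns yielded by the pretrained pipeline
turns = [
    (0.0, 3.2, "SPEAKER_00"),
    (3.2, 7.5, "SPEAKER_01"),
    (7.5, 9.0, "SPEAKER_00"),
]

# accumulate total speaking time per speaker
speaking_time = defaultdict(float)
for start, end, speaker in turns:
    speaking_time[speaker] += end - start

for speaker, seconds in sorted(speaking_time.items()):
    print(f"{speaker}: {seconds:.1f}s")
```

The same pattern extends to turn-taking statistics or per-speaker excerpt extraction.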
+ ### πŸ† State-of-the-art models
+
+ The [`pyannoteAI`](https://www.pyannote.ai/) research team trains cutting-edge speaker diarization models, thanks to the [**Jean Zay**](http://www.idris.fr/eng/jean-zay/) πŸ‡«πŸ‡· supercomputer managed by [**GENCI**](https://www.genci.fr/) πŸ’š. They come in two flavors:
+
+ * [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) open models, available on [Huggingface](https://hf.co/pyannote) and used by 140k+ developers around the world;
+ * premium models, available on the [`pyannoteAI` cloud](https://dashboard.pyannote.ai) (and on-premise for enterprise customers), that provide state-of-the-art speaker diarization as well as additional enterprise features.
+
+ | Benchmark (last updated 2025-09) | <a href="https://hf.co/pyannote/speaker-diarization-3.1">`legacy` (3.1)</a> | <a href="https://hf.co/pyannote/speaker-diarization-community-1">`community-1`</a> | <a href="https://docs.pyannote.ai">`precision-2`</a> |
+ | --------------------------------------------------------------------------------------------------------------------------- | ------ | ------ | ------ |
+ | [AISHELL-4](https://arxiv.org/abs/2104.03603) | 12.2 | 11.7 | 11.4 πŸ† |
+ | [AliMeeting](https://www.openslr.org/119/) (channel 1) | 24.5 | 20.3 | 15.2 πŸ† |
+ | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.8 | 17.0 | 12.9 πŸ† |
+ | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 22.7 | 19.9 | 15.6 πŸ† |
+ | [AVA-AVD](https://arxiv.org/abs/2111.14448) | 49.7 | 44.6 | 37.1 πŸ† |
+ | [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 28.5 | 26.7 | 16.6 πŸ† |
+ | [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 21.4 | 20.2 | 14.7 πŸ† |
+ | [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 51.2 | 46.8 | 39.0 πŸ† |
+ | [MSDWild](https://github.com/X-LANCE/MSDWILD) | 25.4 | 22.8 | 17.3 πŸ† |
+ | [RAMC](https://www.openslr.org/123/) | 22.2 | 20.8 | 10.5 πŸ† |
+ | [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 7.9 | 8.9 | 7.4 πŸ† |
+ | [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.2 | 8.5 πŸ† |
+
+ __[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %, the lower, the better)__
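For context, diarization error rate sums three error components (missed detection, false alarm, and speaker confusion) and normalizes by the total duration of reference speech. A back-of-the-envelope sketch, using made-up durations rather than numbers from the benchmark above:

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER = (missed detection + false alarm + speaker confusion) / total reference speech."""
    return (missed + false_alarm + confusion) / total_speech

# made-up component durations, in seconds, for a 100 s reference
der = diarization_error_rate(missed=5.0, false_alarm=3.0, confusion=4.0, total_speech=100.0)
print(f"DER = {der:.1%}")  # DER = 12.0%
```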
+
+ ### ⏩️ Going further, better, and faster
+
+ The [`precision-2`](https://www.pyannote.ai/blog/precision-2) premium pipeline further improves accuracy and processing speed, and brings additional features.
+
+ | Features | <a href="https://hf.co/pyannote/speaker-diarization-community-1">`community-1`</a> | <a href="https://docs.pyannote.ai">`precision-2`</a> |
+ | -------------- | ----------- | ----------- |
+ | Set exact/min/max number of speakers | βœ… | βœ… |
+ | Exclusive speaker diarization (for transcription) | βœ… | βœ… |
+ | Segmentation confidence scores | ❌ | βœ… |
+ | Speaker confidence scores | ❌ | βœ… |
+ | Voiceprinting | ❌ | βœ… |
+ | Speaker identification | ❌ | βœ… |
+ | Time to process 1h of audio (on H100) | 37s | 14s |
+
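The last row of the table translates directly into real-time factors. A quick sanity check, using the H100 timings quoted above:

```python
# seconds needed to process 1 hour (3600 s) of audio, from the table above
timings = {"community-1": 37.0, "precision-2": 14.0}

for model, seconds in timings.items():
    speedup = 3600.0 / seconds  # how many times faster than real time
    print(f"{model}: {speedup:.0f}x faster than real time")
```

So `community-1` runs roughly 97x faster than real time, and `precision-2` roughly 257x.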
+
+ Create a [`pyannoteAI`](https://dashboard.pyannote.ai) account, change one line of code, and enjoy free cloud credits to try [`precision-2`](https://pyannote.ai/blog/precision-2) premium diarization:
+
+ ```python
+ # perform premium speaker diarization on pyannoteAI cloud
+ from pyannote.audio import Pipeline
+ pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-precision-2', token="PYANNOTEAI_API_KEY")
+ better_output = pipeline('/path/to/audio.wav')
+ ```
+
+ ### πŸŽ‰ Join the community
+
+ [Discord](https://discord.gg/4cjCJcZv) // [X](https://x.com/pyannoteAI) // [LinkedIn](https://www.linkedin.com/company/pyannoteai/) // [Huggingface](https://hf.co/pyannote) // [Github](https://github.com/pyannote)
+
pyannoteAI.png DELETED
Binary file (3.44 kB)