Files changed (2)
  1. README.md +92 -29
  2. pyannoteAI.png +0 -0
README.md CHANGED
@@ -7,32 +7,95 @@ sdk: static
  pinned: false
  ---
 
- [**pyannote.audio**](https://github.com/pyannote/pyannote-audio) is an open-source toolkit for speaker diarization.
-
- Pretrained pipelines reach state-of-the-art performance on most academic benchmarks.
- Training is made possible thanks to [GENCI](https://www.genci.fr/) on the [**Jean Zay**](http://www.idris.fr/eng/jean-zay/) supercomputer.
-
- [**pyannoteAI**](https://www.pyannote.ai) provides even better and faster enterprise options, which can be tried for free on our [**playground**](https://dashboard.pyannote.ai).
-
-
- | Benchmark | [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) | [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) | [pyannoteAI](https://www.pyannote.ai) |
- | ---------------------- | ------ | ------ | --------- |
- | [AISHELL-4](https://arxiv.org/abs/2104.03603) | 14.1 | 12.2 | 11.9 |
- | [AliMeeting](https://www.openslr.org/119/) (channel 1) | 27.4 | 24.4 | 16.6 |
- | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.9 | 18.8 | 13.2 |
- | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 27.1 | 22.4 | 15.8 |
- | [AVA-AVD](https://arxiv.org/abs/2111.14448) | 66.3 | 50.0 | 39.9 |
- | [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 31.6 | 28.4 | 17.8 |
- | [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 26.9 | 21.7 | 15.7 |
- | [Earnings21](https://github.com/revdotcom/speech-datasets) | 17.0 | 9.4 | 9.1 |
- | [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 61.5 | 51.2 | 42.8 |
- | [MSDWild](https://github.com/X-LANCE/MSDWILD) | 32.8 | 25.3 | 17.7 |
- | [RAMC](https://www.openslr.org/123/) | 22.5 | 22.2 | 10.6 |
- | [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 8.2 | 7.8 | 7.3 |
- | [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.3 | 8.9 |
- [Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %)
-
- Using high-end NVIDIA hardware,
- * [v2.1](https://hf.co/pyannote/speaker-diarization-2.1) takes around 1m30s to process 1h of audio
- * [v3.1](https://hf.co/pyannote/speaker-diarization-3.1) takes around 1m20s to process 1h of audio
- * On-premise [pyannoteAI](https://www.pyannote.ai) takes less than 20s to process 1h of audio
+ ![Identify who speaks when with pyannote](https://github.com/pyannote/.github/raw/main/profile/banner.jpg)
+
+
+ ## πŸ’š Simply detect, segment, label, and separate speakers in any language
+
+ [🎈 `pyannoteAI` playground](https://dashboard.pyannote.ai/) // [πŸ“š `pyannoteAI` documentation](https://docs.pyannote.ai/) // [🎹 `pyannote` open-source toolkit](https://github.com/pyannote/pyannote-audio) // [πŸ€— `pyannote` pretrained models](https://huggingface.co/pyannote) // ![Github stars](https://img.shields.io/github/stars/pyannote/pyannote-audio?color=g) ![PyPI Downloads](https://static.pepy.tech/personalized-badge/pyannote-audio?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=downloads)
+
+
+ ### 🎀 What is speaker diarization?
+
+ ![Diarization](https://github.com/pyannote/.github/raw/main/profile/diarization.jpg)
+
+ **Speaker diarization** is the process of automatically partitioning the audio recording of a conversation into segments and labeling each one by speaker, answering the question **"who spoke when?"**. As the **foundational layer of conversational AI**, speaker diarization provides high-level insights into human-human and human-machine conversations, and unlocks a wide range of downstream applications: meeting transcription, call center analytics, voice agents, and video dubbing.
+
+ ### ▢️ Getting started
+
+ Install the latest release of [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) ![Latest release](https://img.shields.io/pypi/v/pyannote-audio?color=059669) with either `uv` (recommended) or `pip`:
+
+ ```bash
+ # with uv (recommended)
+ $ uv add pyannote.audio
+ # or with pip
+ $ pip install pyannote.audio
+ ```
+
+ Enjoy state-of-the-art speaker diarization:
+
+ ```python
+ # download the pretrained pipeline from Hugging Face
+ from pyannote.audio import Pipeline
+ pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-community-1', token="HUGGINGFACE_TOKEN")
+
+ # perform speaker diarization locally
+ output = pipeline('/path/to/audio.wav')
+
+ # print each speaker turn
+ for turn, speaker in output.speaker_diarization:
+     print(f"{speaker} speaks between t={turn.start}s and t={turn.end}s")
+ ```
+
+ Read the [`community-1` model card](https://hf.co/pyannote/speaker-diarization-community-1) to make the most of it.
+
+
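The per-turn loop above is easy to build on. As a minimal, self-contained sketch (using hypothetical `(start, end, speaker)` triples in seconds as a stand-in for the pipeline's output), total speaking time per speaker can be accumulated like this:

```python
from collections import defaultdict

# hypothetical diarization output: (start, end, speaker) triples in seconds,
# standing in for the turns yielded by the pretrained pipeline
turns = [
    (0.0, 3.2, "SPEAKER_00"),
    (3.2, 7.5, "SPEAKER_01"),
    (7.5, 9.0, "SPEAKER_00"),
]

# accumulate total speaking time per speaker
speaking_time = defaultdict(float)
for start, end, speaker in turns:
    speaking_time[speaker] += end - start

for speaker, seconds in sorted(speaking_time.items()):
    print(f"{speaker}: {seconds:.1f}s")
```

The same pattern extends to turn-taking statistics or per-speaker excerpt extraction.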
+ ### πŸ† State-of-the-art models
+
+ The [`pyannoteAI`](https://www.pyannote.ai/) research team trains cutting-edge speaker diarization models, thanks to the [**Jean Zay**](http://www.idris.fr/eng/jean-zay/) πŸ‡«πŸ‡· supercomputer managed by [**GENCI**](https://www.genci.fr/) πŸ’š. They come in two flavors:
+
+ * [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) open models, available on [Huggingface](https://hf.co/pyannote) and used by 140k+ developers around the world;
+ * premium models, available on the [`pyannoteAI` cloud](https://dashboard.pyannote.ai) (and on-premise for enterprise customers), that provide state-of-the-art speaker diarization as well as additional enterprise features.
+
+ | Benchmark (last updated 2025-09) | <a href="https://hf.co/pyannote/speaker-diarization-3.1">`legacy` (3.1)</a> | <a href="https://hf.co/pyannote/speaker-diarization-community-1">`community-1`</a> | <a href="https://docs.pyannote.ai">`precision-2`</a> |
+ | --------------------------------------------------------------------------------------------------------------------------- | ------ | ------ | ------ |
+ | [AISHELL-4](https://arxiv.org/abs/2104.03603) | 12.2 | 11.7 | 11.4 πŸ† |
+ | [AliMeeting](https://www.openslr.org/119/) (channel 1) | 24.5 | 20.3 | 15.2 πŸ† |
+ | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (IHM) | 18.8 | 17.0 | 12.9 πŸ† |
+ | [AMI](https://groups.inf.ed.ac.uk/ami/corpus/) (SDM) | 22.7 | 19.9 | 15.6 πŸ† |
+ | [AVA-AVD](https://arxiv.org/abs/2111.14448) | 49.7 | 44.6 | 37.1 πŸ† |
+ | [CALLHOME](https://catalog.ldc.upenn.edu/LDC2001S97) ([part 2](https://github.com/BUTSpeechFIT/CALLHOME_sublists/issues/1)) | 28.5 | 26.7 | 16.6 πŸ† |
+ | [DIHARD 3](https://catalog.ldc.upenn.edu/LDC2022S14) ([full](https://arxiv.org/abs/2012.01477)) | 21.4 | 20.2 | 14.7 πŸ† |
+ | [Ego4D](https://arxiv.org/abs/2110.07058) (dev.) | 51.2 | 46.8 | 39.0 πŸ† |
+ | [MSDWild](https://github.com/X-LANCE/MSDWILD) | 25.4 | 22.8 | 17.3 πŸ† |
+ | [RAMC](https://www.openslr.org/123/) | 22.2 | 20.8 | 10.5 πŸ† |
+ | [REPERE](https://www.islrn.org/resources/360-758-359-485-0/) (phase2) | 7.9 | 8.9 | 7.4 πŸ† |
+ | [VoxConverse](https://github.com/joonson/voxconverse) (v0.3) | 11.2 | 11.2 | 8.5 πŸ† |
+
+ __[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %, the lower, the better)__
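For context, diarization error rate sums three error components (missed detection, false alarm, and speaker confusion) and normalizes by the total duration of reference speech. A back-of-the-envelope sketch, using made-up durations rather than numbers from the benchmark above:

```python
def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER = (missed detection + false alarm + speaker confusion) / total reference speech."""
    return (missed + false_alarm + confusion) / total_speech

# made-up component durations, in seconds, for a 100 s reference
der = diarization_error_rate(missed=5.0, false_alarm=3.0, confusion=4.0, total_speech=100.0)
print(f"DER = {der:.1%}")  # DER = 12.0%
```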
+
+ ### ⏩️ Going further, better, and faster
+
+ The [`precision-2`](https://www.pyannote.ai/blog/precision-2) premium pipeline further improves accuracy and processing speed, and brings additional features.
+
+ | Features | <a href="https://hf.co/pyannote/speaker-diarization-community-1">`community-1`</a> | <a href="https://docs.pyannote.ai">`precision-2`</a> |
+ | -------------- | ----------- | ----------- |
+ | Set exact/min/max number of speakers | βœ… | βœ… |
+ | Exclusive speaker diarization (for transcription) | βœ… | βœ… |
+ | Segmentation confidence scores | ❌ | βœ… |
+ | Speaker confidence scores | ❌ | βœ… |
+ | Voiceprinting | ❌ | βœ… |
+ | Speaker identification | ❌ | βœ… |
+ | Time to process 1h of audio (on H100) | 37s | 14s |
+
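The last row of the table translates directly into real-time factors. A quick sanity check, using the H100 timings quoted above:

```python
# seconds needed to process 1 hour (3600 s) of audio, from the table above
timings = {"community-1": 37.0, "precision-2": 14.0}

for model, seconds in timings.items():
    speedup = 3600.0 / seconds  # how many times faster than real time
    print(f"{model}: {speedup:.0f}x faster than real time")
```

So `community-1` runs roughly 97x faster than real time, and `precision-2` roughly 257x.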
+
+ Create a [`pyannoteAI`](https://dashboard.pyannote.ai) account, change one line of code, and enjoy free cloud credits to try [`precision-2`](https://pyannote.ai/blog/precision-2) premium diarization:
+
+ ```python
+ # perform premium speaker diarization on pyannoteAI cloud
+ from pyannote.audio import Pipeline
+ pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-precision-2', token="PYANNOTEAI_API_KEY")
+ better_output = pipeline('/path/to/audio.wav')
+ ```
+
+ ### πŸŽ‰ Join the community
+
+ [Discord](https://discord.gg/4cjCJcZv) // [X](https://x.com/pyannoteAI) // [LinkedIn](https://www.linkedin.com/company/pyannoteai/) // [Huggingface](https://hf.co/pyannote) // [Github](https://github.com/pyannote)
+
pyannoteAI.png DELETED
Binary file (3.44 kB)