hbredin committed · Commit 191d276 (verified) · 1 Parent(s): 35a231a

Update README.md

Files changed (1)
  1. README.md +14 -9
README.md CHANGED
@@ -36,13 +36,16 @@ This pipeline ingests mono audio sampled at 16kHz and outputs speaker diarizatio
  - stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels.
  - audio files sampled at a different rate are resampled to 16kHz automatically upon loading.

- The main improvement over previous versions is that we made it much easier [to use it offline](#offline-use) (i.e. without internet connection).

  ## Setup

  1. Accept user conditions
  2. Create access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).
- 3. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `4.x` with `pip install pyannote.audio`

  ## Quick start

@@ -93,6 +96,7 @@ Processing is fully automated:

  The second column is [pyannoteAI premium](https://huggingface.co/pyannoteAI/speaker-diarization-precision) speaker diarization pipeline, as of April 2025. To test it:

  1. Create an API key on [`pyannoteAI` dashboard](https://dashboard.pyannote.ai).
  2. Enjoy [`pyannoteAI`](https://www.pyannote.ai) precision speaker diarization pipeline by changing one single line of code!

@@ -101,7 +105,7 @@ from pyannote.audio import Pipeline
  pipeline = Pipeline.from_pretrained(
  - 'pyannote/speaker-diarization-4.0', token="{huggingface-token}")
  + 'pyannoteAI/speaker-diarization-precision', token="{pyannoteAI-api-key}")
- diarization = pipeline("udio.wav")
  ```

  ## Processing on GPU
@@ -200,14 +204,15 @@ diarization = pipeline("audio.wav")
  }
  ```

- 3. Speaker diarization pipeline

  ```bibtex
- @inproceedings{Bredin23,
- author={Hervé Bredin},
- title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
- year=2023,
- booktitle={Proc. INTERSPEECH 2023},
  }
  ```

 
  - stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels.
  - audio files sampled at a different rate are resampled to 16kHz automatically upon loading.

+ The main improvements brought by 4.0 over previous version 3.1 are
+
+ - much better speaker counting and assignment
+ - much easier [offline use](#offline-use) (i.e. without internet connection)

  ## Setup

  1. Accept user conditions
  2. Create access token at [`hf.co/settings/tokens`](https://hf.co/settings/tokens).
+ 3. Install [`pyannote.audio`](https://github.com/pyannote/pyannote-audio) `4.x.x` with `pip install pyannote.audio`

  ## Quick start

 
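The two context lines at the top of this hunk describe what happens to audio that is not already mono at 16kHz. The sketch below illustrates that behaviour in plain Python. It is not the code `pyannote.audio` runs: the function names are hypothetical, and the linear interpolation is an assumption made for brevity (it applies no anti-aliasing filter, so it is only suitable as an illustration).

```python
from typing import List, Sequence

TARGET_RATE = 16_000  # sample rate the README says the pipeline ingests


def downmix_to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equally long channels sample by sample (hypothetical helper)."""
    if not channels:
        raise ValueError("at least one channel is required")
    length = len(channels[0])
    if any(len(channel) != length for channel in channels):
        raise ValueError("all channels must have the same length")
    return [sum(frame) / len(channels) for frame in zip(*channels)]


def resample_linear(
    samples: Sequence[float], source_rate: int, target_rate: int = TARGET_RATE
) -> List[float]:
    """Resample by linear interpolation (assumption: illustration only)."""
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    if source_rate == target_rate or len(samples) < 2:
        return list(samples)
    n_out = max(1, round(len(samples) * target_rate / source_rate))
    step = source_rate / target_rate  # input samples per output sample
    last = len(samples) - 1
    resampled = []
    for i in range(n_out):
        position = i * step
        left = min(int(position), last)
        right = min(left + 1, last)
        fraction = position - left
        resampled.append(samples[left] * (1 - fraction) + samples[right] * fraction)
    return resampled
```

For instance, `resample_linear(downmix_to_mono([left, right]), 44_100)` would turn a 44.1kHz stereo recording held in two Python lists into a mono list at 16kHz.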

  The second column is [pyannoteAI premium](https://huggingface.co/pyannoteAI/speaker-diarization-precision) speaker diarization pipeline, as of April 2025. To test it:

+ 1. Create a free [`pyannoteAI`](https://dashboard.pyannote.ai) account and get 150h of free credits.
  1. Create an API key on [`pyannoteAI` dashboard](https://dashboard.pyannote.ai).
  2. Enjoy [`pyannoteAI`](https://www.pyannote.ai) precision speaker diarization pipeline by changing one single line of code!

 
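The "one single line" switch described in this hunk amounts to changing the checkpoint name and the credential passed to `Pipeline.from_pretrained`. A minimal sketch of making that switch configurable follows. The helper names and the environment variable names `HUGGINGFACE_TOKEN` and `PYANNOTEAI_API_KEY` are hypothetical, chosen for this example only; the two checkpoint names, the `token=` keyword, and the `pipeline("audio.wav")` call are taken from the README lines shown in this commit.

```python
import os
from typing import Mapping, Tuple

OPEN_CHECKPOINT = "pyannote/speaker-diarization-4.0"
PREMIUM_CHECKPOINT = "pyannoteAI/speaker-diarization-precision"


def pipeline_settings(
    premium: bool, env: Mapping[str, str] = os.environ
) -> Tuple[str, str]:
    """Return (checkpoint, token) for the open or the premium pipeline."""
    # Hypothetical variable names: this sketch reads credentials from the
    # environment instead of hard-coding them in the script.
    variable = "PYANNOTEAI_API_KEY" if premium else "HUGGINGFACE_TOKEN"
    token = env.get(variable)
    if not token:
        raise KeyError(f"set {variable} before loading the pipeline")
    checkpoint = PREMIUM_CHECKPOINT if premium else OPEN_CHECKPOINT
    return checkpoint, token


def diarize(path: str, premium: bool = False):
    """Load the selected pipeline and apply it to one audio file."""
    # Imported lazily so the helper above works without pyannote.audio installed.
    from pyannote.audio import Pipeline

    checkpoint, token = pipeline_settings(premium)
    pipeline = Pipeline.from_pretrained(checkpoint, token=token)
    return pipeline(path)
```

With this layout, `diarize("audio.wav")` and `diarize("audio.wav", premium=True)` differ only in which checkpoint and credential are selected, mirroring the one-line change in the README.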
  pipeline = Pipeline.from_pretrained(
  - 'pyannote/speaker-diarization-4.0', token="{huggingface-token}")
  + 'pyannoteAI/speaker-diarization-precision', token="{pyannoteAI-api-key}")
+ diarization = pipeline("audio.wav")
  ```

  ## Processing on GPU
 
  }
  ```

+
+ 3. Speaker clustering

  ```bibtex
+ @article{Landini2022,
+ author={Landini, Federico and Profant, J{\'a}n and Diez, Mireia and Burget, Luk{\'a}{\v{s}}},
+ title={{Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks}},
+ year={2022},
+ journal={Computer Speech \& Language},
  }
  ```