---
license: mit
library_name: voxpath
tags:
- speaker-recognition
- quantizer
- voxpath
- pyannote
---

# voxpath quantizer

A trained `BinaryKMeansQuantizer` (12 binary k-means levels, grouped into 4 display words with a vocabulary size of 8 each) that maps L2-normalized 512-dim speaker embeddings
from [`pyannote/embedding`](https://huggingface.co/pyannote/embedding) into
hierarchical word paths like `chalk.fjord.bismuth.elm`.

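The grouping arithmetic can be sketched as follows: 12 binary decisions split into 4 groups of 3 bits, and each 3-bit group indexes one of 8 words, giving 8^4 = 2^12 = 4096 possible paths. This is only an illustration, not voxpath's actual implementation: the helper `bits_to_words`, the most-significant-bit-first ordering, and the placeholder vocabulary are all assumptions.

```python
from typing import Sequence

# Hypothetical placeholder vocabulary of size 8. The real word lists belong
# to the fitted quantizer and are not reproduced here.
VOCAB = [f"w{i}" for i in range(8)]


def bits_to_words(bits: Sequence[int], words_per_path: int = 4) -> str:
    """Group binary decisions into equal chunks and map each chunk to a word."""
    if len(bits) % words_per_path != 0:
        raise ValueError("bit count must divide evenly into words")
    chunk = len(bits) // words_per_path  # 12 // 4 == 3 bits per word
    if 2 ** chunk != len(VOCAB):
        raise ValueError("vocabulary size must equal 2 ** bits_per_word")
    words = []
    for start in range(0, len(bits), chunk):
        index = 0
        for bit in bits[start:start + chunk]:
            index = (index << 1) | bit  # most significant bit first (assumed)
        words.append(VOCAB[index])
    return ".".join(words)


path = bits_to_words([0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1])
print(path)  # w0.w3.w5.w7
```

Under this reading, two speakers whose paths share a leading word agree on the first three binary splits, which is what makes the path hierarchical.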

Use with [`voxpath`](https://github.com/DJRHails/voxpath) to assign
LM-readable speaker identities for diarization + transcription pipelines.

## Files

- `voxceleb+librispeech+commonvoice.pyannote-embedding.quantizer.json`: fitted quantizer state (92.4 MB).

## Training corpus

Trained on 14,307 unique English-speaking speakers from VoxCeleb 2 (5,800) + LibriSpeech (3,507) + CommonVoice EN (5,000). Each speaker contributed one utterance of at least 3 s, which was embedded with `pyannote/embedding` and L2-normalized. The embeddings that make up the merged corpus are available under the same namespace:

- [`DJRHails/pyannote-embedding-voxceleb`](https://huggingface.co/datasets/DJRHails/pyannote-embedding-voxceleb)
- [`DJRHails/pyannote-embedding-librispeech`](https://huggingface.co/datasets/DJRHails/pyannote-embedding-librispeech)
- [`DJRHails/pyannote-embedding-commonvoice-en`](https://huggingface.co/datasets/DJRHails/pyannote-embedding-commonvoice-en)

## Loading

```python
from huggingface_hub import hf_hub_download
from voxpath.quantize import BinaryKMeansQuantizer

# Download the fitted quantizer state from the Hub (cached locally after the first call).
path = hf_hub_download(
    "DJRHails/voxpath-bkm-pyannote-embedding",
    "voxceleb+librispeech+commonvoice.pyannote-embedding.quantizer.json",
)
quantizer = BinaryKMeansQuantizer.load(path)

# Then, given a pyannote/embedding output `embedding` (shape (512,)):
voxpath = quantizer.quantize(embedding)
print(voxpath.to_tag())  # e.g. SPEAKER:halite.rill.bismuth.elm
```

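The quantizer was fitted on L2-normalized embeddings, so inputs at inference time presumably need the same scaling. Whether `quantize` normalizes internally is not stated here, so the sketch below normalizes explicitly before handing a vector over. The helper name `l2_normalize` and the random stand-in vector are assumptions for illustration only.

```python
import math
import random
from typing import List, Sequence


def l2_normalize(embedding: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit Euclidean length (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in embedding))
    if norm < eps:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in embedding]


# Random stand-in for a real pyannote/embedding output of shape (512,).
rng = random.Random(0)
raw = [rng.gauss(0.0, 1.0) for _ in range(512)]

unit = l2_normalize(raw)
unit_norm = math.sqrt(sum(x * x for x in unit))
print(len(unit), round(unit_norm, 6))
```

With NumPy arrays the same operation is `embedding / np.linalg.norm(embedding)`.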

## Why model-and-embedder-specific

Speaker embeddings don't translate across embedders: a quantizer
fitted on `pyannote/embedding` outputs has no meaning for embeddings
from `wespeaker`, `ECAPA-TDNN`, or anything else. The repo name pins
both the quantizer family (BKM) and the embedding model so users find
the right artifact at a glance.

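A toy example may help show why centroids fitted in one embedding space say nothing about another. The sketch below makes a single binary nearest-centroid decision in 2-D and then presents the same speaker in a rotated coordinate system. The rotation is a deliberately simplified stand-in for "a different embedder"; real embedders differ in far more than a rotation, and `nearest_bit` is a hypothetical helper, not part of voxpath.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def nearest_bit(point: Point, centroid_0: Point, centroid_1: Point) -> int:
    """Return 0 or 1 for whichever centroid is closer (one binary k-means split)."""
    d0 = math.dist(point, centroid_0)
    d1 = math.dist(point, centroid_1)
    return 0 if d0 <= d1 else 1


# Toy centroids "fitted" in embedder A's space.
centroid_0 = (1.0, 0.0)
centroid_1 = (0.0, 1.0)

# One speaker as represented by embedder A.
speaker_in_a = (0.9, 0.1)

# The same speaker in a hypothetical embedder B whose axes are rotated by 90 degrees.
speaker_in_b = (-speaker_in_a[1], speaker_in_a[0])

bit_a = nearest_bit(speaker_in_a, centroid_0, centroid_1)
bit_b = nearest_bit(speaker_in_b, centroid_0, centroid_1)
print(bit_a, bit_b)  # 0 1
```

The speaker is unchanged, yet the very first split flips, so every later level of the path would be decided against the wrong centroids as well.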