	---
	license: mit
	library_name: voxpath
	tags:
	- speaker-recognition
	- quantizer
	- voxpath
	- pyannote
	---

	# voxpath quantizer

	A trained `BinaryKMeansQuantizer` that maps L2-normalized 512-dim speaker embeddings
	from [`pyannote/embedding`](https://huggingface.co/pyannote/embedding) into
	hierarchical word paths like `chalk.fjord.bismuth.elm`. It uses 12 binary k-means
	levels, grouped into 4 display words of vocab size 8 (3 bits per word).
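The arithmetic behind the path layout can be sketched as follows: 12 binary decisions split into 4 groups of 3 bits, and each 3-bit group selects one of 2**3 = 8 words. This is only an illustration. The word list, the bit order, the use of one shared list for all four positions, and the helper name `bits_to_path` are assumptions, not voxpath's actual vocabulary or API.

```python
from typing import Sequence

LEVELS = 12                       # binary k-means levels (from this model card)
WORDS = 4                         # display words per path (from this model card)
BITS_PER_WORD = LEVELS // WORDS   # 3 bits, so 2**3 = 8 choices per word

# Hypothetical 8-word vocabulary, for illustration only.
VOCAB = ["chalk", "fjord", "bismuth", "elm", "halite", "rill", "ochre", "tarn"]


def bits_to_path(bits: Sequence[int], vocab: Sequence[str] = VOCAB) -> str:
    """Group 12 binary decisions into 4 words, most significant bit first."""
    if len(bits) != LEVELS:
        raise ValueError(f"expected {LEVELS} bits, got {len(bits)}")
    if any(bit not in (0, 1) for bit in bits):
        raise ValueError("bits must be 0 or 1")
    words = []
    for start in range(0, LEVELS, BITS_PER_WORD):
        index = 0
        for bit in bits[start:start + BITS_PER_WORD]:
            index = (index << 1) | bit  # accumulate the 3-bit index
        words.append(vocab[index])
    return ".".join(words)


# Bits 000 001 010 011 select vocabulary entries 0, 1, 2, 3.
print(bits_to_path([0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1]))
```

With 8 choices at each of 4 positions, the scheme can express 8**4 = 4096 distinct paths, and paths that share leading words share their earliest splits.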

	Use with [`voxpath`](https://github.com/DJRHails/voxpath) to assign
	LM-readable speaker identities for diarization + transcription pipelines.
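As a rough illustration of what "LM-readable" can mean in a pipeline, the sketch below attaches a speaker tag to each transcribed segment and renders plain text for a language model. The `Segment` dataclass, `render_transcript`, and the line format are hypothetical and not part of voxpath; only the `SPEAKER:word.word.word.word` tag shape comes from this card.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Segment:
    """One diarized and transcribed span (hypothetical structure)."""
    start: float      # seconds
    end: float        # seconds
    speaker_tag: str  # e.g. "SPEAKER:chalk.fjord.bismuth.elm"
    text: str


def render_transcript(segments: Iterable[Segment]) -> str:
    """Render segments in time order, one tagged line per segment."""
    ordered = sorted(segments, key=lambda seg: seg.start)
    return "\n".join(
        f"{seg.speaker_tag} [{seg.start:.1f}-{seg.end:.1f}s] {seg.text}"
        for seg in ordered
    )


segments = [
    Segment(4.2, 6.0, "SPEAKER:halite.rill.bismuth.elm", "Hi there."),
    Segment(0.0, 4.0, "SPEAKER:chalk.fjord.bismuth.elm", "Hello."),
]
print(render_transcript(segments))
```

Because the tag is derived from the voice rather than from diarization order, the same speaker would be expected to receive the same or a nearby path across recordings.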

	## Files

	- `voxceleb+librispeech+commonvoice.pyannote-embedding.quantizer.json` — fitted quantizer state (92.4 MB).

	## Training corpus

	Trained on 14,307 unique English-speaking speakers: VoxCeleb 2 (5,800), LibriSpeech (3,507), and CommonVoice EN (5,000). Each speaker contributed one utterance of at least 3 s, which was embedded with `pyannote/embedding` and then L2-normalized. The per-source embeddings are available as datasets under the same namespace: [`DJRHails/pyannote-embedding-voxceleb`](https://huggingface.co/datasets/DJRHails/pyannote-embedding-voxceleb), [`DJRHails/pyannote-embedding-librispeech`](https://huggingface.co/datasets/DJRHails/pyannote-embedding-librispeech), [`DJRHails/pyannote-embedding-commonvoice-en`](https://huggingface.co/datasets/DJRHails/pyannote-embedding-commonvoice-en).
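The L2 normalization step can be sketched as below. This is a stdlib-only illustration, not the code used to build the corpus, and `l2_normalize` is a hypothetical helper name. With NumPy arrays the equivalent is `v / np.linalg.norm(v)`.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(math.fsum(x * x for x in vector))
    if norm < eps:
        # A (near-)zero vector has no direction, so refuse to guess one.
        raise ValueError("cannot L2-normalize a zero vector")
    return [x / norm for x in vector]


unit = l2_normalize([3.0, 4.0])
print(unit)                                        # direction of (3, 4) at length 1
print(math.sqrt(math.fsum(x * x for x in unit)))   # approximately 1.0
```

Since the quantizer was fitted on unit-length vectors, embeddings passed to it at inference time presumably need the same treatment. Whether `quantize` normalizes internally is not stated here, so normalizing beforehand is the cautious choice.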

	## Loading

	```python
	from huggingface_hub import hf_hub_download
	from voxpath.quantize import BinaryKMeansQuantizer

	path = hf_hub_download(
	    "DJRHails/voxpath-bkm-pyannote-embedding",
	    "voxceleb+librispeech+commonvoice.pyannote-embedding.quantizer.json",
	)
	quantizer = BinaryKMeansQuantizer.load(path)

	# Then, given a pyannote/embedding output `embedding` (shape (512,)):
	voxpath = quantizer.quantize(embedding)
	print(voxpath.to_tag())  # e.g. SPEAKER:halite.rill.bismuth.elm
	```

	## Why model-and-embedder-specific

	Speaker embeddings don't translate across embedders: each model defines its
	own vector space, so a quantizer fitted on `pyannote/embedding` outputs has
	no meaning for embeddings from `wespeaker`, `ECAPA-TDNN`, or anything else.
	The repo name pins both the quantizer family (BKM) and the embedding model
	so users find the right artifact at a glance.
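One way to use the repo name as a guard is sketched below. The `voxpath-<family>-<embedder>` pattern is inferred from this single repo name and is an assumption, as are the helper names `parse_repo_name` and `check_embedder`; voxpath itself may not provide anything like this.

```python
from typing import NamedTuple


class ArtifactName(NamedTuple):
    family: str    # quantizer family, e.g. "bkm"
    embedder: str  # embedding model slug, e.g. "pyannote-embedding"


def parse_repo_name(repo_id: str) -> ArtifactName:
    """Split 'owner/voxpath-<family>-<embedder>' into its parts (assumed pattern)."""
    name = repo_id.rsplit("/", 1)[-1]
    parts = name.split("-", 2)
    if len(parts) != 3 or parts[0] != "voxpath" or not all(parts):
        raise ValueError(f"unexpected repo name: {repo_id!r}")
    return ArtifactName(family=parts[1], embedder=parts[2])


def check_embedder(repo_id: str, embedder_id: str) -> None:
    """Raise if the quantizer repo was not fitted for the given embedder."""
    expected = parse_repo_name(repo_id).embedder
    actual = embedder_id.replace("/", "-")
    if actual != expected:
        raise ValueError(
            f"quantizer {repo_id!r} targets {expected!r}, not {actual!r}"
        )


repo = "DJRHails/voxpath-bkm-pyannote-embedding"
print(parse_repo_name(repo))
check_embedder(repo, "pyannote/embedding")  # passes silently
```

A name check like this only catches mix-ups in configuration. It cannot verify that a given vector really came from the named embedder.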