voxpath quantizer

A trained HierarchicalHashQuantizer (18 bits grouped into 6 display words of 3 bits each, so each word is drawn from a vocabulary of 8) that maps L2-normalized 512-dim speaker embeddings from pyannote/embedding into hierarchical word paths like chalk.fjord.bismuth.elm.
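
The bit layout can be pictured as six 3-bit groups, each selecting one of 8 words, which gives 2^18 = 262,144 possible paths. The sketch below illustrates only that grouping arithmetic. The `bits_to_path` helper, the bit order, and the word list are hypothetical; how voxpath actually derives the bits and which vocabulary it uses at each position may differ.

```python
from typing import List, Sequence

BITS_PER_WORD = 3  # 2**3 == 8 words per position
NUM_WORDS = 6      # 6 * 3 == 18 bits in total

# Hypothetical 8-word vocabulary, reused at every position for illustration only.
VOCAB = ["chalk", "fjord", "bismuth", "elm", "halite", "rill", "moss", "tarn"]


def bits_to_path(bits: Sequence[int]) -> str:
    """Group 18 bits into six 3-bit indices and join the selected words."""
    if len(bits) != BITS_PER_WORD * NUM_WORDS:
        raise ValueError("expected exactly 18 bits")
    words: List[str] = []
    for start in range(0, len(bits), BITS_PER_WORD):
        b0, b1, b2 = bits[start:start + BITS_PER_WORD]
        index = (b0 << 2) | (b1 << 1) | b2  # assumed order: most significant bit first
        words.append(VOCAB[index])
    return ".".join(words)


example = bits_to_path([0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1])
print(example)
```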

Use with voxpath to assign LM-readable speaker identities for diarization + transcription pipelines.

Files

  • voxceleb+librispeech+commonvoice.pyannote-embedding.hierarchical-hash.quantizer.json — fitted quantizer state (0.2 MB).

Training corpus

Trained on 14,307 unique English-speaking speakers from VoxCeleb 2 (5,800) + LibriSpeech (3,507) + CommonVoice EN (5,000). Each speaker contributed one utterance of at least 3 s, which was embedded with pyannote/embedding and then L2-normalized. The embedding datasets that make up the merged corpus are available under the same namespace: DJRHails/pyannote-embedding-voxceleb, DJRHails/pyannote-embedding-librispeech, DJRHails/pyannote-embedding-commonvoice-en.

Loading

from huggingface_hub import hf_hub_download
from voxpath.hashing import HierarchicalHashQuantizer

path = hf_hub_download(
    "DJRHails/voxpath-hierarchical-hash-pyannote-embedding",
    "voxceleb+librispeech+commonvoice.pyannote-embedding.hierarchical-hash.quantizer.json",
)
quantizer = HierarchicalHashQuantizer.load(path)

# Then, given an L2-normalized pyannote/embedding output `embedding` (shape (512,)):
voxpath = quantizer.quantize(embedding)
print(voxpath.to_tag())  # e.g. SPEAKER:halite.rill.bismuth.elm
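
The quantizer was fitted on L2-normalized embeddings, so inputs should be unit length as well. This card does not say whether `quantize` normalizes internally, so the sketch below assumes it does not and normalizes first; normalizing an already unit-length vector leaves it unchanged. The `l2_normalize` helper is hypothetical and written in plain Python for clarity.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm < eps:
        raise ValueError("cannot normalize a near-zero vector")
    return [x / norm for x in vector]


# Small worked example: a 3-4-5 triangle gives a vector of length 1.
unit = l2_normalize([3.0, 4.0])
unit_norm = math.sqrt(sum(x * x for x in unit))
```

With a NumPy array, `embedding / np.linalg.norm(embedding)` performs the same scaling.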

Why the artifact is model- and embedder-specific

Speaker embeddings do not transfer across embedders: a quantizer fitted on pyannote/embedding outputs is meaningless for embeddings from wespeaker, ECAPA-TDNN, or any other model. The repo name pins both the quantizer family (HierarchicalHashQuantizer) and the embedding model so that users can find the right artifact at a glance.
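
A toy illustration of the mismatch, under loud assumptions: `sign_bits` is a stand-in hash (one bit per coordinate sign), not the HierarchicalHashQuantizer algorithm, and `embedder_b` is a hypothetical second embedder that encodes exactly the same speaker geometry with its axes in reverse order. Even in this best case, where similarities between speakers are fully preserved, a hash fixed in the first space assigns a different code to the same speaker in the second.

```python
from typing import List, Sequence


def sign_bits(vector: Sequence[float]) -> List[int]:
    """Toy hash fixed in embedder A's coordinates: 1 where the value is non-negative."""
    return [1 if x >= 0 else 0 for x in vector]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def embedder_b(vector: Sequence[float]) -> List[float]:
    """Hypothetical second embedder: same geometry, axes in reverse order."""
    return list(reversed(vector))


# Two unit-length speakers as seen by embedder A.
speaker_x_a = [0.5, 0.5, -0.5, -0.5]
speaker_y_a = [0.5, -0.5, 0.5, 0.5]

# The same two speakers as seen by embedder B.
speaker_x_b = embedder_b(speaker_x_a)
speaker_y_b = embedder_b(speaker_y_a)

similarity_a = dot(speaker_x_a, speaker_y_a)
similarity_b = dot(speaker_x_b, speaker_y_b)
code_a = sign_bits(speaker_x_a)
code_b = sign_bits(speaker_x_b)
```

Real embedders differ by far more than an axis reordering (different dimensions, training data, and objectives), so the practical mismatch is larger than in this sketch.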
