DiariZen: ONNX export for Vernacula
ONNX export of the DiariZen diarization pipeline (segmentation + WeSpeaker embedding + LDA/PLDA), packaged for use with Vernacula as its high-quality diarization backend.
- Conversion script: `scripts/diarizen_export/`
- Vernacula: github.com/christopherthompson81/vernacula
- Upstream model: `BUT-FIT/diarizen-wavlm-large-s80-md`
Non-commercial use only. The upstream DiariZen segmentation model is licensed CC-BY-NC-4.0. This ONNX repackaging is a derivative work and inherits the same non-commercial restriction. See the License section below.
Highlights
- RTF 1.75 → 0.66 (2.65× speedup) vs the upstream Python pipeline, achieved by fusing mel-filter postprocess with a streaming chunk queue and eliminating intermediate `filtered`/`softScores` arrays in segmentation decode.
- Adaptive CPU threading (`min(12, 0.75 × cpu_count)`, 8 threads × 2 workers): segmentation path 157 s → 59 s on RTX 3090. The 8×2 layout outperformed many-tiny-workers in the threading sweep.
- LDA/PLDA shipped as six raw float32 `.bin` files (`mean1`, `lda`, `mean2`, `plda_mu`, `plda_tr`, `plda_psi`) with `sqrt(256)`/`sqrt(128)` normalisation factors pre-baked in. C# runs the speaker pipeline without a matrix library.
- Un-normalised embedding output. L2 normalisation is deferred to post-LDA in the runtime, removing redundant ops from the ONNX graph.
- WeSpeaker Kaldi-Fbank contract spelled out exactly (80-dim, 25 ms window, 10 ms shift, Hamming, 32768 waveform scale) so the C# Fbank implementation matches without numerical surprises.
- Segmentation export batching evaluated and rejected. Batch-safe export technically succeeded (`batch_size ∈ {1, 2, 4, 8, 16}` validated) but the original batch-1 export was still faster at runtime: the gains were ≤ 3.2% and net negative in practice.
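The Kaldi-Fbank contract listed above can be written down as constants and checked with a few lines of arithmetic. The sketch below is illustrative only: the 16 kHz sample rate and the Kaldi-style "snip edges" frame count are assumptions not stated in this card, and `FbankContract`, `hamming`, `num_frames`, and `scale_waveform` are hypothetical names, not part of Vernacula or WeSpeaker.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FbankContract:
    """Feature parameters quoted in the highlights above."""
    num_mel_bins: int = 80
    frame_length_ms: float = 25.0
    frame_shift_ms: float = 10.0
    waveform_scale: float = 32768.0
    sample_rate: int = 16000  # ASSUMPTION: not stated in this card

    @property
    def frame_length(self) -> int:
        # Window length in samples.
        return int(self.sample_rate * self.frame_length_ms / 1000)

    @property
    def frame_shift(self) -> int:
        # Hop length in samples.
        return int(self.sample_rate * self.frame_shift_ms / 1000)


def hamming(n: int) -> list[float]:
    """Symmetric Hamming window: 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def num_frames(num_samples: int, c: FbankContract) -> int:
    """Frame count when only fully covered windows are kept (ASSUMPTION)."""
    if num_samples < c.frame_length:
        return 0
    return 1 + (num_samples - c.frame_length) // c.frame_shift


def scale_waveform(samples: list[float], c: FbankContract) -> list[float]:
    """Map float samples in [-1, 1] to the 16-bit integer range."""
    return [s * c.waveform_scale for s in samples]
```

Under these assumptions the window is 400 samples and the shift is 160 samples, so one second of audio yields `num_frames(16000, FbankContract())` frames of 80 mel bins each.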
Contents
| File | Purpose |
|---|---|
| `diarizen_segmentation.onnx` (+ `.data`) | DiariZen segmentation model (WavLM-Large + segmentation head) |
| `wespeaker_pyannote_weighted.onnx` | WeSpeaker embedding model (pyannote-weighted variant) |
| `plda/lda.bin`, `mean1.bin`, `mean2.bin`, `plda_mu.bin`, `plda_psi.bin`, `plda_tr.bin` | LDA + PLDA transform parameters as flat binary tensors |
| `metadata.json` | Per-file integrity hashes and runtime config |
The PLDA .bin files are raw float32 tensor dumps in the shapes Vernacula's
C# inference code expects; see the export script for layout.
Export provenance
Exported via scripts/diarizen_export/
in the Vernacula repo. The pipeline is split into three on-disk artifacts
(segmentation ONNX, embedding ONNX, PLDA bins) so each can be loaded into
ORT independently and the LDA/PLDA transforms can be applied as plain
linear algebra in C#.
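To illustrate what "plain linear algebra" can look like without a matrix library, here is a minimal sketch in Python. It is hypothetical: it assumes a row-major `lda` matrix of shape `(in_dim, out_dim)`, centering with `mean1` before the projection and `mean2` after it, and L2 normalisation as the last step. The real order, shapes, and scaling are defined by the export script and Vernacula's C# code, not by this example, and `lda_transform` is an invented name.

```python
import math


def lda_transform(x, mean1, lda, mean2, in_dim, out_dim):
    """Center, project, center again, then L2-normalise (assumed order).

    `lda` is a flat, row-major list of in_dim * out_dim floats, which is
    how a raw tensor dump would look after reading it from disk.
    """
    centered = [x[i] - mean1[i] for i in range(in_dim)]
    projected = [
        sum(centered[i] * lda[i * out_dim + j] for i in range(in_dim)) - mean2[j]
        for j in range(out_dim)
    ]
    norm = math.sqrt(sum(v * v for v in projected))
    # Leave an all-zero vector untouched instead of dividing by zero.
    return [v / norm for v in projected] if norm > 0 else projected
```

With a 2×2 identity as `lda` and zero means, `lda_transform([3.0, 4.0], [0.0, 0.0], [1.0, 0.0, 0.0, 1.0], [0.0, 0.0], 2, 2)` simply returns the unit-length version of the input.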
License
This bundle is governed by CC-BY-NC-4.0 because the segmentation model inherits that license from upstream. Specifically:
| Component | Upstream license |
|---|---|
| `diarizen_segmentation.onnx` | CC-BY-NC-4.0 (from `BUT-FIT/diarizen-wavlm-large-s80-md`) |
| `wespeaker_pyannote_weighted.onnx` | Apache-2.0 (WeSpeaker), with pyannote-weighted variant inheriting upstream terms |
| LDA/PLDA `.bin` parameters | Derived from DiariZen training; inherit CC-BY-NC-4.0 |
The most restrictive license (CC-BY-NC-4.0) governs the bundle. No commercial use. If you need a commercially-usable diarization backend in Vernacula, use the default Sortformer pipeline instead.
Using these files
In Vernacula, this package is downloaded automatically once you accept the
gated-model notice in Settings → Manage Gated Models. Outside Vernacula,
pull with huggingface_hub and load with onnxruntime:
```python
from huggingface_hub import snapshot_download

# Download the full repository snapshot and return its local directory.
path = snapshot_download(repo_id="christopherthompson81/diarizen_onnx")
```
The PLDA .bin files are read as raw float32 tensors; see
scripts/diarizen_export/README.md
for the expected shapes and the post-processing pipeline.
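As a rough illustration of what "raw float32" means in practice, the sketch below reads one such file into a flat sequence of floats using only the standard library. It assumes the dumps are little-endian with no header, which this card does not state explicitly, and `read_float32_bin` is a hypothetical helper name.

```python
import array
import sys
from pathlib import Path


def read_float32_bin(path):
    """Read a headerless float32 dump into a flat array of floats."""
    data = Path(path).read_bytes()
    if len(data) % 4 != 0:
        raise ValueError(f"{path}: size {len(data)} is not a multiple of 4 bytes")
    values = array.array("f")  # 4-byte C float on common platforms
    values.frombytes(data)
    if sys.byteorder == "big":
        # ASSUMPTION: files are little-endian; swap on big-endian hosts.
        values.byteswap()
    return values
```

The result is a flat vector; reshaping it into matrices still requires the shapes documented alongside the export script.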
Limitations
Numerical behavior matches the upstream DiariZen pipeline. Speaker-count accuracy, language and acoustic-domain coverage, and known failure modes inherit from the upstream model cards (segmentation, WeSpeaker). The segmentation model was trained on data with non-commercial restrictions, which is why this bundle cannot be redistributed for commercial use.
Citation
For the underlying models, please cite the upstream authors. See:
Acknowledgments
- Original DiariZen: BUT Speech@FIT (Brno University of Technology)
- Original WeSpeaker: WeNet community
- ONNX repackaging: Chris Thompson for Vernacula
Issues with the ONNX export specifically: open an issue on the Vernacula repo. Issues with the underlying models: see the upstream repositories.
See also
- Vernacula on GitHub: the speech pipeline app this package is built for
- Conversion script (`scripts/diarizen_export/`): the export pipeline that produced these files
- DiariZen on GitHub: upstream pipeline source
- `BUT-FIT/diarizen-wavlm-large-s80-md`: upstream segmentation model card
- WeSpeaker on GitHub: upstream embedding model source
- Other Vernacula model packages