DiariZen: ONNX export for Vernacula

ONNX export of the DiariZen diarization pipeline (segmentation + WeSpeaker embedding + LDA/PLDA), packaged for use with Vernacula as its high-quality diarization backend.

Non-commercial use only. The upstream DiariZen segmentation model is licensed CC-BY-NC-4.0. This ONNX repackaging is a derivative work and inherits the same non-commercial restriction. See the License section below.

Highlights

  • RTF 1.75 → 0.66 (2.65× speedup) vs the upstream Python pipeline, achieved by fusing mel-filter postprocess with a streaming chunk queue and eliminating intermediate filtered / softScores arrays in segmentation decode.
  • Adaptive CPU threading (min(12, 0.75 × cpu_count), 8 threads × 2 workers): segmentation path 157 s → 59 s on RTX 3090. The 8×2 layout outperformed many-tiny-workers in the threading sweep.
  • LDA/PLDA shipped as six raw float32 .bin files (mean1, lda, mean2, plda_mu, plda_tr, plda_psi) with sqrt(256) / sqrt(128) normalisation factors pre-baked in. C# runs the speaker pipeline without a matrix library.
  • Un-normalised embedding output. L2 normalisation is deferred to post-LDA in the runtime, removing redundant ops from the ONNX graph.
  • WeSpeaker Kaldi-Fbank contract spelled out exactly (80-dim, 25 ms window, 10 ms shift, Hamming, 32768 waveform scale) so the C# Fbank implementation matches without numerical surprises.
  • Segmentation export batching evaluated and rejected. Batch-safe export technically succeeded (batch_size ∈ {1, 2, 4, 8, 16} validated) but the original batch-1 export was still faster at runtime; the gains were ≤ 3.2% and net negative in practice.
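The Fbank contract in the list above can be written down as a small parameter set plus the framing arithmetic it implies. This is a rough Python sketch, not the Vernacula C# implementation. The 16 kHz sample rate, the Kaldi-style `snip_edges` framing, and every function name here are assumptions made for illustration; only the 80 bins, 25 ms window, 10 ms shift, Hamming window, and 32768 scale come from the contract itself.

```python
import math

# Values taken from the contract above.
NUM_MEL_BINS = 80
FRAME_LENGTH_MS = 25
FRAME_SHIFT_MS = 10
WAVEFORM_SCALE = 32768.0
# Assumption: the source does not state the sample rate.
SAMPLE_RATE = 16_000


def frame_params(sample_rate=SAMPLE_RATE):
    """Return (window, shift) in samples for the given sample rate."""
    win = sample_rate * FRAME_LENGTH_MS // 1000
    hop = sample_rate * FRAME_SHIFT_MS // 1000
    return win, hop


def num_frames(num_samples, sample_rate=SAMPLE_RATE):
    """Frame count when only complete windows are kept (snip_edges framing, assumed)."""
    win, hop = frame_params(sample_rate)
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // hop


def hamming_window(length):
    """Symmetric Hamming window: 0.54 - 0.46 * cos(2*pi*i / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]


def scale_waveform(samples):
    """Map float samples in [-1, 1] to the 16-bit integer range."""
    return [s * WAVEFORM_SCALE for s in samples]
```

Under these assumptions, one second of audio gives a 400-sample window, a 160-sample shift, and 98 frames of 80 mel bins each.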

Contents

| File | Purpose |
| --- | --- |
| diarizen_segmentation.onnx (+ .data) | DiariZen segmentation model (WavLM-Large + segmentation head) |
| wespeaker_pyannote_weighted.onnx | WeSpeaker embedding model (pyannote-weighted variant) |
| plda/lda.bin, mean1.bin, mean2.bin, plda_mu.bin, plda_psi.bin, plda_tr.bin | LDA + PLDA transform parameters as flat binary tensors |
| metadata.json | Per-file integrity hashes and runtime config |
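A per-file integrity check against metadata.json might look like the sketch below. The schema of metadata.json and the hash algorithm are not described here, so SHA-256 and the flat `{relative_path: hex_digest}` mapping are assumptions, and both helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large .onnx files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(bundle_dir, expected):
    """Return relative paths that are missing or whose hash differs.

    `expected` is a hypothetical {relative_path: hex_digest} mapping; adapt it
    to whatever structure metadata.json really uses.
    """
    bad = []
    for rel_path, want in expected.items():
        target = Path(bundle_dir) / rel_path
        if not target.is_file() or sha256_of(target) != want.lower():
            bad.append(rel_path)
    return bad
```

An empty list from `find_mismatches` means every listed file is present and matches its recorded digest.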

The PLDA .bin files are raw float32 tensor dumps in the shapes Vernacula's C# inference code expects; see the export script for layout.
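Reading one of these headerless dumps needs nothing beyond the standard library. The sketch below assumes little-endian, row-major float32 data and takes the shape as an argument, because the real shapes are defined by the export script and are not restated here. The function names are hypothetical.

```python
import sys
from array import array


def read_f32_bin(path, shape):
    """Read a headerless float32 dump and check its length against `shape`.

    Assumes little-endian, row-major data; `shape` must come from the export script.
    """
    expected = 1
    for dim in shape:
        expected *= dim
    values = array("f")
    with open(path, "rb") as fh:
        values.frombytes(fh.read())
    if sys.byteorder == "big":
        values.byteswap()  # the file is assumed to be little-endian
    if len(values) != expected:
        raise ValueError(
            f"{path}: got {len(values)} floats, expected {expected} for shape {shape}"
        )
    return values


def row(values, shape, i):
    """Return row i of a 2-D row-major matrix stored flat."""
    cols = shape[1]
    return values[i * cols:(i + 1) * cols]
```

The length check is the main safeguard: a raw dump carries no shape metadata, so a wrong shape would otherwise be read silently.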

Export provenance

Exported via scripts/diarizen_export/ in the Vernacula repo. The pipeline is split into three on-disk artifacts (segmentation ONNX, embedding ONNX, PLDA bins) so each can be loaded into ORT independently and the LDA/PLDA transforms can be applied as plain linear algebra in C#.
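To illustrate what "plain linear algebra" means for the LDA step, here is a loop-only Python sketch that uses no matrix library. The order of operations (subtract mean1, project with lda, subtract mean2, then L2-normalise) and the row-major input-by-output layout of `lda` are assumptions; the export script and the C# runtime define the actual pipeline, and PLDA scoring is left out.

```python
import math


def lda_transform(embedding, mean1, lda, mean2):
    """Centre, project, centre again, then L2-normalise (assumed order).

    Assumed layout: `lda` is a list of input_dim rows with output_dim columns,
    so the projection is y[j] = sum_i x[i] * lda[i][j].
    """
    centred = [x - m for x, m in zip(embedding, mean1)]
    out_dim = len(mean2)
    projected = [
        sum(centred[i] * lda[i][j] for i in range(len(centred))) - mean2[j]
        for j in range(out_dim)
    ]
    norm = math.sqrt(sum(v * v for v in projected))
    if norm == 0.0:
        return projected  # avoid dividing by zero on a degenerate input
    return [v / norm for v in projected]
```

For example, with an identity `lda`, `mean1 = [0, 1]`, and `mean2 = [0, 0]`, the embedding `[3, 5]` is centred to `[3, 4]` and normalised to `[0.6, 0.8]`.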

License

This bundle is governed by CC-BY-NC-4.0 because the segmentation model inherits that license from upstream. Specifically:

| Component | Upstream license |
| --- | --- |
| diarizen_segmentation.onnx | CC-BY-NC-4.0 (from BUT-FIT/diarizen-wavlm-large-s80-md) |
| wespeaker_pyannote_weighted.onnx | Apache-2.0 (WeSpeaker), with pyannote-weighted variant inheriting upstream terms |
| LDA/PLDA .bin parameters | Derived from DiariZen training; inherit CC-BY-NC-4.0 |

The most restrictive license (CC-BY-NC-4.0) governs the bundle. No commercial use. If you need a commercially-usable diarization backend in Vernacula, use the default Sortformer pipeline instead.

Using these files

In Vernacula, this package is downloaded automatically once you accept the gated-model notice in Settings → Manage Gated Models. Outside Vernacula, pull with huggingface_hub and load with onnxruntime:

from huggingface_hub import snapshot_download
path = snapshot_download(repo_id="christopherthompson81/diarizen_onnx")

The PLDA .bin files are read as raw float32 tensors; see scripts/diarizen_export/README.md for the expected shapes and the post-processing pipeline.
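Once the snapshot is on disk, loading the two ONNX graphs with onnxruntime could be sketched as follows. The file names come from the Contents table. The helper names are hypothetical, and using the `min(12, 0.75 × cpu_count)` budget as `intra_op_num_threads` (with floor rounding) is an assumption about how that heuristic maps onto ONNX Runtime, not a description of Vernacula's runtime.

```python
import os
from pathlib import Path

# File names from the Contents table. The segmentation model's external-data
# sidecar must sit next to the .onnx file; its exact name is not listed here.
ONNX_FILES = ("diarizen_segmentation.onnx", "wespeaker_pyannote_weighted.onnx")
PLDA_FILES = tuple(
    f"plda/{name}.bin"
    for name in ("mean1", "lda", "mean2", "plda_mu", "plda_tr", "plda_psi")
)


def missing_files(snapshot_dir):
    """List expected bundle files that are absent from a downloaded snapshot."""
    root = Path(snapshot_dir)
    return [rel for rel in ONNX_FILES + PLDA_FILES if not (root / rel).is_file()]


def thread_budget(cpu_count=None):
    """min(12, 0.75 * cpu_count), floored and clamped to at least 1 (rounding assumed)."""
    cpus = cpu_count or os.cpu_count() or 1
    return max(1, min(12, int(0.75 * cpus)))


def load_sessions(snapshot_dir):
    """Create one CPU InferenceSession per ONNX file (needs the onnxruntime package)."""
    import onnxruntime as ort  # imported lazily so the helpers above work without it

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = thread_budget()
    root = Path(snapshot_dir)
    return {
        name: ort.InferenceSession(
            str(root / name), sess_options=opts, providers=["CPUExecutionProvider"]
        )
        for name in ONNX_FILES
    }
```

With `path` returned by `snapshot_download`, `missing_files(path)` should come back empty before `load_sessions(path)` is called.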

Limitations

Numerical behavior matches the upstream DiariZen pipeline. Speaker-count accuracy, language and acoustic-domain coverage, and known failure modes inherit from the upstream model cards (segmentation, WeSpeaker). The segmentation model was trained on data with non-commercial restrictions, which is why this bundle cannot be redistributed for commercial use.

Citation

For the underlying models, please cite the upstream authors. See:

Acknowledgments

Issues with the ONNX export specifically: open an issue on the Vernacula repo. Issues with the underlying models: see the upstream repositories.

See also
