# dia-models — pyannote community-1 model bundle for the `dia` Rust crate
A single-repo distribution of every model artifact the `dia` Rust crate needs to run end-to-end speaker diarization with pyannote-community-1 parity:
- The `segmentation-3.0` powerset speaker network (16 kHz audio → per-frame speaker activations).
- The WeSpeaker ResNet34-LM speaker-embedding network, in three forms (external-data ONNX, single-file ONNX, TorchScript).
- The PLDA whitening + LDA weights from the `pyannote/speaker-diarization-community-1` pipeline, in both `.npz` (build-time) and raw little-endian f64 `.bin` (runtime) form.
`dia` already embeds the segmentation model and the PLDA weights into the compiled binary via `include_bytes!`; the WeSpeaker ONNX is the only artifact callers must download separately. This repo lets callers grab any individual model — or the whole bundle — without spelunking through the upstream pyannote / WeSpeaker repos.
**Attribution:** this is a redistribution, not new model training. All weights come from upstream pyannote / WeSpeaker / BUT Speech@FIT. The licenses below MUST be preserved by anyone redistributing.
## Files

| File | Size | Format | License |
|---|---|---|---|
| `segmentation-3.0.onnx` | 5.99 MiB | ONNX (single file) | MIT |
| `wespeaker_resnet34_lm.onnx` | 256 KiB | ONNX header (external data) | Apache-2.0 |
| `wespeaker_resnet34_lm.onnx.data` | 25.3 MiB | external-data weights | Apache-2.0 |
| `wespeaker_resnet34_lm_packed.onnx` | 25.5 MiB | ONNX (single file, repacked) | Apache-2.0 |
| `wespeaker_resnet34_lm.pt` | 25.6 MiB | TorchScript | Apache-2.0 |
| `plda/eigenvectors_desc.bin` | 128 KiB | f64 (128×128 row-major) | CC-BY-4.0 |
| `plda/lda.bin` | 256 KiB | f64 (256×128 row-major) | CC-BY-4.0 |
| `plda/mean1.bin` | 2 KiB | f64 (256,) | CC-BY-4.0 |
| `plda/mean2.bin` | 1 KiB | f64 (128,) | CC-BY-4.0 |
| `plda/mu.bin` | 1 KiB | f64 (128,) | CC-BY-4.0 |
| `plda/phi_desc.bin` | 1 KiB | f64 (128,) | CC-BY-4.0 |
| `plda/psi.bin` | 1 KiB | f64 (128,) | CC-BY-4.0 |
| `plda/tr.bin` | 128 KiB | f64 (128×128 row-major) | CC-BY-4.0 |
| `plda/plda.npz` | 131 KiB | numpy (`mu`, `tr`, `psi`) | CC-BY-4.0 |
| `plda/xvec_transform.npz` | 131 KiB | numpy (`mean1`, `mean2`, `lda`) | CC-BY-4.0 |
## Which file do I want?

### Segmentation
Use `segmentation-3.0.onnx`. It feeds `dia::segment::SegmentModel` (or any pyannote-segmentation-compatible runtime). Single file, no external data; works on every ORT execution provider.
### Embedding (WeSpeaker)
Three forms, same weights; pick by use case:

- `wespeaker_resnet34_lm.onnx` + `wespeaker_resnet34_lm.onnx.data` — the default ONNX layout. Loads on CPU / TensorRT / CUDA / OpenVINO / DirectML. The `.onnx` and `.onnx.data` files MUST sit next to each other on disk; ORT resolves the external pointer by relative path.
- `wespeaker_resnet34_lm_packed.onnx` — the same model with all weights inlined into one file. Use this if you want a single-file artifact, or if the runtime is CoreML (Apple Silicon — Apple's graph optimizer chokes on external initializers and reports `model_path must not be empty`; the packed form sidesteps it). Otherwise functionally identical.
- `wespeaker_resnet34_lm.pt` — TorchScript export for the `tch` backend. Bit-exact to upstream PyTorch on hard cases (heavy-overlap fixtures where the ONNX→ORT path can drift by O(1) per element). Pulls in libtorch (~600 MB shared library).
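Because ORT resolves the external-data pointer by relative path, a missing sidecar only fails at session-creation time. A minimal pre-flight sketch (the helper name `sidecar_present` is hypothetical, not part of `dia`'s API) that checks the `.onnx.data` file sits next to the header before loading:

```rust
use std::path::Path;

/// Hypothetical pre-flight check: before handing the external-data ONNX to
/// ORT, confirm the `.onnx.data` sidecar sits next to the `.onnx` header,
/// since ORT resolves the external pointer by relative path.
fn sidecar_present(onnx_path: &Path) -> bool {
    let mut data_name = onnx_path.as_os_str().to_os_string();
    data_name.push(".data");
    Path::new(&data_name).is_file()
}

fn main() {
    let model = Path::new("models/wespeaker_resnet34_lm.onnx");
    if !sidecar_present(model) {
        eprintln!(
            "missing {}.data next to the model; download both files together",
            model.display()
        );
    }
}
```

If you use the `_packed.onnx` form instead, no such check is needed.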
### PLDA
The eight `.bin` files are the runtime data — raw little-endian f64 blobs that `dia::plda` embeds via `include_bytes!`. The two `.npz` files are the build-time sources (`xvec_transform.npz` exposes `mean1` / `mean2` / `lda`; `plda.npz` exposes `mu` / `tr` / `psi`); they are mirrored from the upstream pyannote-community-1 snapshot for traceability, and so the `.bin` extraction can be re-run via `scripts/extract-plda-blobs.sh` in the `dia` repo.
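To make the `.bin` layout concrete: each blob is nothing more than a packed sequence of little-endian f64 values, so decoding is a single chunked `from_le_bytes` pass. A minimal sketch (the helper is illustrative; `dia::plda`'s actual loader may differ):

```rust
/// Decode a raw little-endian f64 blob (the `plda/*.bin` layout) into a Vec.
/// Illustrative sketch, not the actual dia::plda loader.
fn decode_f64_le(bytes: &[u8]) -> Vec<f64> {
    assert!(bytes.len() % 8 == 0, "blob length must be a multiple of 8");
    bytes
        .chunks_exact(8)
        .map(|c| f64::from_le_bytes(c.try_into().unwrap()))
        .collect()
}

fn main() {
    // Example: two f64 values (1.0 and -2.5) packed in little-endian order.
    let blob: Vec<u8> = [1.0f64, -2.5f64]
        .iter()
        .flat_map(|v| v.to_le_bytes())
        .collect();
    let decoded = decode_f64_le(&blob);
    assert_eq!(decoded, vec![1.0, -2.5]);
    // A real `plda/mu.bin` decodes to 128 values; `lda.bin` to 256*128
    // values, interpreted row-major as a 256×128 matrix.
}
```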
`eigenvectors_desc.bin` and `phi_desc.bin` are scipy-derived eigenvectors of the PLDA generalized eigenproblem (B, W) — pinned to avoid LAPACK eigenvector-sign indeterminism (which produced a 38% DER divergence on three-speaker fixtures when nalgebra and scipy disagreed on 67 of 128 column signs). See `models/plda/SOURCE.md` in the `dia` repo for the regeneration procedure.
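Eigenvectors are only defined up to sign, so two LAPACK builds can legitimately return opposite signs per column; pinning means committing to one convention. One common convention (a sketch only; the actual rule used for these blobs is documented in `SOURCE.md`) is to flip each column so its largest-magnitude entry is positive:

```rust
/// Sketch of one common eigenvector sign convention (not necessarily the one
/// used to produce eigenvectors_desc.bin): flip each column of a row-major
/// matrix so that its largest-magnitude entry is positive.
fn pin_column_signs(mat: &mut [f64], rows: usize, cols: usize) {
    for c in 0..cols {
        // Find the entry with the largest absolute value in column c
        // (row-major storage: element (r, c) lives at r * cols + c).
        let mut max_abs = 0.0f64;
        let mut max_val = 0.0f64;
        for r in 0..rows {
            let v = mat[r * cols + c];
            if v.abs() > max_abs {
                max_abs = v.abs();
                max_val = v;
            }
        }
        // Negate the whole column if that pivot entry is negative.
        if max_val < 0.0 {
            for r in 0..rows {
                mat[r * cols + c] = -mat[r * cols + c];
            }
        }
    }
}

fn main() {
    // 2×2 row-major: column 0 is already "positive"; column 1 gets flipped
    // because its largest-magnitude entry (-0.9) is negative.
    let mut m = vec![0.5, -0.9, 0.1, 0.3];
    pin_column_signs(&mut m, 2, 2);
    assert_eq!(m, vec![0.5, 0.9, 0.1, -0.3]);
}
```

Applying any fixed convention like this on both the scipy and nalgebra sides removes the per-column sign disagreement described above.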
## Provenance

### `segmentation-3.0.onnx`

- Upstream: `pyannote/segmentation-3.0`
- Original layout: `pytorch_model.onnx` in the upstream HF repo.
- License: MIT — Copyright (c) 2023 CNRS
- Author: Hervé Bredin (CNRS / IRIT), pyannote.audio author and lead trainer.
- SHA-256: `057ee564753071c0b09b5b611648b50ac188d50846bff5f01e9f7bbf1591ea25`
### `wespeaker_resnet34_lm.onnx` (+ `.data`) / `.pt` / `_packed.onnx`

- Upstream model architecture: WeSpeaker ResNet34 with large-margin (LM) angular fine-tuning, trained on VoxCeleb-2.
- Upstream sources:
  - WeSpeaker project (Apache-2.0)
  - `onnx-community/wespeaker_resnet34_lm` for the ONNX export.
- License: Apache-2.0.
- `_packed.onnx` derivative: produced by loading `wespeaker_resnet34_lm.onnx` + `.onnx.data` via the `onnx` Python library (`onnx.load(path, load_external_data=True)`) and re-saving with `save_as_external_data=False`. Same weights, no external file.
### `plda/`

- Upstream: `pyannote/speaker-diarization-community-1`
- License: CC-BY-4.0
- Snapshot revision: `3533c8cf8e369892e6b79ff1bf80f7b0286a54ee`
- Original layout in the upstream HF repo: `plda/xvec_transform.npz` and `plda/plda.npz`.
- Attribution (per upstream `plda/README.md`): PLDA model trained by BUT Speech@FIT; integration of VBx in pyannote.audio by Jiangyu Han and Petr Pálka.
## Usage

### From `dia` (Rust)
```rust
use diarization::{
    embed::EmbedModel,
    plda::PldaTransform,
    segment::SegmentModel,
};

// Segmentation + PLDA are bundled by default — no download needed.
let mut seg = SegmentModel::bundled()?;
let plda = PldaTransform::new()?;

// WeSpeaker is BYO; download from this repo.
let mut emb = EmbedModel::from_file("wespeaker_resnet34_lm.onnx")?;
# Ok::<(), Box<dyn std::error::Error>>(())
```
### Direct download

```bash
# whole bundle
hf download FinDIT-Studio/dia-models --local-dir ./dia-models

# just the embedding model (default ONNX form)
hf download FinDIT-Studio/dia-models \
  wespeaker_resnet34_lm.onnx wespeaker_resnet34_lm.onnx.data \
  --local-dir ./models

# CoreML-friendly single-file form
hf download FinDIT-Studio/dia-models \
  wespeaker_resnet34_lm_packed.onnx --local-dir ./models
```
## Licenses
This repository redistributes model artifacts under three different licenses. Each artifact retains its upstream license. By using this bundle you agree to comply with all three:
- MIT for `segmentation-3.0.onnx` (Copyright © 2023 CNRS, Hervé Bredin). See `LICENSE.MIT`.
- Apache-2.0 for the WeSpeaker artifacts. See `LICENSE.APACHE-2.0`.
- CC-BY-4.0 for everything under `plda/`. See `LICENSE.CC-BY-4.0`. Required attribution: PLDA model trained by BUT Speech@FIT; integration of VBx in pyannote.audio by Jiangyu Han and Petr Pálka.
The `dia` Rust crate that consumes these models is itself dual-licensed MIT OR Apache-2.0; that licensing applies to the source code, not to the model weights bundled here.
## Citation
If you use these weights in academic work, please cite the upstream papers / model cards:
- Segmentation-3.0: Hervé Bredin, *pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe*, Interspeech.
- WeSpeaker: Wang et al., *WeSpeaker: A research and production oriented speaker embedding learning toolkit*, ICASSP 2023.
- PLDA / VBx: Landini et al., *Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks*, Computer Speech & Language, 2022.
## Issues / questions
This repo is a redistribution of upstream artifacts. Please file issues against:
- The dia Rust crate: https://github.com/al8n/diarization/issues
- The pyannote.audio project: https://github.com/pyannote/pyannote-audio/issues
- The WeSpeaker project: https://github.com/wenet-e2e/wespeaker/issues