dia-models — pyannote community-1 model bundle for the dia Rust crate

A single-repo distribution of every model artifact the dia Rust crate needs to run end-to-end speaker diarization with pyannote-community-1 parity:

  • The segmentation-3.0 powerset speaker network (16 kHz audio → per-frame speaker activations).
  • The WeSpeaker ResNet34-LM speaker-embedding network, in three forms (external-data ONNX, single-file ONNX, TorchScript).
  • The PLDA whitening + LDA weights from the pyannote/speaker-diarization-community-1 pipeline, in both .npz (build-time) and raw little-endian f64 .bin (runtime) form.

dia already embeds the segmentation model and the PLDA weights into the compiled binary via include_bytes!; the WeSpeaker ONNX is the only artifact callers must download separately. This repo lets callers grab any individual model — or the whole bundle — without spelunking through the upstream pyannote / WeSpeaker repos.

Attribution: this is a redistribution, not new model training. All weights come from upstream pyannote / WeSpeaker / BUT Speech@FIT. The licenses below MUST be preserved by anyone redistributing.

Files

File                               Size      Format                             License
segmentation-3.0.onnx              5.99 MiB  ONNX (single file)                 MIT
wespeaker_resnet34_lm.onnx         256 KiB   ONNX header (external data)        Apache-2.0
wespeaker_resnet34_lm.onnx.data    25.3 MiB  external-data weights              Apache-2.0
wespeaker_resnet34_lm_packed.onnx  25.5 MiB  ONNX (single file, repacked)       Apache-2.0
wespeaker_resnet34_lm.pt           25.6 MiB  TorchScript                        Apache-2.0
plda/eigenvectors_desc.bin         128 KiB   f64 (128×128, row-major)           CC-BY-4.0
plda/lda.bin                       256 KiB   f64 (256×128, row-major)           CC-BY-4.0
plda/mean1.bin                     2 KiB     f64 (256,)                         CC-BY-4.0
plda/mean2.bin                     1 KiB     f64 (128,)                         CC-BY-4.0
plda/mu.bin                        1 KiB     f64 (128,)                         CC-BY-4.0
plda/phi_desc.bin                  1 KiB     f64 (128,)                         CC-BY-4.0
plda/psi.bin                       1 KiB     f64 (128,)                         CC-BY-4.0
plda/tr.bin                        128 KiB   f64 (128×128, row-major)           CC-BY-4.0
plda/plda.npz                      131 KiB   NumPy archive (mu, tr, psi)        CC-BY-4.0
plda/xvec_transform.npz            131 KiB   NumPy archive (mean1, mean2, lda)  CC-BY-4.0
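The .bin sizes in the table follow directly from the shapes: each blob is exactly the product of its dimensions times 8 bytes (f64), with no header. A quick sanity-check sketch (shapes copied from the table above):

```python
# Sanity check: each PLDA blob's size equals prod(shape) * 8 bytes (raw f64, no header).
shapes = {
    "plda/eigenvectors_desc.bin": (128, 128),
    "plda/lda.bin": (256, 128),
    "plda/mean1.bin": (256,),
    "plda/mean2.bin": (128,),
    "plda/mu.bin": (128,),
    "plda/phi_desc.bin": (128,),
    "plda/psi.bin": (128,),
    "plda/tr.bin": (128, 128),
}

def expected_bytes(shape):
    n = 1
    for d in shape:
        n *= d
    return n * 8  # 8 bytes per little-endian f64

for name, shape in shapes.items():
    print(f"{name}: {expected_bytes(shape)} bytes = {expected_bytes(shape) // 1024} KiB")
```

This is a useful guard against truncated downloads: if `os.path.getsize` disagrees with `expected_bytes`, the blob is corrupt.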

Which file do I want?

Segmentation

Use segmentation-3.0.onnx. It feeds dia::segment::SegmentModel (or any pyannote-segmentation-compatible runtime). Single file, no external data, works on every ORT execution provider.

Embedding (WeSpeaker)

Three forms, same weights, pick by use case:

  • wespeaker_resnet34_lm.onnx + wespeaker_resnet34_lm.onnx.data — the default ONNX layout. Loads on CPU / TensorRT / CUDA / OpenVINO / DirectML. The .onnx and .onnx.data files MUST sit next to each other on disk; ORT resolves the external pointer by relative path.
  • wespeaker_resnet34_lm_packed.onnx — the same model with all weights inlined into one file. Use this if you want a single-file artifact, or if the runtime is CoreML (on Apple Silicon, Apple's graph optimizer chokes on external initializers and reports "model_path must not be empty"; the packed form sidesteps this). Otherwise functionally identical.
  • wespeaker_resnet34_lm.pt — TorchScript export for the tch backend. Bit-exact to upstream PyTorch on hard cases (heavy-overlap fixtures where the ONNX→ORT path can drift by O(1) per element). Pulls in libtorch (~600 MB shared library).

PLDA

The eight .bin files are the runtime data — raw little-endian f64 blobs that dia::plda embeds via include_bytes!. The two .npz files are the build-time sources (xvec_transform.npz exposes mean1 / mean2 / lda; plda.npz exposes mu / tr / psi); they are mirrored from the upstream pyannote-community-1 snapshot for traceability and so the .bin extraction can be re-run via scripts/extract-plda-blobs.sh in the dia repo.
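Consumers outside of dia can read the blobs directly; the format is raw little-endian f64, row-major, with no header. A minimal numpy sketch (load_blob is a hypothetical helper, not part of dia):

```python
import numpy as np

def load_blob(path, shape):
    """Read a raw little-endian f64 blob into a row-major array of the given shape."""
    arr = np.fromfile(path, dtype="<f8")  # "<f8" = little-endian float64
    return arr.reshape(shape)

# For example:
#   lda = load_blob("plda/lda.bin", (256, 128))
# should match the build-time source:
#   np.load("plda/xvec_transform.npz")["lda"]
```

The reshape raises if the file size does not match the expected shape, which doubles as a cheap integrity check.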

eigenvectors_desc.bin and phi_desc.bin are the scipy-derived eigenvectors and eigenvalues of the PLDA generalized eigenproblem (B, W) — pinned to avoid LAPACK eigenvector-sign indeterminism (which produced a 38% DER divergence on three-speaker fixtures when nalgebra and scipy disagreed on 67 of 128 column signs). See models/plda/SOURCE.md in the dia repo for the regeneration procedure.
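The sign ambiguity is easy to reproduce: an eigenvector is only defined up to a factor of ±1, so two LAPACK builds can legally disagree column by column. One common way to pin a deterministic convention is to flip each column so its largest-magnitude entry is positive. This sketch is illustrative only; the convention actually used for these blobs is documented in models/plda/SOURCE.md:

```python
import numpy as np

def pin_signs(vecs):
    """Flip each eigenvector column so its largest-|entry| element is positive."""
    cols = np.arange(vecs.shape[1])
    idx = np.abs(vecs).argmax(axis=0)        # row index of the dominant entry per column
    signs = np.sign(vecs[idx, cols])
    signs[signs == 0] = 1.0                  # degenerate all-zero column: leave as-is
    return vecs * signs                      # broadcasts over rows, flipping whole columns

# Column flips preserve the eigendecomposition:
rng = np.random.default_rng(0)
a = rng.standard_normal((8, 8))
sym = a + a.T
w, v = np.linalg.eigh(sym)
v_pinned = pin_signs(v)
assert np.allclose(v_pinned @ np.diag(w) @ v_pinned.T, sym)
```

Because a column flip multiplies both V and Vᵀ by the same diagonal ±1 matrix, the reconstructed matrix is unchanged; only the arbitrary sign choice is fixed.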

Provenance

segmentation-3.0.onnx

  • Upstream: pyannote/segmentation-3.0
  • Original layout: pytorch_model.onnx in the upstream HF repo.
  • License: MIT — Copyright (c) 2023 CNRS
  • Author: Hervé Bredin (CNRS / IRIT), pyannote.audio author and lead trainer.
  • SHA-256: 057ee564753071c0b09b5b611648b50ac188d50846bff5f01e9f7bbf1591ea25
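A download can be checked against the digest above with Python's hashlib, streaming in chunks so the file never loads fully into memory (sha256_of is an illustrative helper, not a dia API):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# sha256_of("segmentation-3.0.onnx") should equal
# "057ee564753071c0b09b5b611648b50ac188d50846bff5f01e9f7bbf1591ea25"
```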

wespeaker_resnet34_lm.onnx (+ .data) / .pt / _packed.onnx

  • Upstream model architecture: WeSpeaker ResNet34 with large-margin (LM) angular fine-tuning, trained on VoxCeleb-2.
  • Upstream sources:
  • License: Apache-2.0.
  • _packed.onnx derivative: produced by loading wespeaker_resnet34_lm.onnx + .onnx.data via the onnx Python library (onnx.load(path, load_external_data=True)) and re-saving with save_as_external_data=False. Same weights, no external file.

plda/

  • Upstream: pyannote/speaker-diarization-community-1
  • License: CC-BY-4.0
  • Snapshot revision: 3533c8cf8e369892e6b79ff1bf80f7b0286a54ee
  • Original layout in the upstream HF repo: plda/xvec_transform.npz and plda/plda.npz.
  • Attribution (per upstream plda/README.md): PLDA model trained by BUT Speech@FIT; integration of VBx in pyannote.audio by Jiangyu Han and Petr Pálka.

Usage

From dia (Rust)

use dia::{
  embed::EmbedModel,
  plda::PldaTransform,
  segment::SegmentModel,
};
// Segmentation + PLDA are bundled by default — no download needed.
let mut seg = SegmentModel::bundled()?;
let plda = PldaTransform::new()?;
// WeSpeaker is BYO; download from this repo.
let mut emb = EmbedModel::from_file("wespeaker_resnet34_lm.onnx")?;
# Ok::<(), Box<dyn std::error::Error>>(())

Direct download

# whole bundle
hf download FinDIT-Studio/dia-models --local-dir ./dia-models

# just the embedding model (default ONNX form)
hf download FinDIT-Studio/dia-models \
  wespeaker_resnet34_lm.onnx wespeaker_resnet34_lm.onnx.data \
  --local-dir ./models

# CoreML-friendly single-file form
hf download FinDIT-Studio/dia-models \
  wespeaker_resnet34_lm_packed.onnx --local-dir ./models

Licenses

This repository redistributes model artifacts under three different licenses. Each artifact retains its upstream license. By using this bundle you agree to comply with all three:

  • MIT for segmentation-3.0.onnx (Copyright © 2023 CNRS, Hervé Bredin). See LICENSE.MIT.
  • Apache-2.0 for the WeSpeaker artifacts. See LICENSE.APACHE-2.0.
  • CC-BY-4.0 for everything under plda/. See LICENSE.CC-BY-4.0. Required attribution: PLDA model trained by BUT Speech@FIT; integration of VBx in pyannote.audio by Jiangyu Han and Petr Pálka.

The dia Rust crate that consumes these models is itself dual-licensed MIT OR Apache-2.0; that licensing applies to the source code, not to the model weights bundled here.

Citation

If you use these weights in academic work, please cite the upstream papers / model cards:

  • Segmentation-3.0: Hervé Bredin, pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe, Interspeech 2023.
  • WeSpeaker: Wang et al., WeSpeaker: A research and production oriented speaker embedding learning toolkit, ICASSP 2023.
  • PLDA / VBx: Landini et al., Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks, Computer Speech & Language, 2022.

Issues / questions

This repo is a redistribution of upstream artifacts. Please file model-behavior issues against the relevant upstream project (pyannote.audio, WeSpeaker); packaging or extraction issues belong with the dia crate.
