metadata
license: cc-by-4.0
tags:
  - onnx
  - speaker-verification
  - wespeaker
  - pyannote

speaker-embedding-onnx

ONNX export of the ResNet34 backbone from pyannote/wespeaker-voxceleb-resnet34-LM.

This follows the approach of the official wespeaker/bin/export_onnx.py script: fbank features are computed externally, and only the backbone is exported to ONNX.

Inputs / Outputs

| Name | Shape | Description |
| --- | --- | --- |
| input_features | (batch, T, 80) | Kaldi fbank features (T is dynamic) |
| embedding | (batch, 256) | Speaker embedding vector |
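As a rough illustration of how these tensor names and shapes might be used, here is a minimal sketch with ONNX Runtime. The file path passed to `load_session`, the helper names `load_session` and `embed`, and the float32 input dtype are assumptions made for this example, not details stated in this card.

```python
import numpy as np


def load_session(model_path):
    """Open the exported model. `model_path` is a hypothetical local file path."""
    import onnxruntime as ort  # imported lazily so embed() itself only needs NumPy

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def embed(session, feats):
    """Run the backbone on fbank features shaped (T, 80) or (batch, T, 80).

    Returns an array shaped (batch, 256). float32 input is assumed here.
    """
    feats = np.asarray(feats, dtype=np.float32)
    if feats.ndim == 2:
        # A single utterance: add the batch dimension.
        feats = feats[None, ...]
    if feats.ndim != 3 or feats.shape[-1] != 80:
        raise ValueError(f"expected (batch, T, 80) features, got {feats.shape}")
    # Tensor names come from the Inputs / Outputs table.
    (embedding,) = session.run(["embedding"], {"input_features": feats})
    return embedding
```

Because T is dynamic, utterances of different lengths can be embedded one at a time without padding; batching several utterances together would require them to share the same T.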

Fbank parameters (must match at inference)

kaldi.fbank(wav * 32768, num_mel_bins=80, frame_length=25, frame_shift=10, round_to_power_of_two=True, window_type="hamming", use_energy=False, snip_edges=True, dither=0.0, sample_frequency=16000)

Then subtract the per-bin mean over the time axis, with feats shaped (T, 80): feats -= feats.mean(axis=0).
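The two preprocessing steps above can be sketched as follows. This assumes that `kaldi` refers to `torchaudio.compliance.kaldi` and that `wav` is a mono float waveform scaled to [-1, 1] at 16 kHz; the function names `compute_fbank` and `mean_normalize` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def compute_fbank(wav, sample_rate=16000):
    """Compute 80-bin Kaldi fbank features, returned as a (T, 80) array.

    Assumes `wav` is a 1-D mono float waveform in [-1, 1].
    """
    import torch
    import torchaudio.compliance.kaldi as kaldi  # assumed source of kaldi.fbank

    # kaldi.fbank expects a (channels, samples) tensor.
    wav = torch.as_tensor(wav, dtype=torch.float32).reshape(1, -1)
    feats = kaldi.fbank(
        wav * 32768,  # rescale to the 16-bit integer range, as in the card
        num_mel_bins=80,
        frame_length=25,
        frame_shift=10,
        round_to_power_of_two=True,
        window_type="hamming",
        use_energy=False,
        snip_edges=True,
        dither=0.0,
        sample_frequency=sample_rate,
    )
    return feats.numpy()


def mean_normalize(feats):
    """Subtract the per-bin mean over the time axis of a (T, 80) array."""
    feats = np.asarray(feats, dtype=np.float32)
    return feats - feats.mean(axis=0, keepdims=True)
```

A typical call would be `feats = mean_normalize(compute_fbank(wav))`, after which `feats[None, ...]` has the (batch, T, 80) layout expected by `input_features`.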