---
license: cc-by-4.0
tags:
- onnx
- speaker-verification
- wespeaker
- pyannote
---

# speaker-embedding-onnx

ONNX export of the ResNet34 backbone from [pyannote/wespeaker-voxceleb-resnet34-LM](https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM).

The export follows the official [wespeaker/bin/export_onnx.py](https://github.com/wenet-e2e/wespeaker/blob/master/wespeaker/bin/export_onnx.py) approach: fbank features are computed externally, and only the backbone is in the ONNX graph.

## Inputs / Outputs

| Name | Shape | Description |
|---|---|---|
| `input_features` | `(batch, T, 80)` | Input: Kaldi fbank features (`T`, the number of frames, is dynamic) |
| `embedding` | `(batch, 256)` | Output: speaker embedding vector |

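
The tensor names and shapes in the table can be exercised with a small helper. This is a minimal sketch, not the author's inference code: `extract_embedding` is a hypothetical name, the `float32` input type is an assumption (typical for ONNX exports, not stated in this card), and `session` is assumed to be any object exposing an ONNX Runtime style `run(output_names, input_feed)` method.

```python
import numpy as np


def extract_embedding(session, feats):
    """Run the exported backbone on one utterance (hypothetical helper).

    session: object with an ONNX Runtime style run(output_names, input_feed) method.
    feats:   mean-normalized fbank features of shape (T, 80).
    """
    # Assumption: the graph expects float32 input.
    feats = np.asarray(feats, dtype=np.float32)
    if feats.ndim != 2 or feats.shape[1] != 80:
        raise ValueError(f"expected features of shape (T, 80), got {feats.shape}")

    # Add the batch axis: (T, 80) -> (1, T, 80).
    batch = feats[np.newaxis, :, :]

    # Request the single output named "embedding"; the result has shape (1, 256).
    (embedding,) = session.run(["embedding"], {"input_features": batch})
    return embedding[0]
```

With ONNX Runtime installed, the session would typically be created with `onnxruntime.InferenceSession(path, providers=["CPUExecutionProvider"])`, where `path` points to the exported `.onnx` file in this repository.
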
## Fbank parameters (must match at inference)

`kaldi.fbank(wav * 32768, num_mel_bins=80, frame_length=25, frame_shift=10,
round_to_power_of_two=True, window_type="hamming", use_energy=False,
snip_edges=True, dither=0.0, sample_frequency=16000)`

Here `frame_length` and `frame_shift` are in milliseconds, and `wav * 32768` rescales a waveform normalized to `[-1, 1]` to the 16-bit integer range that Kaldi expects. Then subtract the per-bin mean over the time axis: `feats -= feats.mean(axis=0)`.
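
To make the framing arithmetic and the mean subtraction concrete, here is a minimal NumPy sketch. It does not recompute the fbank features; it only shows how many frames `T` the parameters above produce and how the per-bin normalization behaves. The helper names `num_frames` and `mean_normalize` are hypothetical, and the frame-count formula assumes the standard Kaldi behaviour for `snip_edges=True` (only frames that fit entirely inside the signal are kept).

```python
import numpy as np

SAMPLE_RATE = 16000                        # sample_frequency
FRAME_LENGTH = SAMPLE_RATE * 25 // 1000    # 25 ms -> 400 samples
FRAME_SHIFT = SAMPLE_RATE * 10 // 1000     # 10 ms -> 160 samples


def num_frames(num_samples):
    """Number of frames T for snip_edges=True (hypothetical helper)."""
    if num_samples < FRAME_LENGTH:
        return 0
    return 1 + (num_samples - FRAME_LENGTH) // FRAME_SHIFT


def mean_normalize(feats):
    """Subtract the per-bin mean over the time axis from (T, 80) features."""
    feats = np.asarray(feats, dtype=np.float32)
    return feats - feats.mean(axis=0, keepdims=True)


# One second of 16 kHz audio yields 98 frames.
print(num_frames(16000))
```

After normalization, every one of the 80 mel bins has approximately zero mean across the utterance, which is what `feats -= feats.mean(axis=0)` achieves in place.
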