metadata
license: mit
tags:
  - audio
  - speaker-diarization
  - onnx
  - pyannote

segmentation-3.0 ONNX

ONNX export of pyannote/segmentation-3.0 for speaker diarization (voice activity and speaker segmentation).

  • Input: waveform [batch, channels, samples], 16 kHz mono, e.g. [1, 1, 160000] for 10 seconds.
  • Output: logits [batch, num_frames, num_classes] with 7 powerset classes: non-speech, each of 3 speakers alone, and each of the 3 pairs of overlapping speakers.
  • Exported with opset 14. Run it with ONNX Runtime for on-device inference; Core ML conversion is not supported for this model because of control-flow ops.
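
The input shape and the powerset output can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in this card: that the 7 classes follow the upstream pyannote ordering (non-speech, then single speakers, then speaker pairs), that the model has a single output, and that the helper names (`input_shape`, `powerset_classes`, `decode_frames`, `run_model`) are hypothetical and introduced only for illustration. Check the class ordering against your own export before relying on it.

```python
from itertools import combinations

SAMPLE_RATE = 16_000  # from the model card: 16 kHz mono input


def input_shape(seconds, batch=1):
    """Shape of the waveform tensor [batch, channels, samples] for one chunk."""
    return [batch, 1, int(seconds * SAMPLE_RATE)]


def powerset_classes(num_speakers=3, max_overlap=2):
    """Enumerate speaker subsets: empty set first, then singles, then pairs.

    Assumption: this matches the class order of the exported logits.
    """
    return [
        subset
        for size in range(max_overlap + 1)
        for subset in combinations(range(num_speakers), size)
    ]


def decode_frames(logits, num_speakers=3, max_overlap=2):
    """Turn per-frame class scores into per-frame 0/1 speaker activity.

    `logits` is a list of frames, each a list of 7 scores. The highest-scoring
    class is picked per frame and expanded to the speakers it contains.
    """
    classes = powerset_classes(num_speakers, max_overlap)
    decoded = []
    for frame in logits:
        best = max(range(len(frame)), key=frame.__getitem__)
        active = classes[best]
        decoded.append([1 if s in active else 0 for s in range(num_speakers)])
    return decoded


def run_model(model_path, waveform):
    """Run the exported model with ONNX Runtime.

    `waveform` is a float32 NumPy array shaped like `input_shape(...)`.
    Assumption: the model exposes one input and its first output is the logits.
    """
    import onnxruntime as ort  # lazy import: the helpers above need only the stdlib

    session = ort.InferenceSession(model_path)
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: waveform})[0]
```

With NumPy and ONNX Runtime installed, a 10-second chunk could be processed by building a float32 array of shape `input_shape(10)`, passing it to `run_model` together with the path of the downloaded `.onnx` file, and feeding `logits[0].tolist()` to `decode_frames`. The result holds one `[speaker_0, speaker_1, speaker_2]` activity triple per frame.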

Derived from pyannote.audio; see pyannote/segmentation-3.0 for the original model and license.