segmentation-3.0 ONNX

ONNX export of pyannote/segmentation-3.0 for speaker diarization (voice activity and speaker segmentation).

  • Input: waveform [batch, channels, samples], 16 kHz mono, e.g. [1, 1, 160000] for 10 seconds.
  • Output: logits [batch, num_frames, num_classes], with num_classes = 7. The classes use powerset encoding and must be decoded into per-speaker activity.
  • Exported with opset 14. Run it on device with ONNX Runtime; Core ML conversion is not supported for this model because of its control-flow ops.
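
The input and output shapes above can be exercised with a short Python sketch. It is illustrative only and rests on assumptions not stated in this card: the file name passed to `run_segmentation` is hypothetical, the input name is read from the session rather than assumed, and the powerset class order (no speaker, each single speaker, then each speaker pair, for up to 3 speakers with at most 2 overlapping) follows the convention used by pyannote.audio and should be checked against the original model. The helper names `powerset_classes`, `decode_frames`, and `run_segmentation` are made up for this example.

```python
from itertools import combinations

NUM_SPEAKERS = 3       # assumption: up to 3 speakers per chunk
MAX_SIMULTANEOUS = 2   # assumption: at most 2 speakers active at once


def powerset_classes(num_speakers=NUM_SPEAKERS, max_set_size=MAX_SIMULTANEOUS):
    """List speaker subsets in the assumed class order: (), (0,), ..., (1, 2)."""
    classes = []
    for size in range(max_set_size + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


def decode_frames(frame_logits, num_speakers=NUM_SPEAKERS):
    """Turn per-frame powerset logits into per-speaker 0/1 activity."""
    classes = powerset_classes(num_speakers)
    decoded = []
    for logits in frame_logits:
        # Pick the most likely powerset class for this frame.
        best = max(range(len(logits)), key=lambda i: logits[i])
        active = classes[best]
        decoded.append([1 if s in active else 0 for s in range(num_speakers)])
    return decoded


def run_segmentation(model_path, waveform):
    """Run the ONNX model on a float32 array shaped [batch, 1, samples]."""
    import onnxruntime as ort  # imported lazily so the decoder works without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name  # read the name, do not assume it
    return session.run(None, {input_name: waveform})[0]
```

A possible call, assuming NumPy is installed and the exported file is saved locally as `model.onnx` (a placeholder name): `logits = run_segmentation("model.onnx", np.zeros((1, 1, 160000), dtype=np.float32))`, followed by `decode_frames(logits[0].tolist())` to get one `[speaker_0, speaker_1, speaker_2]` activity row per frame.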

Derived from pyannote.audio; see pyannote/segmentation-3.0 for the original model and license.
