segmentation-3.0 ONNX

ONNX export of pyannote/segmentation-3.0 for speaker diarization (voice activity and speaker segmentation).

Input: waveform [batch, channels, samples], 16 kHz mono, e.g. [1, 1, 160000] for 10 seconds.
Output: logits [batch, num_frames, num_classes] (7 classes, powerset decoding).
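
The powerset output can be turned into per-speaker activity by taking the best-scoring class in each frame and expanding it into the speakers it represents. The sketch below assumes the export keeps the class layout of the original pyannote/segmentation-3.0 model (up to 3 speakers, at most 2 active at once, classes ordered by subset size); the function names are illustrative and are not part of this repository.

```python
from itertools import combinations

NUM_SPEAKERS = 3  # assumption: same as the original pyannote/segmentation-3.0
MAX_OVERLAP = 2   # assumption: at most two speakers active in one frame


def powerset_classes(num_speakers=NUM_SPEAKERS, max_overlap=MAX_OVERLAP):
    """List speaker subsets ordered by size: (), (0,), (1,), (2,), (0, 1), ..."""
    classes = []
    for size in range(max_overlap + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


def decode_powerset(frame_logits, num_speakers=NUM_SPEAKERS, max_overlap=MAX_OVERLAP):
    """Map [num_frames][num_classes] scores to [num_frames][num_speakers] 0/1 flags."""
    classes = powerset_classes(num_speakers, max_overlap)
    decoded = []
    for scores in frame_logits:
        if len(scores) != len(classes):
            raise ValueError("unexpected number of classes per frame")
        # Pick the index of the highest-scoring powerset class for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        active = classes[best]
        decoded.append([1 if s in active else 0 for s in range(num_speakers)])
    return decoded
```

With 3 speakers and an overlap limit of 2 this yields exactly 7 classes, which matches the output size above. For a batched output, apply `decode_powerset` to each item in the batch.
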
Exported with opset 14. Run it on device with ONNX Runtime; Core ML conversion is not supported for this model because of control-flow ops.
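
A minimal sketch of running the export with the ONNX Runtime Python API follows. It assumes the model takes float32 input (typical for PyTorch exports, but not stated here) and reads the input tensor name from the session rather than hard-coding it. The helper names and the file name in the usage note are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000  # 16 kHz mono, as stated above


def to_model_input(samples):
    """Reshape a 1-D mono signal to [batch=1, channels=1, samples]."""
    mono = np.asarray(samples, dtype=np.float32)  # assumption: float32 input
    if mono.ndim != 1:
        raise ValueError("expected a 1-D mono signal")
    return mono[np.newaxis, np.newaxis, :]


def load_session(model_path):
    """Open the ONNX file with ONNX Runtime on the CPU execution provider."""
    import onnxruntime as ort  # imported here so the other helpers work without it

    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def run_segmentation(session, samples):
    """Return the first model output: logits [batch, num_frames, num_classes]."""
    input_name = session.get_inputs()[0].name  # avoids guessing the tensor name
    outputs = session.run(None, {input_name: to_model_input(samples)})
    return outputs[0]
```

For example, `run_segmentation(load_session("model.onnx"), np.zeros(10 * SAMPLE_RATE))` would feed 10 seconds of silence as a `[1, 1, 160000]` tensor, where `model.onnx` stands in for the actual file name in this repository.
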

Derived from pyannote.audio; see pyannote/segmentation-3.0 for the original model and license.
