speech-turn-detector-onnx

ONNX export of pyannote/segmentation-3.0 (PyanNet architecture).

Exported from a local `pytorch_model.bin` checkpoint using `torch.jit.trace` and the legacy (TorchScript-based) ONNX exporter, opset 14.

Inputs / Outputs

Name	Shape	Description
`input_values`	`(batch, 1, 160000)`	Raw waveform, 10 s @ 16 kHz
`logits`	`(batch, 589, 7)`	Powerset speaker-activity logits (589 frames per chunk, 7 classes per frame)
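A minimal sketch of feeding audio to the model with ONNX Runtime, assuming a mono 16 kHz waveform already loaded as a NumPy array. The helper names `prepare_chunk` and `run_model` are hypothetical, and zero-padding clips shorter than 10 s is an assumption of this sketch, not something this export documents. Only the tensor names and shapes come from the table above.

```python
import numpy as np

SAMPLE_RATE = 16_000      # Hz, from the input description
CHUNK_SAMPLES = 160_000   # 10 s at 16 kHz


def prepare_chunk(waveform):
    """Shape a mono waveform into one (1, 1, 160000) float32 chunk.

    Longer input is truncated; shorter input is zero-padded
    (an assumption of this sketch).
    """
    samples = np.asarray(waveform, dtype=np.float32)
    if samples.ndim != 1:
        raise ValueError("expected a mono, 1-D waveform")
    samples = samples[:CHUNK_SAMPLES]
    padded = np.zeros(CHUNK_SAMPLES, dtype=np.float32)
    padded[: samples.shape[0]] = samples
    # Add the batch and channel axes: (160000,) -> (1, 1, 160000)
    return padded[np.newaxis, np.newaxis, :]


def run_model(model_path, chunk):
    """Run one chunk through the ONNX model and return the logits."""
    # Imported lazily so prepare_chunk works without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    (logits,) = session.run(["logits"], {"input_values": chunk})
    return logits  # expected shape: (batch, 589, 7)
```

For example, `run_model("model.onnx", prepare_chunk(audio))` would return the logits for the first 10 s of `audio`; the filename `model.onnx` is a placeholder for wherever the exported file is stored.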

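The 7 output classes are a powerset encoding of speaker activity for up to 3 speakers per chunk, with at most 2 speaking simultaneously: 1 class for no speech, 3 classes for a single active speaker, and 3 classes for each pair of overlapping speakers.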
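The powerset classes can be converted back to per-speaker activity with a small lookup table. The sketch below assumes the classes are ordered by subset size and then lexicographically (no speaker, then each single speaker, then each pair), which is believed to match the ordering used by pyannote.audio but should be verified against the original model. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np

NUM_SPEAKERS = 3       # speakers per chunk
MAX_SIMULTANEOUS = 2   # maximum overlapping speakers


def build_powerset_mapping(num_speakers=NUM_SPEAKERS, max_set_size=MAX_SIMULTANEOUS):
    """Return a (num_classes, num_speakers) 0/1 matrix, one row per class.

    Assumed class order: {}, {0}, {1}, {2}, {0,1}, {0,2}, {1,2}.
    """
    subsets = [
        subset
        for size in range(max_set_size + 1)
        for subset in combinations(range(num_speakers), size)
    ]
    mapping = np.zeros((len(subsets), num_speakers), dtype=np.float32)
    for class_index, subset in enumerate(subsets):
        for speaker in subset:
            mapping[class_index, speaker] = 1.0
    return mapping


def powerset_to_multilabel(logits):
    """Convert (batch, frames, 7) logits to (batch, frames, 3) hard activity."""
    mapping = build_powerset_mapping()
    best_class = np.asarray(logits).argmax(axis=-1)  # (batch, frames)
    return mapping[best_class]                       # (batch, frames, 3)
```

Taking the argmax per frame yields a hard decision; a softmax over the last axis followed by a matrix product with the same mapping would give soft per-speaker scores instead.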
