speech-turn-detector-onnx

ONNX export of pyannote/segmentation-3.0 (PyanNet architecture).

Exported from a local pytorch_model.bin checkpoint using torch.jit.trace and the legacy ONNX exporter (opset 14).

Inputs / Outputs

| Name | Shape | Description |
|------|-------|-------------|
| input_values | (batch, 1, 160000) | Raw waveform, 10 s @ 16 kHz |
| logits | (batch, 589, 7) | Powerset speaker-activity logits |
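
A minimal sketch of feeding one 10 s window to the model with ONNX Runtime, using the tensor names and shapes listed above. The file path passed to `run_model`, the float32 input dtype, and the helper names `fit_to_window` and `run_model` are assumptions for illustration and are not part of this repository; check the actual ONNX file before relying on them.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000      # Hz, from the input description above
WINDOW_SAMPLES = 160_000  # 10 s at 16 kHz, from the input shape above


def fit_to_window(samples: Sequence[float], size: int = WINDOW_SAMPLES) -> List[float]:
    """Trim or zero-pad a mono waveform to exactly `size` samples."""
    clipped = list(samples[:size])
    return clipped + [0.0] * (size - len(clipped))


def run_model(model_path: str, samples: Sequence[float]):
    """Run one window through the model; returns logits of shape (1, 589, 7)."""
    # Imported lazily so fit_to_window works without these packages installed.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # Assumed dtype: float32. Shape is (batch=1, channels=1, samples=160000).
    x = np.asarray(fit_to_window(samples), dtype=np.float32).reshape(1, 1, -1)
    (logits,) = session.run(["logits"], {"input_values": x})
    return logits
```

Audio longer than 10 s would need to be split into windows before calling `run_model`; how windows are overlapped and stitched is left open here.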

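The 7 output classes are a powerset encoding of speaker activity for up to 3 speakers, with at most 2 active simultaneously: 1 non-speech class, 3 single-speaker classes, and 3 two-speaker overlap classes.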
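
The powerset encoding can be sketched in plain Python as follows. The class order used here (subsets enumerated by increasing size, then lexicographically) is an assumption and should be verified against the original model; `powerset_classes` and `decode_frame` are hypothetical helper names, not functions shipped with this export.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def powerset_classes(num_speakers: int = 3, max_simultaneous: int = 2) -> List[Tuple[int, ...]]:
    """Enumerate speaker subsets by increasing size: (), (0,), (1,), (2,), (0, 1), ..."""
    return [
        subset
        for size in range(max_simultaneous + 1)
        for subset in combinations(range(num_speakers), size)
    ]


def decode_frame(
    logits: Sequence[float],
    classes: List[Tuple[int, ...]],
    num_speakers: int = 3,
) -> List[int]:
    """Turn one frame of powerset logits into a 0/1 activity flag per speaker."""
    best = max(range(len(logits)), key=lambda i: logits[i])  # argmax over classes
    active = classes[best]
    return [1 if speaker in active else 0 for speaker in range(num_speakers)]


CLASSES = powerset_classes()
print(len(CLASSES))  # 7
# Class index 4 is (0, 1) under the assumed order: speakers 0 and 1 overlap.
print(decode_frame([0.1, 0.2, 0.0, -1.0, 3.5, 0.3, 0.2], CLASSES))  # [1, 1, 0]
```

Applying `decode_frame` to each of the 589 frames of a window yields a per-speaker activity matrix of shape (589, 3).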
