diar_streaming_sortformer_4spk-v2.1 (ONNX)

Self-hosted ONNX export of NVIDIA's nvidia/diar_streaming_sortformer_4spk-v2.1. Mirrored under the Scrybl organization so Scrybl's meeting-mode diarizer doesn't depend on a third-party personal namespace.

What this is

Streaming, end-to-end neural speaker diarizer (4-speaker cap, graceful degradation on 5+). One model handles voice activity detection (VAD), speaker-change detection, overlapping speech, and speaker attribution, carrying persistent streaming state (an internal FIFO plus a speaker cache) from one chunk to the next. It replaces the modular VAD + embedder + online-clustering stack typical of earlier diarization pipelines.
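Because a single model emits per-frame activity for up to four speakers, overlapping speech simply shows up as more than one speaker active in the same frame. The sketch below illustrates one way to turn such a frame-by-speaker probability matrix into time segments. It is a minimal illustration, not the decoding used by NeMo or by Scrybl: the function name `frames_to_segments`, the 0.5 threshold, and the 0.08 s frame duration are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# (speaker index, start in seconds, end in seconds)
Segment = Tuple[int, float, float]


def frames_to_segments(
    preds: Sequence[Sequence[float]],
    threshold: float = 0.5,   # assumed decision threshold, not from the model card
    frame_sec: float = 0.08,  # assumed frame duration, not from the model card
) -> List[Segment]:
    """Turn a (T, num_speakers) probability matrix into per-speaker segments."""
    segments: List[Segment] = []
    if not preds:
        return segments
    num_speakers = len(preds[0])
    for spk in range(num_speakers):
        start = None  # first frame index of the currently open segment, if any
        for t, row in enumerate(preds):
            active = row[spk] >= threshold
            if active and start is None:
                start = t  # speaker becomes active: open a segment
            elif not active and start is not None:
                # speaker falls silent: close the segment at this frame boundary
                segments.append((spk, start * frame_sec, t * frame_sec))
                start = None
        if start is not None:
            # speaker still active at the end of the matrix: close at the last frame
            segments.append((spk, start * frame_sec, len(preds) * frame_sec))
    # Order by start time, then speaker index; overlapping segments are kept as-is.
    segments.sort(key=lambda seg: (seg[1], seg[0]))
    return segments
```

Each speaker column is scanned independently, so two speakers talking at once yield two segments whose time ranges overlap. No segment merging or minimum-duration smoothing is applied here.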

Origin

Exported from NVIDIA's .nemo checkpoint (diar_streaming_sortformer_4spk-v2.1.nemo, 471 MB) using NeMo's ONNX export path with SortformerEncLabelModel.export(..., onnx_opset_version=17).

Export script: see tools/sortformer-export/export_sortformer.py in the Scrybl repo.

Contract

  • Opset: ai.onnx 17

  • IR version: 8

  • Inputs: chunk (B,T,128) f32, chunk_lengths (1,) i64, spkcache (B,T,512) f32, spkcache_lengths (1,) i64, fifo (B,T,512) f32, fifo_lengths (1,) i64

  • Outputs: spkcache_fifo_chunk_preds (B,T,4) f32, chunk_pre_encode_embs (B,T,512) f32, chunk_pre_encode_lengths (1,) i64

  • metadata_props (read by parakeet-rs at load time):

    key            value
    chunk_len      124
    fifo_len       124
    spkcache_len   188
    right_context  1
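ONNX stores metadata_props as string key/value pairs, so a loader has to convert the values above to integers before using them. The following is a minimal sketch of that step, assuming the four keys listed in the table. `StreamingConfig` and `parse_metadata` are hypothetical names chosen for the example, and this is not how parakeet-rs itself is implemented.

```python
from dataclasses import dataclass
from typing import Mapping

# Keys taken from the metadata_props table in this model card.
REQUIRED_KEYS = ("chunk_len", "fifo_len", "spkcache_len", "right_context")


@dataclass(frozen=True)
class StreamingConfig:
    """Hypothetical container for the streaming parameters (illustration only)."""
    chunk_len: int
    fifo_len: int
    spkcache_len: int
    right_context: int


def parse_metadata(props: Mapping[str, str]) -> StreamingConfig:
    """Convert string-valued metadata_props into a typed config.

    Raises KeyError if a required key is absent and ValueError if a value
    is not an integer literal, so a bad export is caught at load time.
    """
    missing = [key for key in REQUIRED_KEYS if key not in props]
    if missing:
        raise KeyError(f"missing metadata_props keys: {missing}")
    return StreamingConfig(**{key: int(props[key]) for key in REQUIRED_KEYS})


# Values exactly as they would appear in the model file: all strings.
example_props = {
    "chunk_len": "124",
    "fifo_len": "124",
    "spkcache_len": "188",
    "right_context": "1",
}
config = parse_metadata(example_props)
```

With the `onnx` Python package, the input mapping can be built as `{p.key: p.value for p in onnx.load(path).metadata_props}`. With ONNX Runtime, `session.get_modelmeta().custom_metadata_map` should return the same pairs. Extra keys in the mapping are ignored by this sketch.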

License

NVIDIA Open Model License. Commercial use is permitted with attribution; see the linked license file. Attribution: NVIDIA Corporation, diar_streaming_sortformer_4spk-v2.1.

Citation

@misc{nvidia-sortformer-streaming-v2.1,
  author = {NVIDIA},
  title = {Streaming Sortformer 4-speaker v2.1},
  year = {2024},
  howpublished = {\url{https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1}}
}