diar_streaming_sortformer_4spk-v2.1 (ONNX)

Self-hosted ONNX export of NVIDIA's nvidia/diar_streaming_sortformer_4spk-v2.1. Mirrored under the Scrybl organization so Scrybl's meeting-mode diarizer doesn't depend on a third-party personal namespace.

What this is

A streaming, end-to-end neural speaker diarizer capped at 4 speakers; it degrades gracefully when 5 or more are present. A single model handles voice activity detection (VAD), speaker-change detection, overlap, and attribution, carrying persistent streaming state (an internal FIFO plus a speaker cache) across chunks. It replaces the modular VAD + embedder + online-cluster stack typical of earlier diarization pipelines.

Origin

Exported from NVIDIA's .nemo checkpoint (diar_streaming_sortformer_4spk-v2.1.nemo, 471 MB) using NeMo's ONNX export path with SortformerEncLabelModel.export(..., onnx_opset_version=17).

Export script: see tools/sortformer-export/export_sortformer.py in the Scrybl repo.

Contract

Opset: ai.onnx 17
IR version: 8
Inputs:
  chunk (B,T,128) f32
  chunk_lengths (1,) i64
  spkcache (B,T,512) f32
  spkcache_lengths (1,) i64
  fifo (B,T,512) f32
  fifo_lengths (1,) i64
Outputs:
  spkcache_fifo_chunk_preds (B,T,4) f32
  chunk_pre_encode_embs (B,T,512) f32
  chunk_pre_encode_lengths (1,) i64
metadata_props (read by parakeet-rs at load time):

key	value
`chunk_len`	`124`
`fifo_len`	`124`
`spkcache_len`	`188`
`right_context`	`1`
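
A consumer other than parakeet-rs may want to validate this contract before running inference. The following is a minimal Python sketch, not the loader parakeet-rs actually uses: `StreamingConfig`, `parse_streaming_config`, `check_last_dims`, and `read_metadata_props` are hypothetical names introduced here for illustration. It assumes the metadata values are stored as decimal strings (ONNX `metadata_props` entries are string key/value pairs) and takes the last-axis sizes from the Inputs and Outputs listed above. Only `read_metadata_props` needs the `onnx` package.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence

# Size of the last axis for each rank-3 tensor, copied from the contract above.
EXPECTED_LAST_DIM = {
    "chunk": 128,
    "spkcache": 512,
    "fifo": 512,
    "spkcache_fifo_chunk_preds": 4,
    "chunk_pre_encode_embs": 512,
}

# Keys listed in the metadata_props table.
REQUIRED_KEYS = ("chunk_len", "fifo_len", "spkcache_len", "right_context")


@dataclass(frozen=True)
class StreamingConfig:  # hypothetical name, not part of any library
    chunk_len: int
    fifo_len: int
    spkcache_len: int
    right_context: int


def parse_streaming_config(props: Mapping[str, str]) -> StreamingConfig:
    """Convert string-valued metadata_props into integers, failing loudly."""
    missing = [key for key in REQUIRED_KEYS if key not in props]
    if missing:
        raise KeyError("metadata_props is missing: " + ", ".join(missing))
    return StreamingConfig(**{key: int(props[key]) for key in REQUIRED_KEYS})


def check_last_dims(shapes: Mapping[str, Sequence[int]]) -> None:
    """Raise ValueError if a listed tensor is not rank 3 with the expected last axis."""
    for name, expected in EXPECTED_LAST_DIM.items():
        if name not in shapes:
            continue  # only the tensors the caller supplied are checked
        shape = tuple(shapes[name])
        if len(shape) != 3 or shape[-1] != expected:
            raise ValueError(f"{name}: expected (B, T, {expected}), got {shape}")


def read_metadata_props(path: str) -> dict:
    """Read metadata_props from an ONNX file (requires the `onnx` package)."""
    import onnx  # imported lazily so the helpers above work without it

    model = onnx.load(path)
    return {entry.key: entry.value for entry in model.metadata_props}
```

In use, `parse_streaming_config(read_metadata_props(path))` would yield the four integers from the table, and `check_last_dims` can be called on the shapes of the arrays about to be fed to the session. The batch and time axes are deliberately left unchecked, since the contract only gives them as `B` and `T`.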

License

NVIDIA Open Model License — commercial use permitted with attribution. See the linked license file. Attribution: NVIDIA Corporation, diar_streaming_sortformer_4spk-v2.1.

Citation

@misc{nvidia-sortformer-streaming-v2.1,
  author = {NVIDIA},
  title = {Streaming Sortformer 4-speaker v2.1},
  year = {2024},
  howpublished = {\url{https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1}}
}
