diar_streaming_sortformer_4spk-v2.1 (ONNX)
Self-hosted ONNX export of NVIDIA's
nvidia/diar_streaming_sortformer_4spk-v2.1.
Mirrored under the Scrybl organization so Scrybl's
meeting-mode diarizer doesn't depend on a third-party personal namespace.
What this is
A streaming, end-to-end neural speaker diarizer with a 4-speaker cap that degrades gracefully when five or more speakers are present. A single model handles VAD, speaker-change detection, overlap, and attribution, carrying persistent streaming state (an internal FIFO plus a speaker cache) across chunks. It replaces the modular VAD + embedder + online-clustering stack typical of earlier diarization pipelines.
Origin
Exported from NVIDIA's .nemo checkpoint
(diar_streaming_sortformer_4spk-v2.1.nemo, 471 MB) using
NeMo's ONNX export path with
SortformerEncLabelModel.export(..., onnx_opset_version=17).
Export script: see tools/sortformer-export/export_sortformer.py
in the Scrybl repo.
Contract
Opset: ai.onnx 17
IR version: 8
Inputs:
chunk (B,T,128) f32
chunk_lengths (1,) i64
spkcache (B,T,512) f32
spkcache_lengths (1,) i64
fifo (B,T,512) f32
fifo_lengths (1,) i64
Outputs:
spkcache_fifo_chunk_preds (B,T,4) f32
chunk_pre_encode_embs (B,T,512) f32
chunk_pre_encode_lengths (1,) i64
metadata_props (read by parakeet-rs at load time):
key            value
chunk_len      124
fifo_len       124
spkcache_len   188
right_context  1
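The contract above can be checked on the caller's side before a feed reaches the runtime. The following is a minimal sketch in Python with NumPy, not part of the Scrybl or parakeet-rs code. The helper names `check_feed`, `parse_metadata`, and `example_feed` are hypothetical, and the time lengths passed to `example_feed` are arbitrary placeholders that only exercise the shape checks; they are not a statement about how streaming state should be initialized. The metadata dictionary is assumed to hold string values, as returned for example by ONNX Runtime's `InferenceSession.get_modelmeta().custom_metadata_map`.

```python
from typing import Dict, List, Mapping, Tuple

import numpy as np

# Input contract as listed above: name -> (rank, size of last dim, dtype).
INPUT_SPEC: Dict[str, Tuple[int, int, type]] = {
    "chunk": (3, 128, np.float32),
    "chunk_lengths": (1, 1, np.int64),
    "spkcache": (3, 512, np.float32),
    "spkcache_lengths": (1, 1, np.int64),
    "fifo": (3, 512, np.float32),
    "fifo_lengths": (1, 1, np.int64),
}

# Keys listed under metadata_props above.
METADATA_KEYS = ("chunk_len", "fifo_len", "spkcache_len", "right_context")


def check_feed(feed: Mapping[str, np.ndarray]) -> List[str]:
    """Return a list of contract violations (empty if the feed conforms)."""
    problems = [f"missing input: {n}" for n in sorted(set(INPUT_SPEC) - set(feed))]
    problems += [f"unexpected input: {n}" for n in sorted(set(feed) - set(INPUT_SPEC))]
    for name, (rank, last_dim, dtype) in INPUT_SPEC.items():
        if name not in feed:
            continue
        arr = feed[name]
        if arr.dtype != np.dtype(dtype):
            problems.append(f"{name}: dtype {arr.dtype}, expected {np.dtype(dtype)}")
        if arr.ndim != rank:
            problems.append(f"{name}: rank {arr.ndim}, expected {rank}")
        elif arr.shape[-1] != last_dim:
            problems.append(f"{name}: last dim {arr.shape[-1]}, expected {last_dim}")
    return problems


def parse_metadata(props: Mapping[str, str]) -> Dict[str, int]:
    """Convert the string-valued metadata entries to integers."""
    missing = [k for k in METADATA_KEYS if k not in props]
    if missing:
        raise KeyError(f"missing metadata keys: {missing}")
    return {k: int(props[k]) for k in METADATA_KEYS}


def example_feed(t_chunk: int, t_cache: int, t_fifo: int) -> Dict[str, np.ndarray]:
    """Build a shape-only placeholder feed with B=1 (assumption: zeros everywhere)."""
    return {
        "chunk": np.zeros((1, t_chunk, 128), np.float32),
        "chunk_lengths": np.array([t_chunk], np.int64),
        "spkcache": np.zeros((1, t_cache, 512), np.float32),
        "spkcache_lengths": np.array([t_cache], np.int64),
        "fifo": np.zeros((1, t_fifo, 512), np.float32),
        "fifo_lengths": np.array([t_fifo], np.int64),
    }
```

Calling `check_feed(example_feed(8, 4, 2))` returns an empty list, while a feed with a missing tensor or a wrong feature dimension yields one message per violation. Returning messages rather than raising on the first error makes it easier to report every mismatch at once.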
License
NVIDIA Open Model License — commercial use permitted with attribution. See
the linked license file. Attribution: NVIDIA Corporation,
diar_streaming_sortformer_4spk-v2.1.
Citation
@misc{nvidia-sortformer-streaming-v2.1,
author = {NVIDIA},
title = {Streaming Sortformer 4-speaker v2.1},
year = {2024},
howpublished = {\url{https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1}}
}