Sortformer-ExecuTorch-XNNPACK
Pre-exported ExecuTorch .pte file for Streaming Sortformer with the XNNPACK backend (CPU). Streaming Sortformer is a streaming speaker diarization model that identifies up to 4 speakers in audio.
Installation
git clone https://github.com/pytorch/executorch/ ~/executorch
cd ~/executorch && ./install_executorch.sh
make sortformer-cpu
Download
pip install huggingface_hub
huggingface-cli download younghan-meta/Sortformer-ExecuTorch-XNNPACK --local-dir ~/sortformer
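Before running the model, it can help to confirm that the download produced the files the Run section expects. The following is a minimal sketch: check_files is a hypothetical helper, not part of ExecuTorch or huggingface_hub, and the two file names are taken from the Run command in this README rather than from a verified listing of the repository.

```shell
# Hypothetical helper: print each expected file that is missing
# and return non-zero if any file was not found.
check_files() {
  missing=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# File names assumed from the Run section; adjust if your download differs.
check_files ~/sortformer/sortformer.pte ~/sortformer/poem.wav || echo "download incomplete"
```

If nothing is printed, both files are in place and the runner command can be used as written.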
Run
cmake-out/examples/models/sortformer/sortformer_runner \
--model_path ~/sortformer/sortformer.pte \
--audio_path ~/sortformer/poem.wav
The output lists the detected speaker segments with their start and end times.
Optional flags:
--threshold 0.5: speaker activity threshold (0.0-1.0)
--chunk_len 124: encode chunk size in 80 ms frames
--fifo_len 124: FIFO buffer size in 80 ms frames
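Because --chunk_len and --fifo_len are expressed in 80 ms frames, converting a frame count to a duration makes the values easier to reason about. A minimal sketch using shell arithmetic follows; frames_to_ms is a hypothetical helper, and the 80 ms frame length is taken from the flag descriptions above.

```shell
# Hypothetical helper: convert a frame count to milliseconds,
# assuming 80 ms per frame as stated in the flag descriptions.
frames_to_ms() {
  echo $(( $1 * 80 ))
}

# 124 is the value shown for --chunk_len and --fifo_len.
frames_to_ms 124
```

With the value 124 shown above, this prints 9920, so each chunk (and the FIFO buffer) covers roughly 9.9 seconds of audio.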
Export Command
pip install "nemo_toolkit[asr]"
python examples/models/sortformer/export_sortformer.py --backend xnnpack --output-dir ./sortformer_exports
More Info
Base model: nvidia/diar_streaming_sortformer_4spk-v2