younghan-meta committed on
Commit 4a15775 (verified)
1 Parent(s): 384f5c9

Upload folder using huggingface_hub

Files changed (4)
  1. .gitattributes +2 -0
  2. README.md +60 -0
  3. poem.wav +3 -0
  4. sortformer.pte +3 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ poem.wav filter=lfs diff=lfs merge=lfs -text
+ sortformer.pte filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,60 @@
+ ---
+ license: apache-2.0
+ tags:
+ - executorch
+ - xnnpack
+ - speaker-diarization
+ - on-device
+ - streaming
+ pipeline_tag: audio-classification
+ base_model: nvidia/diar_streaming_sortformer_4spk-v2
+ ---
+
+ # Sortformer-ExecuTorch-XNNPACK
+
+ Pre-exported [ExecuTorch](https://github.com/pytorch/executorch) `.pte` file
+ for [Streaming Sortformer](https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2)
+ with **XNNPACK** backend (CPU). A streaming speaker diarization model that
+ identifies up to 4 speakers in audio.
+
+ ## Installation
+
+ ```bash
+ git clone https://github.com/pytorch/executorch/ ~/executorch
+ cd ~/executorch && ./install_executorch.sh
+ make sortformer-cpu
+ ```
+
+ ## Download
+
+ ```bash
+ pip install huggingface_hub
+ huggingface-cli download younghan-meta/Sortformer-ExecuTorch-XNNPACK --local-dir ~/sortformer
+ ```
+
+ ## Run
+
+ ```bash
+ cmake-out/examples/models/sortformer/sortformer_runner \
+ --model_path ~/sortformer/sortformer.pte \
+ --audio_path ~/sortformer/poem.wav
+ ```
+
+ Output shows detected speaker segments with start/end times.
+
+ Optional flags:
+ - `--threshold 0.5` -- speaker activity threshold (0.0-1.0)
+ - `--chunk_len 124` -- encode chunk size in 80ms frames
+ - `--fifo_len 124` -- FIFO buffer size in 80ms frames
+
+ ## Export Command
+
+ ```bash
+ pip install "nemo_toolkit[asr]"
+ python examples/models/sortformer/export_sortformer.py --backend xnnpack --output-dir ./sortformer_exports
+ ```
+
+ ## More Info
+
+ - [Official ExecuTorch Sortformer guide](https://github.com/pytorch/executorch/tree/main/examples/models/sortformer)
+ - [Original model](https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2)
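
The optional flags in the README above are expressed in 80 ms frames, so the value 124 shown for `--chunk_len` and `--fifo_len` corresponds to 9.92 s of audio. The sketch below illustrates that conversion and how a threshold such as `--threshold 0.5` could turn per-frame speaker activity into start/end segments. It is an illustration only: `frames_to_seconds` and `segments_from_probs` are hypothetical helpers, and the strict `>` comparison is an assumption, not the runner's confirmed post-processing.

```python
FRAME_MS = 80  # frame duration stated in the README's flag descriptions


def frames_to_seconds(num_frames, frame_ms=FRAME_MS):
    """Convert a frame count to seconds (hypothetical helper)."""
    return num_frames * frame_ms / 1000


def segments_from_probs(probs, threshold=0.5, frame_ms=FRAME_MS):
    """Turn one speaker's per-frame activity probabilities into
    (start_seconds, end_seconds) segments.

    Assumption: a frame counts as active when its probability is
    strictly above the threshold.
    """
    segments = []
    start = None  # index of the first frame of the current active run
    for i, p in enumerate(probs):
        active = p > threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((frames_to_seconds(start, frame_ms),
                             frames_to_seconds(i, frame_ms)))
            start = None
    if start is not None:
        # The run was still active when the input ended.
        segments.append((frames_to_seconds(start, frame_ms),
                         frames_to_seconds(len(probs), frame_ms)))
    return segments


print(frames_to_seconds(124))  # 9.92
print(segments_from_probs([0.1, 0.9, 0.8, 0.2, 0.7]))
```

With these assumptions, raising the threshold can only shorten or drop segments, since fewer frames qualify as active.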
poem.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0dd03dfb6fe83b7d10df166cb77d28bf139f9be2c739e9927c757d88255aa88b
+ size 768042
sortformer.pte ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e763fae031bc8675252f2d8de0e84ff71992db4eb04257e4a50b43c9b31a77c1
+ size 492384528
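
Both added binaries are stored as Git LFS pointers, which record the SHA-256 digest and byte size of the real file. A downloaded copy can therefore be checked against the pointer. The following is a minimal sketch, assuming the files were fetched to a local directory such as `~/sortformer`; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of Git LFS or `huggingface_hub`.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer):
    """Return True if the file at `path` has the size and digest
    recorded in the parsed pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap check first, avoids hashing a wrong file
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in 1 MiB blocks so large files are not loaded whole.
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest() == pointer["digest"]


# Pointer contents for poem.wav, copied from the diff above.
POEM_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:0dd03dfb6fe83b7d10df166cb77d28bf139f9be2c739e9927c757d88255aa88b
size 768042
"""

pointer = parse_lfs_pointer(POEM_POINTER)
# Example use, assuming the download location from the README:
# matches_pointer(os.path.expanduser("~/sortformer/poem.wav"), pointer)
```

Comparing the size before hashing rejects truncated downloads quickly, which matters most for the roughly 492 MB `sortformer.pte`.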