alexwengg committed on
Commit 6b7b7dc · verified · 1 Parent(s): fb3bc01

Update README.md

Files changed (1)
  1. README.md +31 -10
README.md CHANGED
@@ -23,22 +23,43 @@
23
 
24
  ## Model Input/Output Shapes
25
 
26
- Combined model (Sortformer.mlmodelc - default config):
27
 
28
  | Input | Shape | Description |
29
  |-------|-------|-------------|
30
- | chunk | [1, 112, 128] | Mel spectrogram features |
31
- | chunk_lengths | [1] | Actual chunk length |
32
- | spkcache | [1, 188, 512] | Speaker cache embeddings |
33
- | spkcache_lengths | [1] | Actual cache length |
34
- | fifo | [1, 40, 512] | FIFO queue embeddings |
35
- | fifo_lengths | [1] | Actual FIFO length |
36
 
37
  | Output | Shape | Description |
38
  |--------|-------|-------------|
39
- | speaker_preds | [T, 4] | Speaker probabilities (4 speakers) |
40
- | chunk_pre_encoder_embs | [T', 512] | Embeddings for state update |
41
- | chunk_pre_encoder_lengths | [1] | Actual embedding count |
42
 
43
  ## Usage with FluidAudio (Swift)
44
 
 
23
 
24
  ## Model Input/Output Shapes
25
 
26
+ **General**:
27
 
28
  | Input | Shape | Description |
29
  |-------|-------|-------------|
30
+ | chunk | `[1, 8*(C+L+R), 128]` | Mel spectrogram features |
31
+ | chunk_lengths | `[1]` | Actual chunk length |
32
+ | spkcache | `[1, S, 512]` | Speaker cache embeddings |
33
+ | spkcache_lengths | `[1]` | Actual cache length |
34
+ | fifo | `[1, F, 512]` | FIFO queue embeddings |
35
+ | fifo_lengths | `[1]` | Actual FIFO length |
36
 
37
  | Output | Shape | Description |
38
  |--------|-------|-------------|
39
+ | speaker_preds | `[C+L+R+S+F, 4]` | Speaker probabilities (4 speakers) |
40
+ | chunk_pre_encoder_embs | `[C+L+R, 512]` | Embeddings for state update |
41
+ | chunk_pre_encoder_lengths | `[1]` | Actual embedding count |
42
+ | nest_encoder_embs | `[C+L+R+S+F, 192]` | Embeddings for speaker discrimination |
43
+ | nest_encoder_lengths | `[1]` | Actual speaker embedding count |
44
+
45
+ Note: `C = chunk_len`, `L = chunk_left_context`, `R = chunk_right_context`, `S = spkcache_len`, `F = fifo_len`.
46
+
47
+ **Configuration-Specific Shapes**:
48
+
49
+ | Input | Default | NVIDIA Low | NVIDIA High |
+ |-------|---------|------------|-------------|
50
+ | chunk | `[1, 112, 128]` | `[1, 112, 128]` | `[1, 3048, 128]` |
51
+ | chunk_lengths | `[1]` | `[1]` | `[1]` |
52
+ | spkcache | `[1, 188, 512]` | `[1, 188, 512]` | `[1, 188, 512]` |
53
+ | spkcache_lengths | `[1]` | `[1]` | `[1]` |
54
+ | fifo | `[1, 40, 512]` | `[1, 188, 512]` | `[1, 40, 512]` |
55
+ | fifo_lengths | `[1]` | `[1]` | `[1]` |
56
+
57
+ | Output | Default | NVIDIA Low | NVIDIA High |
+ |--------|---------|------------|-------------|
58
+ | speaker_preds | `[1, 242, 4]` | `[1, 390, 4]` | `[1, 609, 4]` |
59
+ | chunk_pre_encoder_embs | `[1, 14, 512]` | `[1, 14, 512]` | `[1, 381, 512]` |
60
+ | chunk_pre_encoder_lengths | `[1]` | `[1]` | `[1]` |
61
+ | nest_encoder_embs | `[1, 242, 192]` | `[1, 390, 192]` | `[1, 609, 192]` |
62
+ | nest_encoder_lengths | `[1]` | `[1]` | `[1]` |
63
 
64
  ## Usage with FluidAudio (Swift)
65
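
Review note: the symbolic shapes in the General table can be checked against the configuration-specific values with a few lines of arithmetic. The sketch below is not part of the README or of FluidAudio; `StreamingConfig`, `expected_shapes`, and the constant names are hypothetical. It assumes a leading batch dimension of 1 on every tensor and takes the combined `C+L+R` value for each configuration from the `chunk_pre_encoder_embs` row (14, 14, and 381), since the diff does not list `C`, `L`, and `R` separately.

```python
from dataclasses import dataclass

# Constants read off the General table in the diff.
MEL_FRAMES_PER_EMBEDDING = 8   # the "8*" factor in the chunk shape
MEL_BINS = 128                 # last dimension of chunk
PRE_ENCODER_DIM = 512          # last dimension of spkcache, fifo, chunk_pre_encoder_embs
NEST_ENCODER_DIM = 192         # last dimension of nest_encoder_embs


@dataclass(frozen=True)
class StreamingConfig:
    """Hypothetical container; the field names are not a FluidAudio API."""
    chunk_frames: int   # C + L + R (chunk plus left and right context)
    spkcache_len: int   # S
    fifo_len: int       # F


def expected_shapes(cfg: StreamingConfig) -> dict:
    """Apply the General-table formulas, assuming batch size 1."""
    total = cfg.chunk_frames + cfg.spkcache_len + cfg.fifo_len  # C+L+R+S+F
    return {
        "chunk": (1, MEL_FRAMES_PER_EMBEDDING * cfg.chunk_frames, MEL_BINS),
        "spkcache": (1, cfg.spkcache_len, PRE_ENCODER_DIM),
        "fifo": (1, cfg.fifo_len, PRE_ENCODER_DIM),
        "chunk_pre_encoder_embs": (1, cfg.chunk_frames, PRE_ENCODER_DIM),
        "nest_encoder_embs": (1, total, NEST_ENCODER_DIM),
        # speaker_preds has this many frames along its time axis.
        "total_frames": total,
    }


# Values inferred from the Configuration-Specific Shapes tables.
CONFIGS = {
    "Default": StreamingConfig(chunk_frames=14, spkcache_len=188, fifo_len=40),
    "NVIDIA Low": StreamingConfig(chunk_frames=14, spkcache_len=188, fifo_len=188),
    "NVIDIA High": StreamingConfig(chunk_frames=381, spkcache_len=188, fifo_len=40),
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, expected_shapes(cfg))
```

Under these assumptions, the computed frame counts agree with the second dimension of every row in the configuration tables, which suggests the formulas and the listed values are consistent with each other.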