Commit 29ca384 by richiejp · verified · 1 Parent(s): 00b272a

Sync model card with upstream GitHub inference README

  1. README.md +46 -0
README.md CHANGED
@@ -54,6 +54,35 @@ factor (higher is faster than realtime).
54
  - `bf16` GGUFs are ~12 % smaller with identical quality and speed; pick `f32`
55
  unless download size matters.
56
 
57
  ## Files in this repository
58
 
59
  | File | Size | Model |
@@ -61,6 +90,8 @@ factor (higher is faster than realtime).
61
  | `localvqe-v1.4-aec-200K-f32.gguf` | 3 MB | v1.4-AEC (echo only) |
62
  | `localvqe-v1.4-aec-200K-bf16.gguf` | 2.6 MB | v1.4-AEC, conv weights in BF16 |
63
  | `localvqe-v1.4-aec-2.7K-f32.gguf` | 17 KB | v1.4-AEC front-end only (adaptive filter, no mask) |
64
  | `localvqe-v1.3-4.8M-f32.gguf` | 19 MB | v1.3 joint: GGUF the engine loads |
65
  | `localvqe-v1.3-4.8M.pt` | 55 MB | v1.3 joint: PyTorch checkpoint (research) |
66
  | `localvqe-v1.2-1.3M-f32.gguf` | 5 MB | v1.2 joint: GGUF |
@@ -176,6 +207,21 @@ button produces APA / BibTeX), and the upstream DeepVQE paper:
176
  }
177
  ```
178
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
179
  ## Dataset attribution
180
 
181
  Weights are trained on the
 
54
  - `bf16` GGUFs are ~12 % smaller with identical quality and speed; pick `f32`
55
  unless download size matters.
56
 
57
+ ### Compact line: GTCRN-AEC (for lower-power CPUs)
58
+
59
+ A second, much smaller line of models targets lower-power CPUs: a
60
+ ~49 K-parameter **GTCRN-AEC** network, a distinct architecture based on
61
+ [GTCRN](https://github.com/Xiaobin-Rong/gtcrn) (Rong et al., ICASSP 2024),
62
+ paired with the project's DSP echo-cancellation front-end. The GGUFs are
63
+ self-contained, so they run with the same single command as every other model.
64
+ Two variants share the architecture:
65
+
66
+ | Model | Does | Params |
67
+ |---|---|---:|
68
+ | **localvqe-pi-v1-49k** | AEC + NS + dereverb (full enhance) | 49 K |
69
+ | **localvqe-pi-aec-v1-49k** | echo only (keeps noise + room) | 49 K |
70
+
71
+ Whole-clip real-time factor on the real ggml graph, benchmarked on a Raspberry
72
+ Pi 5 (one example of a low-power target; `test_gtcrn --bench`, Cortex-A76,
73
+ Ubuntu 24.04), parity-verified to the PyTorch reference within ~1e-6 on-device.
74
+ RTF is identical for both variants:
75
+
76
+ | Threads | 8 s clip | RTF | RT factor |
77
+ |--:|--:|--:|--:|
78
+ | 1 | 388 ms | 0.048 | ~21× |
79
+ | 2 | 219 ms | 0.027 | ~37× |
80
+ | 4 | 163 ms | 0.020 | ~49× |
81
+
82
+ That is ~0.78 ms per 16 ms hop single-threaded. The models run on any CPU; for single-board
83
+ ARM, cross-compile for aarch64 with `ggml/docker/Dockerfile.arm64` (docker buildx
84
+ + qemu). `f16`/`q8` quantizations will appear here only if and when they are released.
85
+
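The figures in the benchmark table follow from simple arithmetic, which the short Python sketch below reproduces. It assumes RTF means processing time divided by audio duration and that "RT factor" is its reciprocal, which matches the published numbers; the function names `rtf` and `ms_per_hop` are illustrative only and are not part of the project.

```python
# Timings copied from the benchmark table (8 s clip, Raspberry Pi 5).
CLIP_SECONDS = 8.0
HOP_SECONDS = 0.016  # 16 ms hop, as stated in the text

# threads -> wall-clock time for the whole clip, in milliseconds
timings_ms = {1: 388, 2: 219, 4: 163}


def rtf(elapsed_ms, clip_seconds=CLIP_SECONDS):
    """Real-time factor: processing time divided by audio duration."""
    return (elapsed_ms / 1000.0) / clip_seconds


def ms_per_hop(elapsed_ms, clip_seconds=CLIP_SECONDS, hop_seconds=HOP_SECONDS):
    """Average processing time spent on each hop of audio."""
    hops = clip_seconds / hop_seconds  # about 500 hops in an 8 s clip
    return elapsed_ms / hops


for threads, ms in timings_ms.items():
    r = rtf(ms)
    print(f"{threads} thread(s): RTF {r:.3f}, "
          f"~{1 / r:.0f}x realtime, {ms_per_hop(ms):.2f} ms/hop")
```

A hop has a 16 ms budget, so anything well below that per hop leaves headroom for the rest of the audio pipeline.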
86
  ## Files in this repository
87
 
88
  | File | Size | Model |
 
90
  | `localvqe-v1.4-aec-200K-f32.gguf` | 3 MB | v1.4-AEC (echo only) |
91
  | `localvqe-v1.4-aec-200K-bf16.gguf` | 2.6 MB | v1.4-AEC, conv weights in BF16 |
92
  | `localvqe-v1.4-aec-2.7K-f32.gguf` | 17 KB | v1.4-AEC front-end only (adaptive filter, no mask) |
93
+ | `localvqe-pi-v1-49k-f32.gguf` | 2.3 MB | Compact line: GTCRN-AEC full enhance (echo + NS + dereverb) |
94
+ | `localvqe-pi-aec-v1-49k-f32.gguf` | 2.3 MB | Compact line: GTCRN-AEC echo-only (keeps noise + room) |
95
  | `localvqe-v1.3-4.8M-f32.gguf` | 19 MB | v1.3 joint: GGUF the engine loads |
96
  | `localvqe-v1.3-4.8M.pt` | 55 MB | v1.3 joint: PyTorch checkpoint (research) |
97
  | `localvqe-v1.2-1.3M-f32.gguf` | 5 MB | v1.2 joint: GGUF |
 
207
  }
208
  ```
209
 
210
+ The compact GTCRN-AEC line is based on **GTCRN**; please also cite:
211
+
212
+ ```bibtex
213
+ @inproceedings{rong2024gtcrn,
214
+ title = {GTCRN: A Speech Enhancement Model Requiring Ultralow
215
+ Computational Resources},
216
+ author = {Rong, Xiaobin and Sun, Tianchi and Zhang, Xu and Hu, Yuxiang
217
+ and Zhu, Changbao and Lu, Jing},
218
+ booktitle = {ICASSP 2024 - 2024 IEEE International Conference on Acoustics,
219
+ Speech and Signal Processing (ICASSP)},
220
+ pages = {971--975}, year = {2024},
221
+ doi = {10.1109/ICASSP48485.2024.10448310}
222
+ }
223
+ ```
224
+
225
  ## Dataset attribution
226
 
227
  Weights are trained on the