Sync model card with upstream GitHub inference README
README.md
CHANGED
|
@@ -54,6 +54,35 @@ factor (higher is faster than realtime).
|
|
| 54 |
- `bf16` GGUFs are ~12 % smaller with identical quality and speed; pick `f32`
|
| 55 |
unless download size matters.
|
| 56 |
|
| 57 |
## Files in this repository
|
| 58 |
|
| 59 |
| File | Size | Model |
|
|
@@ -61,6 +90,8 @@ factor (higher is faster than realtime).
|
|
| 61 |
| `localvqe-v1.4-aec-200K-f32.gguf` | 3 MB | v1.4-AEC (echo only) |
|
| 62 |
| `localvqe-v1.4-aec-200K-bf16.gguf` | 2.6 MB | v1.4-AEC, conv weights in BF16 |
|
| 63 |
| `localvqe-v1.4-aec-2.7K-f32.gguf` | 17 KB | v1.4-AEC front-end only (adaptive filter, no mask) |
|
| 64 |
| `localvqe-v1.3-4.8M-f32.gguf` | 19 MB | v1.3 joint, GGUF the engine loads |
|
| 65 |
| `localvqe-v1.3-4.8M.pt` | 55 MB | v1.3 joint, PyTorch checkpoint (research) |
|
| 66 |
| `localvqe-v1.2-1.3M-f32.gguf` | 5 MB | v1.2 joint, GGUF |
|
|
@@ -176,6 +207,21 @@ button produces APA / BibTeX), and the upstream DeepVQE paper:
|
|
| 176 |
}
|
| 177 |
```
|
| 178 |
|
| 179 |
## Dataset attribution
|
| 180 |
|
| 181 |
Weights are trained on the
|
|
|
|
| 54 |
- `bf16` GGUFs are ~12 % smaller with identical quality and speed; pick `f32`
|
| 55 |
unless download size matters.
|
| 56 |
|
| 57 |
+
### Compact line: GTCRN-AEC (for lower-power CPUs)
|
| 58 |
+
|
| 59 |
+
A separate, much smaller second line of models targets lower-power CPUs: a
|
| 60 |
+
~49 K-parameter **GTCRN-AEC** network, a distinct architecture based on
|
| 61 |
+
[GTCRN](https://github.com/Xiaobin-Rong/gtcrn) (Rong et al., ICASSP 2024),
|
| 62 |
+
paired with the project's DSP echo-cancellation front-end. The GGUFs are
|
| 63 |
+
self-contained, so they run with the same single command as every other model.
|
| 64 |
+
Two variants share the architecture:
|
| 65 |
+
|
| 66 |
+
| Model | Does | Params |
|
| 67 |
+
|---|---|---:|
|
| 68 |
+
| **localvqe-pi-v1-49k** | AEC + NS + dereverb (full enhance) | 49 K |
|
| 69 |
+
| **localvqe-pi-aec-v1-49k** | echo only; keeps noise + room | 49 K |
|
| 70 |
+
|
| 71 |
+
Whole-clip real-time factor on the real ggml graph, benchmarked on a Raspberry
|
| 72 |
+
Pi 5 (one example of a low-power target; `test_gtcrn --bench`, Cortex-A76,
|
| 73 |
+
Ubuntu 24.04), parity-verified to the PyTorch reference within ~1e-6 on-device.
|
| 74 |
+
RTF is identical for both variants:
|
| 75 |
+
|
| 76 |
+
| Threads | Time per 8 s clip | RTF | RT factor (1/RTF) |
|
| 77 |
+
|--:|--:|--:|--:|
|
| 78 |
+
| 1 | 388 ms | 0.048 | ~21× |
|
| 79 |
+
| 2 | 219 ms | 0.027 | ~37× |
|
| 80 |
+
| 4 | 163 ms | 0.020 | ~49× |
|
| 81 |
+
|
| 82 |
+
That is ~0.78 ms per 16 ms hop single-threaded. Runs on any CPU; for single-board
|
| 83 |
+
ARM, cross-compile for aarch64 with `ggml/docker/Dockerfile.arm64` (docker buildx
|
| 84 |
+
+ qemu). Only `f32` GGUFs are published for now; `f16`/`q8` quantizations will be listed if and when they are released.
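
The figures in the benchmark table can be re-derived from the raw clip timings. The sketch below is illustrative only: it assumes RTF means processing time divided by audio duration (which matches the published numbers), and the names `rtf`, `measured_ms`, `CLIP_SECONDS` and `HOP_MS` are made up for this example rather than taken from the project's code.

```python
# Timings copied from the table above: thread count -> wall-clock ms for one 8 s clip.
CLIP_SECONDS = 8.0   # length of the benchmark clip
HOP_MS = 16.0        # hop size quoted in the text
measured_ms = {1: 388.0, 2: 219.0, 4: 163.0}


def rtf(processing_ms: float, clip_seconds: float = CLIP_SECONDS) -> float:
    """Real-time factor, assumed here to be processing time / audio duration."""
    return processing_ms / (clip_seconds * 1000.0)


for threads, ms in measured_ms.items():
    r = rtf(ms)
    speedup = 1.0 / r          # how many times faster than real time
    per_hop_ms = r * HOP_MS    # average compute time spent on each hop
    print(f"{threads} thread(s): RTF {r:.4f}, "
          f"{speedup:.1f}x real time, {per_hop_ms:.3f} ms per {HOP_MS:g} ms hop")
```

Any RTF below 1.0 means the model keeps up with live audio; the per-hop figure is the more useful one when budgeting CPU time in a streaming pipeline.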
|
| 85 |
+
|
| 86 |
## Files in this repository
|
| 87 |
|
| 88 |
| File | Size | Model |
|
|
|
|
| 90 |
| `localvqe-v1.4-aec-200K-f32.gguf` | 3 MB | v1.4-AEC (echo only) |
|
| 91 |
| `localvqe-v1.4-aec-200K-bf16.gguf` | 2.6 MB | v1.4-AEC, conv weights in BF16 |
|
| 92 |
| `localvqe-v1.4-aec-2.7K-f32.gguf` | 17 KB | v1.4-AEC front-end only (adaptive filter, no mask) |
|
| 93 |
+
| `localvqe-pi-v1-49k-f32.gguf` | 2.3 MB | Compact line: GTCRN-AEC full enhance (echo + NS + dereverb) |
|
| 94 |
+
| `localvqe-pi-aec-v1-49k-f32.gguf` | 2.3 MB | Compact line: GTCRN-AEC echo-only (keeps noise + room) |
|
| 95 |
| `localvqe-v1.3-4.8M-f32.gguf` | 19 MB | v1.3 joint, GGUF the engine loads |
|
| 96 |
| `localvqe-v1.3-4.8M.pt` | 55 MB | v1.3 joint, PyTorch checkpoint (research) |
|
| 97 |
| `localvqe-v1.2-1.3M-f32.gguf` | 5 MB | v1.2 joint, GGUF |
|
|
|
|
| 207 |
}
|
| 208 |
```
|
| 209 |
|
| 210 |
+
The compact GTCRN-AEC line is based on **GTCRN**; please also cite:
|
| 211 |
+
|
| 212 |
+
```bibtex
|
| 213 |
+
@inproceedings{rong2024gtcrn,
|
| 214 |
+
title = {GTCRN: A Speech Enhancement Model Requiring Ultralow
|
| 215 |
+
Computational Resources},
|
| 216 |
+
author = {Rong, Xiaobin and Sun, Tianchi and Zhang, Xu and Hu, Yuxiang
|
| 217 |
+
and Zhu, Changbao and Lu, Jing},
|
| 218 |
+
booktitle = {ICASSP 2024 - 2024 IEEE International Conference on Acoustics,
|
| 219 |
+
Speech and Signal Processing (ICASSP)},
|
| 220 |
+
pages = {971--975}, year = {2024},
|
| 221 |
+
doi = {10.1109/ICASSP48485.2024.10448310}
|
| 222 |
+
}
|
| 223 |
+
```
|
| 224 |
+
|
| 225 |
## Dataset attribution
|
| 226 |
|
| 227 |
Weights are trained on the
|