Commit ·
ca66821
0
Parent(s):
Duplicate from LocalAI-io/LocalVQE
Co-authored-by: Richard Palethorpe <richiejp@users.noreply.huggingface.co>
- .gitattributes +41 -0
- README.md +295 -0
- localvqe-technical-report.pdf +3 -0
- localvqe-v1-1.3M-f32.gguf +3 -0
- localvqe-v1-1.3M.pt +3 -0
- localvqe-v1.1-1.3M-f32.gguf +3 -0
- localvqe-v1.1-1.3M.pt +3 -0
.gitattributes
ADDED
@@ -0,0 +1,41 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
localvqe-baseline.gguf filter=lfs diff=lfs merge=lfs -text
localvqe-finetune.gguf filter=lfs diff=lfs merge=lfs -text
localvqe-v1-f32.gguf filter=lfs diff=lfs merge=lfs -text
localvqe-v1-1.3M-f32.gguf filter=lfs diff=lfs merge=lfs -text
localvqe-technical-report.pdf filter=lfs diff=lfs merge=lfs -text
localvqe-v1.1-1.3M-f32.gguf filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,295 @@
---
library_name: pytorch
tags:
- audio-to-audio
- speech-enhancement
- acoustic-echo-cancellation
- noise-suppression
- ggml
license: apache-2.0
---

# LocalVQE

[Live demo](https://huggingface.co/spaces/LocalAI-io/LocalVQE-demo)
[GitHub repository](https://github.com/localai-org/LocalVQE)
[License: Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

**Local Voice Quality Enhancement**: a compact neural model for joint
acoustic echo cancellation (AEC), noise suppression, and dereverberation of
16 kHz speech, designed to run on commodity CPUs in real time.

- 1.3 M parameters (~5 MB F32)
- ~1.66 ms per 16 ms frame on Zen4 (24 threads), **≈9.6× realtime**
- Causal, streaming: 256-sample hop, 16 ms algorithmic latency
- F32 reference inference in C++ via [GGML](https://github.com/ggml-org/ggml);
  PyTorch reference included for verification and research

Try it live: <https://huggingface.co/spaces/LocalAI-io/LocalVQE-demo>.

This page is the Hugging Face model card; it hosts the published weights.
Source code, build system, tests, and training pipeline live in the GitHub
repository: <https://github.com/localai-org/LocalVQE>.

The current release is **v1.1**, which fixes intermittent crackling the
previous release produced under heavy background noise.

The technical report describing the architecture, streaming-state contract,
and streaming-causal normalisation operator is included in this repo as
[`localvqe-technical-report.pdf`](localvqe-technical-report.pdf). We would
like to publish it to arXiv (`eess.AS` / `cs.SD`) but need an endorsement
from an existing author in those categories. If you can endorse, please
reach out via the GitHub repo.

**Authors:**
- Richard Palethorpe ([richiejp](https://github.com/richiejp))
- Claude (Anthropic)

LocalVQE is a derivative of **DeepVQE** (Indenbom et al., Interspeech 2023,
*DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic Echo
Cancellation, Noise Suppression and Dereverberation*,
[arXiv:2306.03177](https://arxiv.org/abs/2306.03177)): smaller, GGML-native,
and tuned for streaming CPU inference. The architecture is documented in
the technical report linked above.

## A concrete example

Picture a video call from a laptop. Your microphone picks up three things
alongside your voice:

1. The remote participant's voice, played back through your speakers and
   caught again by your mic. This is the **echo**. Without cancellation,
   they hear themselves a fraction of a second later.
2. Your own voice bouncing off walls, desk, and monitor before reaching
   the mic. This is **reverberation**, the "tunnel" or "bathroom" sound
   that makes you feel far away from the listener.
3. A fan, keyboard clatter, a dog barking, or traffic outside: plain
   **background noise**.

LocalVQE removes all three in a single causal pass, frame by frame, on
the CPU, so only your voice reaches the far end.

## Why this, and not a classical AEC/NS stack?

Hand-tuned DSP pipelines (NLMS/AP/Kalman AEC, Wiener/spectral-subtraction
NS, MCRA noise tracking, RLS dereverb) can run in tens of microseconds per
frame and remain a strong baseline when the acoustic path is benign. LocalVQE
is interesting when you want:

- **Robustness to non-linear echo paths** (small loudspeakers, handheld
  devices, plastic laptop chassis) where linear AEC leaves residual echo.
- **Non-stationary noise suppression** (babble, keyboards, fans changing
  speed) that energy-based noise estimators struggle with.
- **One model, many conditions**: no per-device tuning of step sizes,
  forgetting factors, or VAD thresholds.
- **A single deterministic causal pass**: no double-talk detector, no
  adaptation state that can diverge.

The trade-off is CPU: a classical stack might cost ~0.1 ms/frame, LocalVQE
~1–2 ms/frame. On anything larger than a microcontroller that's still a
small fraction of a real-time budget.

## Why this, and not DeepVQE?

Microsoft never released DeepVQE: no weights, no reference
implementation, no streaming runtime. We re-implemented it from the
paper as a GGML graph at
[richiejp/deepvqe-ggml](https://github.com/richiejp/deepvqe-ggml)
(the full-width ~7.5 M-parameter version) before starting LocalVQE.
LocalVQE is the same idea pruned and rebuilt to ~1.3 M parameters
(~5 MB F32), small enough to run on commodity CPUs in real time.

## Files in this repository

| File | Size | Description |
|---|---|---|
| `localvqe-v1.1-1.3M.pt` | 11 MB | PyTorch checkpoint: DNS5 pre-training + ICASSP 2022/2023 AEC Challenge fine-tune. |
| `localvqe-v1.1-1.3M-f32.gguf` | 5 MB | GGML F32 export, the file the C++ inference engine loads. |

Only F32 GGUF is published today. A `quantize` tool is included in the
C++ build (see below); calibrated Q4_K / Q8_0 weights have not yet been
released.
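
To fetch the GGUF file listed in the table from Python, a minimal sketch using `hf_hub_download` from `huggingface_hub` might look like the following. The repository id `LocalAI-io/LocalVQE` is an assumption taken from the "Duplicate from" line of this commit, and `download_gguf` is a hypothetical helper name, not part of LocalVQE.

```python
# Assumption: the Hub repository id; adjust it if the weights live elsewhere.
REPO_ID = "LocalAI-io/LocalVQE"
GGUF_FILE = "localvqe-v1.1-1.3M-f32.gguf"


def download_gguf(repo_id=REPO_ID, filename=GGUF_FILE):
    """Download one file from the Hub and return its local cache path."""
    # Imported inside the function so this module still loads on machines
    # where huggingface_hub is not installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename)
```

Calling `download_gguf()` needs network access and `pip install huggingface_hub`; the returned path can be passed directly to the C++ tools.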

## Validation Results

Full 800-clip eval on the
[ICASSP 2022 AEC Challenge blind test set](https://github.com/microsoft/AEC-Challenge)
(real recordings, not synthetic mixes).

| Scenario                          |   n | AECMOS echo ↑ | AECMOS deg ↑ | blind ERLE ↑ | DNSMOS OVRL ↑ |
|-----------------------------------|----:|--------------:|-------------:|-------------:|--------------:|
| doubletalk                        | 115 |          4.70 |         2.35 |       8.4 dB |          2.85 |
| doubletalk-with-movement          | 185 |          4.63 |         2.35 |       8.3 dB |          2.80 |
| farend-singletalk                 | 107 |          2.98 |         4.91 |      44.7 dB |          1.93 |
| farend-singletalk-with-movement   | 193 |          3.40 |         4.95 |      45.0 dB |          1.91 |
| nearend-singletalk                | 200 |          4.99 |         4.05 |       2.5 dB |          3.13 |

- **AECMOS** (Purin et al., ICASSP 2022) is Microsoft's non-intrusive AEC
  quality predictor. "Echo" rates how well echo was removed; "degradation"
  rates how clean the resulting speech is. 1–5 MOS scale, higher is better.
- **Blind ERLE** is `10·log10(E[mic²] / E[enh²])`. Only meaningful on
  far-end single-talk where the input is echo-only; on scenes with active
  near-end speech it understates echo removal because both numerator and
  denominator are dominated by speech.
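
The blind ERLE formula above can be sketched in a few lines of plain Python. This is an illustration of the formula only, not the project's evaluation script; the function name `blind_erle_db` and the small `eps` guard against division by zero are assumptions.

```python
import math


def blind_erle_db(mic, enh, eps=1e-12):
    """Blind ERLE in dB: 10*log10(E[mic^2] / E[enh^2])."""
    if len(mic) == 0 or len(enh) == 0:
        raise ValueError("signals must be non-empty")
    mic_power = sum(x * x for x in mic) / len(mic)
    enh_power = sum(x * x for x in enh) / len(enh)
    # eps keeps the ratio finite when the enhanced signal is digital silence.
    return 10.0 * math.log10((mic_power + eps) / (enh_power + eps))


# Toy far-end single-talk case: the "enhanced" signal is the mic signal
# scaled by 0.01 in amplitude, which corresponds to 40 dB of attenuation.
mic = [math.sin(0.05 * n) for n in range(16000)]
enh = [0.01 * x for x in mic]
erle = blind_erle_db(mic, enh)
```

Because the measure only compares input and output energy, it rewards any attenuation; this is why it is reported alongside AECMOS rather than on its own.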

## Building the C++ Inference Engine

Source, build system, and tests live at
<https://github.com/localai-org/LocalVQE>. Requires CMake ≥ 3.20 and a C++17
compiler. A [Nix](https://nixos.org/) flake is provided:

```bash
git clone --recursive https://github.com/localai-org/LocalVQE.git
cd LocalVQE

# With Nix:
nix develop
cmake -S ggml -B ggml/build -DCMAKE_BUILD_TYPE=Release
cmake --build ggml/build -j$(nproc)

# Without Nix: install cmake, gcc/clang, pkg-config, libsndfile, then:
cmake -S ggml -B ggml/build -DCMAKE_BUILD_TYPE=Release
cmake --build ggml/build -j$(nproc)
```

Binaries land in `ggml/build/bin/`. The CPU build produces multiple
`libggml-cpu-*.so` variants (SSE4.2 / AVX2 / AVX-512) selected at runtime.
Keep the binaries and `.so` files together.

### Vulkan backend (embedded / integrated-GPU targets)

Add `-DLOCALVQE_VULKAN=ON` to the configure step. This composes with the
CPU build: an additional `libggml-vulkan.so` is produced in
`ggml/build/bin/` and the runtime loader picks it up when a Vulkan ICD is
present, otherwise it falls back to the CPU variants.

```bash
cmake -S ggml -B ggml/build -DCMAKE_BUILD_TYPE=Release -DLOCALVQE_VULKAN=ON
cmake --build ggml/build -j$(nproc)
```

The Nix flake's dev shell already includes `vulkan-loader`,
`vulkan-headers`, and `shaderc`. Without Nix, install the equivalents
from your distro (Debian: `libvulkan-dev vulkan-headers
glslc`/`shaderc`).

### Streaming latency (per-hop, 16 kHz / 256-sample hop → 16 ms budget)

Measured with `bench` on Zen4 desktop (Ryzen 9 7900). Each hop is a
full `ggml_backend_graph_compute`.

| Backend                     | Threads |     p50 |     p99 |     max |
|-----------------------------|--------:|--------:|--------:|--------:|
| CPU                         |       1 | 3.40 ms | 3.57 ms | 5.06 ms |
| CPU                         |       2 | 2.07 ms | 2.25 ms | 3.65 ms |
| CPU                         |       4 | 1.32 ms | 1.57 ms | 6.91 ms |
| Vulkan, AMD iGPU (RADV)     |     n/a | 4.43 ms | 4.62 ms | 5.07 ms |
| Vulkan, NVIDIA RTX 5070 Ti  |     n/a | 1.79 ms | 3.41 ms | 4.14 ms |

Vulkan p50 and p99 are tight, but worst-case single-hop latency on a
shared desktop is sensitive to external GPU clients (display
compositor, browser). On a dedicated embedded device with no
compositor contending for the queue, expect the quieter end of the
range.
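
To relate these latencies to the real-time budget, the arithmetic can be sketched as below. The p50 values are copied from the table above; the helper name `realtime_factor` is illustrative, not part of the `bench` tool.

```python
SAMPLE_RATE_HZ = 16_000
HOP_SAMPLES = 256

# One hop of audio lasts 256 / 16000 s = 16 ms; that is the per-hop budget.
hop_budget_ms = 1000.0 * HOP_SAMPLES / SAMPLE_RATE_HZ

# p50 latencies in milliseconds, copied from the table above.
p50_ms = {
    "CPU, 1 thread": 3.40,
    "CPU, 2 threads": 2.07,
    "CPU, 4 threads": 1.32,
    "Vulkan, AMD iGPU (RADV)": 4.43,
    "Vulkan, NVIDIA RTX 5070 Ti": 1.79,
}


def realtime_factor(latency_ms, budget_ms=hop_budget_ms):
    """How many times faster than real time one hop is processed."""
    return budget_ms / latency_ms


for name, ms in p50_ms.items():
    load_pct = 100.0 * ms / hop_budget_ms
    print(f"{name}: {realtime_factor(ms):.1f}x realtime, "
          f"{load_pct:.0f}% of the hop budget")
```

The same formula gives the headline figure: 16 ms / 1.66 ms is roughly 9.6× realtime.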

## Running Inference

Download `localvqe-v1.1-1.3M-f32.gguf` from this repository (the file list above)
via `huggingface-cli`, the Hub web UI, or `hf_hub_download` from
`huggingface_hub`. Then run it with one of the tools below.

### CLI

```bash
./ggml/build/bin/localvqe localvqe-v1.1-1.3M-f32.gguf \
  --in-wav mic.wav ref.wav \
  --out-wav enhanced.wav
```

The CLI expects 16 kHz mono PCM for both the mic signal and the far-end reference.
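
Before running the CLI, it can help to confirm that both inputs really are 16 kHz mono. The following is a minimal sketch using Python's standard `wave` module; `check_wav` and `write_silence` are hypothetical helpers, not part of LocalVQE, and the silent test files exist only to exercise the check.

```python
import os
import tempfile
import wave


def check_wav(path, rate_hz=16_000, channels=1):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    # wave.open raises wave.Error if the file is not a WAV it can parse.
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != rate_hz:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {rate_hz}")
        if wav.getnchannels() != channels:
            problems.append(
                f"{wav.getnchannels()} channels, expected {channels}")
    return problems


def write_silence(path, rate_hz, channels, seconds=0.1):
    """Write a short silent 16-bit PCM WAV, used here only to test check_wav."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * channels * int(rate_hz * seconds))


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "mic.wav")
    bad = os.path.join(tmp, "music.wav")
    write_silence(good, 16_000, 1)
    write_silence(bad, 44_100, 2)
    good_problems = check_wav(good)  # matches the expected format
    bad_problems = check_wav(bad)    # wrong rate and wrong channel count
```

Files that fail the check would need resampling or downmixing with an audio tool of your choice before being passed to `--in-wav`.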

### Benchmark

```bash
./ggml/build/bin/bench localvqe-v1.1-1.3M-f32.gguf \
  --in-wav mic.wav ref.wav --iters 10 --profile
```

### Shared Library (C API)

```bash
cmake -S ggml -B ggml/build -DLOCALVQE_BUILD_SHARED=ON
cmake --build ggml/build -j$(nproc)
```

Produces `liblocalvqe.so` with the API in `ggml/localvqe_api.h`. See
`ggml/example_purego_test.go` in the GitHub repo for a Go / `purego`
integration.

### Quantizing (experimental)

Calibrated Q4_K / Q8_0 weights are not yet published. The `quantize`
tool in the C++ build can produce GGUF variants from the F32 reference
for experimentation:

```bash
./ggml/build/bin/quantize localvqe-v1.1-1.3M-f32.gguf localvqe-v1.1-1.3M-q8.gguf Q8_0
```

Expect end-to-end quality loss until proper per-tensor selection and
calibration have been worked through.

## PyTorch Reference

`localvqe-v1.1-1.3M.pt` is the PyTorch checkpoint used to produce the GGUF export.
It is provided for verification, ablation, and downstream research, not
for end-user inference, which should go through the GGML build above. The
model definition lives under `pytorch/` in the
[GitHub repo](https://github.com/localai-org/LocalVQE):

```bash
git clone https://github.com/localai-org/LocalVQE.git
cd LocalVQE/pytorch
pip install -r requirements.txt
```

## Citing LocalVQE

If you use LocalVQE in academic work, please cite the repository via the
`CITATION.cff` at <https://github.com/localai-org/LocalVQE>. GitHub renders
a "Cite this repository" button that produces APA and BibTeX entries
automatically.

For a DOI, we recommend citing a specific release via
[Zenodo](https://zenodo.org), which mints a DOI per GitHub release. Please
also cite the upstream DeepVQE paper:

```bibtex
@inproceedings{indenbom2023deepvqe,
  title     = {DeepVQE: Real Time Deep Voice Quality Enhancement for Joint
               Acoustic Echo Cancellation, Noise Suppression and Dereverberation},
  author    = {Indenbom, Evgenii and Ristea, Nicolae-C{\u{a}}t{\u{a}}lin
               and Saabas, Ando and P{\"a}rnamaa, Tanel
               and Gu{\v{z}}vin, Jegor and Cutler, Ross},
  booktitle = {Interspeech},
  year      = {2023},
  doi       = {10.21437/Interspeech.2023-2176}
}
```

## Dataset Attribution

Published weights are trained on data from the
[ICASSP 2023 Deep Noise Suppression Challenge](https://github.com/microsoft/DNS-Challenge)
(Microsoft, CC BY 4.0) and fine-tuned on the
[ICASSP 2022/2023 Acoustic Echo Cancellation Challenge](https://github.com/microsoft/AEC-Challenge).

## Safety Note

Training data was filtered by DNSMOS perceived-quality scores, which can
misclassify distressed speech (screaming, crying) as noise. LocalVQE may
attenuate or distort such signals and must not be relied upon for emergency
call or safety-critical applications.

## License

Apache License 2.0.
localvqe-technical-report.pdf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:351d69f2f10bc775f77f5a034c196fe94c57634e27403412bec7d417ccdcb468
size 365911
localvqe-v1-1.3M-f32.gguf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d5eaf577449d0f920d8ee5e1042b8ddc7b6627313a042c62e2ada1b42719ab30
size 5162720
localvqe-v1-1.3M.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:499d7cadfe939c2f7849ce2926c791de97c10f084fbfd8243794d199a0d54f8a
size 11656320
localvqe-v1.1-1.3M-f32.gguf
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c118227c6b433d6aa36d9e4b993e0f31aa60787ea38d301d04db917a4a2b0a84
size 5173088
localvqe-v1.1-1.3M.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:76aabaa3bca3a9d7989463226312aa2344f978403c3e0e007e58a15922c97707
size 11453482
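
Each Git LFS pointer above records the SHA-256 digest and byte size of the real file, so a download can be checked against it. The sketch below assumes the three-line pointer layout shown in this commit; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the pointer text is the one for `localvqe-v1.1-1.3M-f32.gguf`.

```python
import hashlib
import os
import tempfile

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c118227c6b433d6aa36d9e4b993e0f31aa60787ea38d301d04db917a4a2b0a84
size 5173088
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines and unpack the 'algo:digest' oid field."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer):
    """True if the file at path has the size and digest the pointer records."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large checkpoints do not fill memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)

# Self-check on a tiny made-up file, since the real weights are not at hand.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as f:
        f.write(b"hello")
    sample_pointer = {
        "algo": "sha256",
        "digest": hashlib.sha256(b"hello").hexdigest(),
        "size": 5,
    }
    sample_ok = matches_pointer(sample, sample_pointer)
    sample_rejected = not matches_pointer(sample, pointer)
```

With the real file downloaded, `matches_pointer("localvqe-v1.1-1.3M-f32.gguf", pointer)` would perform the same check against the published digest.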