Replace FP16 with INT8 k-means palettization (2.2 MB, -48% size, PESQ/STOI identical)
DeepFilterNet3.mlmodelc/analytics/coremldata.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:6dc4207ced0fda3bd22928770b2a775b36cdf80938eb27074807121824a446cf
 size 243
DeepFilterNet3.mlmodelc/coremldata.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:b1b2798fa27abb01f31871914ba956c2c098235d4af14c41336278fab70492ca
 size 415
DeepFilterNet3.mlmodelc/model.mil
CHANGED
The diff for this file is too large to render.
DeepFilterNet3.mlmodelc/weights/weight.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:6b6adf3f78972bd1caed73e9e9e4eabd65590dcc9a86e41ae00e22b7c63a0018
+size 2181056
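The `weight.bin` change above is where the palettized weights land. As a rough illustration of what 8-bit k-means palettization means, the sketch below clusters a flat list of weights into a small lookup table and stores one 8-bit index per weight. It is a toy 1-D Lloyd's k-means under stated assumptions: the names `Palette`, `palettize`, and `reconstruct` are hypothetical, and this is not the conversion code used for this commit, which is not shown in the diff.

```swift
/// Result of palettizing weights: a small lookup table plus one 8-bit index per weight.
/// (Hypothetical type, for illustration only.)
struct Palette {
    var lut: [Float]        // centroid values, at most 256 entries for 8-bit indices
    var indices: [UInt8]    // position of the nearest centroid for each weight
}

/// Index of the lookup-table entry closest to `value`.
func nearest(_ value: Float, in lut: [Float]) -> Int {
    var best = 0
    for i in 1..<lut.count where abs(lut[i] - value) < abs(lut[best] - value) {
        best = i
    }
    return best
}

/// Toy 1-D k-means (Lloyd's algorithm) over a flat weight array.
func palettize(_ weights: [Float], entries: Int = 256, iterations: Int = 10) -> Palette {
    precondition(!weights.isEmpty && (1...256).contains(entries))
    let lo = weights.min()!, hi = weights.max()!
    // Start with centroids spread evenly across the weight range.
    var lut = (0..<entries).map { i -> Float in
        entries == 1 ? (lo + hi) / 2 : lo + (hi - lo) * Float(i) / Float(entries - 1)
    }
    var assignment = [Int](repeating: 0, count: weights.count)
    for _ in 0..<iterations {
        // Assignment step: attach each weight to its nearest centroid.
        for (j, w) in weights.enumerated() { assignment[j] = nearest(w, in: lut) }
        // Update step: move each non-empty centroid to the mean of its members.
        var sums = [Float](repeating: 0, count: entries)
        var counts = [Int](repeating: 0, count: entries)
        for (j, w) in weights.enumerated() {
            sums[assignment[j]] += w
            counts[assignment[j]] += 1
        }
        for c in 0..<entries where counts[c] > 0 {
            lut[c] = sums[c] / Float(counts[c])
        }
    }
    // Final assignment so the stored indices match the last centroid update.
    let indices = weights.map { UInt8(nearest($0, in: lut)) }
    return Palette(lut: lut, indices: indices)
}

/// Rebuilds approximate weights by looking each index up in the table.
func reconstruct(_ palette: Palette) -> [Float] {
    palette.indices.map { palette.lut[Int($0)] }
}

// Six weights in three obvious clusters, compressed to a 3-entry table.
let weights: [Float] = [-0.50, -0.48, 0.01, 0.02, 0.49, 0.51]
let palette = palettize(weights, entries: 3)
let restored = reconstruct(palette)
print(palette.lut, palette.indices, restored)
```

With 256 entries each weight costs one byte plus a share of the table, which is roughly half of a 16-bit float and is in line with the size drop this commit reports.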
README.md
CHANGED
@@ -1,32 +1,67 @@
 ---
 license: apache-2.0
 tags:
-
-
-
-
-
-
+- speech-enhancement
+- denoising
+- coreml
+- apple-silicon
+- deepfilternet
+- int8
+- palettization
+base_model: Rikorose/DeepFilterNet3
+library_name: coreml
+pipeline_tag: audio-to-audio
 ---
 
-# DeepFilterNet3 —
+# DeepFilterNet3 — CoreML INT8
 
-Real-time speech enhancement
+Real-time speech enhancement for Apple Silicon. Removes background noise
+from speech audio. Runs on **Neural Engine** via CoreML.
 
-- **2.1M params**,
--
--
+- **2.1M params**, INT8 k-means palettization, **2.2 MB**
+- 48 kHz native, 10 ms frames
+- Requires macOS 14+ / iOS 17+
+
+## Quality
+
+Measured on 30 VoiceBank-DEMAND test clips via Python `CoreMLBackend`
+(replaces only the NN forward; keeps the PyTorch STFT / ERB / deep-filter
+post-processing intact).
+
+| Variant | PESQ | STOI | SI-SDR | Size |
+|---------|------|------|--------|------|
+| PyTorch FP32 (reference) | 2.900 | 0.947 | 18.19 | — |
+| CoreML FP16 | 2.901 | 0.947 | 18.19 | 4.2 MB |
+| **CoreML INT8 (this repo)** | **2.907** | **0.947** | **18.11** | **2.2 MB** |
+
+INT8 matches FP16 within run-to-run noise (ΔPESQ +0.006, ΔSI-SDR
+−0.07 dB, STOI identical) while cutting size by 48%.
 
 ## Latency (M2 Max)
 
 | Duration | Time | RTF |
 |----------|------|-----|
-|
-|
-|
+| 5 s | 0.65 s | 0.13 |
+| 10 s | 1.2 s | 0.12 |
+| 20 s | 4.8 s | 0.24 |
+
+## Files
+
+| File | Size | Description |
+|------|------|-------------|
+| `DeepFilterNet3.mlmodelc` | 2.2 MB | Pre-compiled CoreML model (runs on Neural Engine) |
+| `auxiliary.npz` | 126 KB | ERB filterbank, Vorbis window, normalization states |
 
 ## Usage
 
+Add [speech-swift](https://github.com/soniqo/speech-swift) to `Package.swift`:
+
+```swift
+.package(url: "https://github.com/soniqo/speech-swift", branch: "main")
+```
+
+Then denoise:
+
 ```swift
 import SpeechEnhancement
 
@@ -34,24 +69,28 @@ let enhancer = try await SpeechEnhancer.fromPretrained()
 let clean = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)
 ```
 
+CLI:
+
 ```bash
 swift run audio denoise noisy.wav --output clean.wav
 ```
 
-##
+## Source
 
--
-- `auxiliary.npz` — ERB filterbank, Vorbis window, normalization states
+- Base model: [Rikorose/DeepFilterNet3](https://github.com/Rikorose/DeepFilterNet) (Apache-2.0)
 
-##
+## License
 
--
--
+- Model weights: Apache-2.0 / MIT dual license
+- CoreML conversion: Apache-2.0
 
-
+## Links
 
----
+- [speech-swift](https://github.com/soniqo/speech-swift) — Apple SDK
+- [soniqo.audio](https://soniqo.audio) — website
+- [MLX vs CoreML on Apple Silicon — a practical guide](https://blog.ivan.digital/mlx-vs-coreml-on-apple-silicon-a-practical-guide-to-picking-the-right-backend-and-why-you-should-f77ddea7b27a) — related blog post
+- [soniqo.audio/blog](https://soniqo.audio/blog) — blog
+
+## Reference
 
--
-- **Docs**: [soniqo.audio](https://soniqo.audio)
-- **GitHub**: [soniqo/speech-swift](https://github.com/soniqo/speech-swift)
+- [DeepFilterNet3 paper](https://arxiv.org/abs/2305.08227)
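The derived figures in the new README (the RTF column, the quoted deltas, and the 48% size cut) can be re-derived from the table values. The sketch below does that arithmetic, assuming RTF means processing time divided by audio duration, which the three latency rows are consistent with. The function and variable names are illustrative, and the inputs are the rounded numbers as published, not raw measurements.

```swift
import Foundation

/// Real-time factor: processing time divided by audio duration
/// (a value below 1 means faster than real time).
func realTimeFactor(processing: Double, audio: Double) -> Double {
    processing / audio
}

/// Relative reduction from `original` to `reduced`, as a fraction of `original`.
func relativeReduction(from original: Double, to reduced: Double) -> Double {
    (original - reduced) / original
}

// Latency rows copied from the README table (audio seconds, processing seconds).
let latencyRows: [(audio: Double, processing: Double)] = [(5, 0.65), (10, 1.2), (20, 4.8)]
let rtfs = latencyRows.map { realTimeFactor(processing: $0.processing, audio: $0.audio) }

// Quality rows copied from the README table (rounded, as published).
let fp16 = (pesq: 2.901, siSDR: 18.19, sizeMB: 4.2)
let int8 = (pesq: 2.907, siSDR: 18.11, sizeMB: 2.2)
let deltaPESQ = int8.pesq - fp16.pesq
let deltaSISDR = int8.siSDR - fp16.siSDR
let sizeCut = relativeReduction(from: fp16.sizeMB, to: int8.sizeMB)

for (row, rtf) in zip(latencyRows, rtfs) {
    print(String(format: "%.0f s audio: RTF %.2f", row.audio, rtf))
}
print(String(format: "ΔPESQ %+.3f, ΔSI-SDR %+.2f dB, size cut %.0f%%",
             deltaPESQ, deltaSISDR, sizeCut * 100))
```

From the rounded table entries the SI-SDR difference comes out at about −0.08 dB, while the README text quotes −0.07 dB. The quoted figure was presumably computed from unrounded scores, and either value is small next to the 18 dB baseline.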