Commit 1af4db9 by aufklarer · verified · 1 parent: fd536f3

Replace FP16 with INT8 k-means palettization (2.2 MB, -48% size, PESQ/STOI identical)
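
For readers unfamiliar with the term in the commit message: k-means palettization replaces each weight with an index into a small lookup table of cluster centroids, so an n-bit palette stores at most 2^n distinct values per tensor. The following is a minimal pure-Python sketch of that idea, not the conversion script used for this commit; the function names `palettize` and `dequantize` are hypothetical, and the demo uses a 2-bit palette on eight made-up weights to stay readable (an 8-bit palette would use 256 centroids).

```python
def palettize(weights, n_bits, iters=20):
    """1-D k-means (Lloyd's algorithm): returns (lookup_table, indices)."""
    k = 2 ** n_bits
    lo, hi = min(weights), max(weights)
    # Start with centroids spread evenly across the weight range.
    lut = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]

    def nearest(w):
        # Index of the centroid closest to w.
        return min(range(k), key=lambda c: abs(w - lut[c]))

    for _ in range(iters):
        indices = [nearest(w) for w in weights]
        for c in range(k):
            members = [w for w, i in zip(weights, indices) if i == c]
            if members:  # leave empty clusters where they are
                lut[c] = sum(members) / len(members)
    indices = [nearest(w) for w in weights]
    return lut, indices


def dequantize(lut, indices):
    """Rebuild approximate weights from the palette and the stored indices."""
    return [lut[i] for i in indices]


# Illustrative weights only; not taken from the model.
weights = [-1.0, -0.98, 0.0, 0.02, 1.0, 1.02, 2.0, 2.02]
lut, indices = palettize(weights, n_bits=2)
restored = dequantize(lut, indices)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(indices, max_error)
```

Only the indices and the lookup table need to be stored, which is where the size saving over FP16 comes from.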

DeepFilterNet3.mlmodelc/analytics/coremldata.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:095ae05f10e03fec7c89a18291c67c8ec1bae5293c78c4ab1f77892807d61328
+ oid sha256:6dc4207ced0fda3bd22928770b2a775b36cdf80938eb27074807121824a446cf
  size 243
DeepFilterNet3.mlmodelc/coremldata.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6588b0dccdde284302ac7234bc8e065b2c5050f6c4e24a90fd1ad93e38a84dbb
+ oid sha256:b1b2798fa27abb01f31871914ba956c2c098235d4af14c41336278fab70492ca
  size 415
DeepFilterNet3.mlmodelc/model.mil CHANGED
The diff for this file is too large to render. See raw diff
 
DeepFilterNet3.mlmodelc/weights/weight.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:89bba737ffa6659db26c663d0dd828a5043cf575ab6ce5379c93304e1e19174e
- size 4275264
+ oid sha256:6b6adf3f78972bd1caed73e9e9e4eabd65590dcc9a86e41ae00e22b7c63a0018
+ size 2181056
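
The two `size` fields in the LFS pointers for `weight.bin` are exact byte counts, so the size claim in the commit message can be checked directly. A minimal sketch in Python; the 2.1M parameter count is the approximate figure quoted in the README, and the bytes-per-parameter estimate assumes `weight.bin` holds little besides the weights themselves:

```python
OLD_BYTES = 4_275_264  # weight.bin before this commit (FP16)
NEW_BYTES = 2_181_056  # weight.bin after this commit (INT8 palettized)
PARAMS = 2.1e6         # approximate parameter count quoted in the README

# Fraction of the original file size that was saved.
reduction = 1 - NEW_BYTES / OLD_BYTES

# Rough storage cost per parameter before and after (assumption: file is mostly weights).
old_bytes_per_param = OLD_BYTES / PARAMS
new_bytes_per_param = NEW_BYTES / PARAMS

print(f"size reduction: {reduction:.1%}")
print(f"bytes/param: {old_bytes_per_param:.2f} -> {new_bytes_per_param:.2f}")
```

For `weight.bin` alone the reduction comes to about 49%, while the README's rounded 4.2 MB and 2.2 MB figures give about 48%. Going from roughly two bytes per parameter to roughly one is what a move from 16-bit weights to 8-bit palette indices would be expected to produce.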
README.md CHANGED
@@ -1,32 +1,67 @@
  ---
  license: apache-2.0
  tags:
- - speech-enhancement
- - denoising
- - coreml
- - apple-silicon
- - deepfilternet
- library_name: speech-swift
+ - speech-enhancement
+ - denoising
+ - coreml
+ - apple-silicon
+ - deepfilternet
+ - int8
+ - palettization
+ base_model: Rikorose/DeepFilterNet3
+ library_name: coreml
+ pipeline_tag: audio-to-audio
  ---

- # DeepFilterNet3 — Core ML (FP16)
+ # DeepFilterNet3 — CoreML INT8

- Real-time speech enhancement model for Apple Silicon. Removes background noise from speech audio.
+ Real-time speech enhancement for Apple Silicon. Removes background noise
+ from speech audio. Runs on **Neural Engine** via CoreML.

- - **2.1M params**, FP16, ~4.2 MB
- - Runs on **Neural Engine** via Core ML
- - 48kHz native, 10ms frames
+ - **2.1M params**, INT8 k-means palettization, **2.2 MB**
+ - 48 kHz native, 10 ms frames
+ - Requires macOS 14+ / iOS 17+
+
+ ## Quality
+
+ Measured on 30 VoiceBank-DEMAND test clips via Python `CoreMLBackend`
+ (replaces only the NN forward; keeps the PyTorch STFT / ERB / deep-filter
+ post-processing intact).
+
+ | Variant | PESQ | STOI | SI-SDR | Size |
+ |---------|------|------|--------|------|
+ | PyTorch FP32 (reference) | 2.900 | 0.947 | 18.19 | — |
+ | CoreML FP16 | 2.901 | 0.947 | 18.19 | 4.2 MB |
+ | **CoreML INT8 (this repo)** | **2.907** | **0.947** | **18.11** | **2.2 MB** |
+
+ INT8 matches FP16 within run-to-run noise (ΔPESQ +0.006, ΔSI-SDR
+ −0.07 dB, STOI identical) while cutting size by 48%.

  ## Latency (M2 Max)

  | Duration | Time | RTF |
  |----------|------|-----|
- | 5s | 0.65s | 0.13 |
- | 10s | 1.2s | 0.12 |
- | 20s | 4.8s | 0.24 |
+ | 5 s | 0.65 s | 0.13 |
+ | 10 s | 1.2 s | 0.12 |
+ | 20 s | 4.8 s | 0.24 |
+
+ ## Files
+
+ | File | Size | Description |
+ |------|------|-------------|
+ | `DeepFilterNet3.mlmodelc` | 2.2 MB | Pre-compiled CoreML model (runs on Neural Engine) |
+ | `auxiliary.npz` | 126 KB | ERB filterbank, Vorbis window, normalization states |

  ## Usage

+ Add [speech-swift](https://github.com/soniqo/speech-swift) to `Package.swift`:
+
+ ```swift
+ .package(url: "https://github.com/soniqo/speech-swift", branch: "main")
+ ```
+
+ Then denoise:
+
  ```swift
  import SpeechEnhancement

@@ -34,24 +69,28 @@ let enhancer = try await SpeechEnhancer.fromPretrained()
  let clean = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)
  ```

+ CLI:
+
  ```bash
  swift run audio denoise noisy.wav --output clean.wav
  ```

- ## Files
+ ## Source

- - `DeepFilterNet3.mlpackage` — Core ML FP16 model (Neural Engine)
- - `auxiliary.npz` — ERB filterbank, Vorbis window, normalization states
+ - Base model: [Rikorose/DeepFilterNet3](https://github.com/Rikorose/DeepFilterNet) (Apache-2.0)

- ## Reference
+ ## License

- - [DeepFilterNet3](https://arxiv.org/abs/2305.08227)
- - Part of [speech-swift](https://github.com/soniqo/speech-swift)
+ - Model weights: Apache-2.0 / MIT dual license
+ - CoreML conversion: Apache-2.0

- ---
+ ## Links

- ---
+ - [speech-swift](https://github.com/soniqo/speech-swift) — Apple SDK
+ - [soniqo.audio](https://soniqo.audio) — website
+ - [MLX vs CoreML on Apple Silicon — a practical guide](https://blog.ivan.digital/mlx-vs-coreml-on-apple-silicon-a-practical-guide-to-picking-the-right-backend-and-why-you-should-f77ddea7b27a) — related blog post
+ - [soniqo.audio/blog](https://soniqo.audio/blog) — blog
+
+ ## Reference

- - **Guide**: [soniqo.audio/guides/denoise](https://soniqo.audio/guides/denoise)
- - **Docs**: [soniqo.audio](https://soniqo.audio)
- - **GitHub**: [soniqo/speech-swift](https://github.com/soniqo/speech-swift)
+ - [DeepFilterNet3 paper](https://arxiv.org/abs/2305.08227)
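
The README's latency table reports a real-time factor (RTF) without defining it. Under the usual definition, RTF is processing time divided by audio duration, so values below 1 mean faster than real time. A small Python check, with the rows copied from the table and the helper name `rtf` chosen for illustration, shows the reported figures are consistent with that definition:

```python
# (audio duration in s, processing time in s, RTF as reported in the table)
rows = [(5.0, 0.65, 0.13), (10.0, 1.2, 0.12), (20.0, 4.8, 0.24)]


def rtf(processing_s, audio_s):
    """Real-time factor: seconds of compute per second of audio."""
    return processing_s / audio_s


for audio_s, processing_s, reported in rows:
    value = rtf(processing_s, audio_s)
    print(f"{audio_s:>4.0f} s audio: RTF = {value:.2f} (table: {reported})")
```

All three rows stay well below 1, though the 20 s clip costs about twice as much per second of audio as the shorter clips.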