Replace FP16 with INT8 k-means palettization (2.2 MB, -48% size, PESQ/STOI identical)


Files changed (5)

DeepFilterNet3.mlmodelc/analytics/coremldata.bin +1 -1
DeepFilterNet3.mlmodelc/coremldata.bin +1 -1
DeepFilterNet3.mlmodelc/model.mil +0 -0
DeepFilterNet3.mlmodelc/weights/weight.bin +2 -2
README.md +64 -25

DeepFilterNet3.mlmodelc/analytics/coremldata.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:095ae05f10e03fec7c89a18291c67c8ec1bae5293c78c4ab1f77892807d61328
+oid sha256:6dc4207ced0fda3bd22928770b2a775b36cdf80938eb27074807121824a446cf
 size 243

DeepFilterNet3.mlmodelc/coremldata.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6588b0dccdde284302ac7234bc8e065b2c5050f6c4e24a90fd1ad93e38a84dbb
+oid sha256:b1b2798fa27abb01f31871914ba956c2c098235d4af14c41336278fab70492ca
 size 415

DeepFilterNet3.mlmodelc/model.mil CHANGED

The diff for this file is too large to render.

DeepFilterNet3.mlmodelc/weights/weight.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:89bba737ffa6659db26c663d0dd828a5043cf575ab6ce5379c93304e1e19174e
-size 4275264
+oid sha256:6b6adf3f78972bd1caed73e9e9e4eabd65590dcc9a86e41ae00e22b7c63a0018
+size 2181056
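
The two `size` fields in the LFS pointer give the exact byte counts behind the headline size claim. As a quick check (a minimal sketch; the helper name `size_reduction` is made up for illustration), the weight file alone shrinks by about 49%, while the rounded 4.2 MB and 2.2 MB figures quoted in the README give the stated 48%:

```python
# Byte counts copied from the Git LFS pointer diff for weights/weight.bin.
FP16_BYTES = 4275264
INT8_BYTES = 2181056


def size_reduction(old_size: float, new_size: float) -> float:
    """Return the fractional size reduction going from old_size to new_size."""
    return 1.0 - new_size / old_size


# Reduction computed from the exact byte counts of the weight file.
reduction = size_reduction(FP16_BYTES, INT8_BYTES)
# Reduction computed from the rounded megabyte figures quoted in the README.
rounded_reduction = size_reduction(4.2, 2.2)

print(f"weight.bin: {reduction:.1%} smaller")
print(f"rounded README figures: {rounded_reduction:.1%} smaller")
```

The small gap between the two numbers comes only from rounding the sizes to one decimal place before dividing.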

README.md CHANGED

@@ -1,32 +1,67 @@
 ---
 license: apache-2.0
 tags:
-  - speech-enhancement
-  - denoising
-  - coreml
-  - apple-silicon
-  - deepfilternet
-library_name: speech-swift
 ---
-# DeepFilterNet3 — Core ML (FP16)
-Real-time speech enhancement model for Apple Silicon. Removes background noise from speech audio.
-- **2.1M params**, FP16, ~4.2 MB
-- Runs on **Neural Engine** via Core ML
-- 48kHz native, 10ms frames
 ## Latency (M2 Max)
 | Duration | Time | RTF |
 |----------|------|-----|
-| 5s | 0.65s | 0.13 |
-| 10s | 1.2s | 0.12 |
-| 20s | 4.8s | 0.24 |
 ## Usage
 ```swift
 import SpeechEnhancement
@@ -34,24 +69,28 @@ let enhancer = try await SpeechEnhancer.fromPretrained()
 let clean = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)
 ```
 ```bash
 swift run audio denoise noisy.wav --output clean.wav
 ```
-## Files
-- `DeepFilterNet3.mlpackage` — Core ML FP16 model (Neural Engine)
-- `auxiliary.npz` — ERB filterbank, Vorbis window, normalization states
-## Reference
-- [DeepFilterNet3](https://arxiv.org/abs/2305.08227)
-- Part of [speech-swift](https://github.com/soniqo/speech-swift)
----
----
-- **Guide**: [soniqo.audio/guides/denoise](https://soniqo.audio/guides/denoise)
-- **Docs**: [soniqo.audio](https://soniqo.audio)
-- **GitHub**: [soniqo/speech-swift](https://github.com/soniqo/speech-swift)

 ---
 license: apache-2.0
 tags:
+- speech-enhancement
+- denoising
+- coreml
+- apple-silicon
+- deepfilternet
+- int8
+- palettization
+base_model: Rikorose/DeepFilterNet3
+library_name: coreml
+pipeline_tag: audio-to-audio
 ---
+# DeepFilterNet3 — CoreML INT8
+Real-time speech enhancement for Apple Silicon. Removes background noise
+from speech audio. Runs on **Neural Engine** via CoreML.
+- **2.1M params**, INT8 k-means palettization, **2.2 MB**
+- 48 kHz native, 10 ms frames
+- Requires macOS 14+ / iOS 17+
+## Quality
+Measured on 30 VoiceBank-DEMAND test clips via Python `CoreMLBackend`
+(replaces only the NN forward; keeps the PyTorch STFT / ERB / deep-filter
+post-processing intact).
+| Variant | PESQ | STOI | SI-SDR | Size |
+|---------|------|------|--------|------|
+| PyTorch FP32 (reference) | 2.900 | 0.947 | 18.19 | — |
+| CoreML FP16 | 2.901 | 0.947 | 18.19 | 4.2 MB |
+| **CoreML INT8 (this repo)** | **2.907** | **0.947** | **18.11** | **2.2 MB** |
+INT8 matches FP16 within run-to-run noise (ΔPESQ +0.006, ΔSI-SDR
+−0.07 dB, STOI identical) while cutting size by 48%.
 ## Latency (M2 Max)
 | Duration | Time | RTF |
 |----------|------|-----|
+| 5 s | 0.65 s | 0.13 |
+| 10 s | 1.2 s | 0.12 |
+| 20 s | 4.8 s | 0.24 |
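
The RTF column is consistent with the usual definition of real-time factor: processing time divided by audio duration, where values below 1 mean faster than real time. A minimal sketch that recomputes it from the table rows (the function name `real_time_factor` is illustrative, not part of speech-swift):

```python
# (audio duration in seconds, processing time in seconds), copied from the table.
MEASUREMENTS = [(5.0, 0.65), (10.0, 1.2), (20.0, 4.8)]


def real_time_factor(duration_s: float, processing_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_s / duration_s


for duration_s, processing_s in MEASUREMENTS:
    rtf = real_time_factor(duration_s, processing_s)
    print(f"{duration_s:>4.0f} s audio: {processing_s:.2f} s processing, RTF {rtf:.2f}")
```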
+## Files
+| File | Size | Description |
+|------|------|-------------|
+| `DeepFilterNet3.mlmodelc` | 2.2 MB | Pre-compiled CoreML model (runs on Neural Engine) |
+| `auxiliary.npz` | 126 KB | ERB filterbank, Vorbis window, normalization states |
 ## Usage
+Add [speech-swift](https://github.com/soniqo/speech-swift) to `Package.swift`:
+```swift
+.package(url: "https://github.com/soniqo/speech-swift", branch: "main")
+```
+Then denoise:
 ```swift
 import SpeechEnhancement
@@ -34,24 +69,28 @@ let enhancer = try await SpeechEnhancer.fromPretrained()
 let clean = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)
 ```
+CLI:
 ```bash
 swift run audio denoise noisy.wav --output clean.wav
 ```
+## Source
+- Base model: [Rikorose/DeepFilterNet3](https://github.com/Rikorose/DeepFilterNet) (Apache-2.0)
+## License
+- Model weights: Apache-2.0 / MIT dual license
+- CoreML conversion: Apache-2.0
+## Links
+- [speech-swift](https://github.com/soniqo/speech-swift) — Apple SDK
+- [soniqo.audio](https://soniqo.audio) — website
+- [MLX vs CoreML on Apple Silicon — a practical guide](https://blog.ivan.digital/mlx-vs-coreml-on-apple-silicon-a-practical-guide-to-picking-the-right-backend-and-why-you-should-f77ddea7b27a) — related blog post
+- [soniqo.audio/blog](https://soniqo.audio/blog) — blog
+## Reference
+- [DeepFilterNet3 paper](https://arxiv.org/abs/2305.08227)
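
For readers unfamiliar with the "k-means palettization" named in the commit title, the idea can be sketched as follows: cluster a tensor's weights into at most 256 centroids, then store one 8-bit index per weight plus the shared lookup table. This is a toy, standard-library illustration of the general technique, not the conversion script behind this commit. The names `kmeans_palettize` and `nearest_index` are hypothetical, and synthetic Gaussian values stand in for real model weights.

```python
import bisect
import random


def nearest_index(sorted_palette, w):
    """Index of the palette entry closest to w (palette must be sorted ascending)."""
    pos = bisect.bisect_left(sorted_palette, w)
    if pos == 0:
        return 0
    if pos == len(sorted_palette):
        return pos - 1
    # Pick whichever neighbour is closer to w.
    return pos if sorted_palette[pos] - w < w - sorted_palette[pos - 1] else pos - 1


def kmeans_palettize(weights, n_bits=8, iters=20, seed=0):
    """1-D Lloyd's k-means: returns (palette, indices) with
    weights[i] approximated by palette[indices[i]]."""
    rng = random.Random(seed)
    unique = sorted(set(weights))
    k = min(2 ** n_bits, len(unique))
    # Initial centroids: k distinct weight values, kept sorted for bisect lookups.
    palette = sorted(rng.sample(unique, k))
    for _ in range(iters):
        # Assignment step: nearest centroid for every weight.
        indices = [nearest_index(palette, w) for w in weights]
        # Update step: move each centroid to the mean of its members.
        sums, counts = [0.0] * k, [0] * k
        for w, j in zip(weights, indices):
            sums[j] += w
            counts[j] += 1
        palette = sorted(
            sums[j] / counts[j] if counts[j] else palette[j] for j in range(k)
        )
    # Final assignment against the converged palette.
    indices = [nearest_index(palette, w) for w in weights]
    return palette, indices


# Synthetic stand-in for one weight tensor (assumption: not real model data).
demo_rng = random.Random(0)
weights = [demo_rng.gauss(0.0, 0.05) for _ in range(4096)]

palette, indices = kmeans_palettize(weights, n_bits=8)
restored = [palette[i] for i in indices]
mean_err = sum(abs(a - b) for a, b in zip(weights, restored)) / len(weights)
print(f"palette entries: {len(palette)}, mean abs error: {mean_err:.6f}")
```

With 8-bit indices, each weight costs one byte plus a share of the 256-entry table, compared with two bytes per weight in FP16. That is consistent with `weight.bin` shrinking to roughly half its previous size.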