aufklarer committed on
Commit 16cb41c · verified
1 Parent(s): 5cb9ebc

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +88 -0
README.md ADDED
@@ -0,0 +1,88 @@
+ ---
+ license: mit
+ tags:
+ - speech-enhancement
+ - noise-reduction
+ - coreml
+ - apple-neural-engine
+ - deepfilternet
+ language:
+ - en
+ - multilingual
+ library_name: qwen3-asr-swift
+ pipeline_tag: audio-to-audio
+ ---
+
+ # DeepFilterNet3 - Core ML
+
+ Speech enhancement (noise removal) model converted to Core ML for Apple Neural Engine inference.
+
+ Based on [DeepFilterNet3](https://github.com/Rikorose/DeepFilterNet) (Interspeech 2023).
+
+ ## Model Details
+
+ | Property | Value |
+ |---|---|
+ | Parameters | 2.1M |
+ | Model size | 4.2 MB |
+ | Sample rate | 48 kHz |
+ | Latency | ~40 ms (20 ms frame + lookahead) |
+ | PESQ (DNS4) | 3.17 |
+ | Compute target | Apple Neural Engine |
+ | Framework | Core ML (mlprogram) |
+ | Min deployment | macOS 14+ / iOS 17+ |
+
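To make the latency row concrete, the sketch below converts the table's durations into sample counts at the stated 48 kHz rate. The 20 ms frame and ~40 ms total come from the table; the helper name `samples(forMilliseconds:sampleRate:)` is hypothetical and exists only for this illustration.

```swift
/// Hypothetical helper: number of samples covered by a duration at a given sample rate.
func samples(forMilliseconds ms: Int, sampleRate: Int) -> Int {
    return sampleRate * ms / 1000
}

let sampleRate = 48_000  // "Sample rate" row of the table
let frameSamples = samples(forMilliseconds: 20, sampleRate: sampleRate)
let latencySamples = samples(forMilliseconds: 40, sampleRate: sampleRate)

print(frameSamples)    // 960 samples per 20 ms frame
print(latencySamples)  // 1920 samples for the ~40 ms total latency
```

In other words, each 20 ms frame covers 960 samples, and the quoted end-to-end latency corresponds to roughly two frames of audio.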
+ ## Architecture
+
+ Signal processing (STFT, ERB filterbank, deep filtering) runs on the CPU via Accelerate/vDSP.
+ Neural network inference runs on the Neural Engine via Core ML.
+
+ - Encoder: 4x SepConv2d + SqueezedGRU (256-dim, 3 layers)
+ - ERB Decoder: SqueezedGRU + skip convs + sigmoid mask (32 bands)
+ - DF Decoder: SqueezedGRU + deep filter coefficients (96 bins x 5 taps)
+
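The deep filtering step named above can be pictured as a short complex FIR filter applied per frequency bin across neighbouring spectrum frames. The following is a minimal sketch under stated assumptions, not the code used by qwen3-asr-swift: the `Complex` type, the `deepFilter` function, and the way frames are ordered along the tap axis are all hypothetical, and only the 96 bins x 5 taps shape is taken from the list above. A production version would presumably use Accelerate/vDSP rather than plain loops.

```swift
/// Minimal complex number type; hypothetical, for illustration only.
struct Complex: Equatable {
    var re: Float
    var im: Float

    static func * (a: Complex, b: Complex) -> Complex {
        Complex(re: a.re * b.re - a.im * b.im,
                im: a.re * b.im + a.im * b.re)
    }

    static func + (a: Complex, b: Complex) -> Complex {
        Complex(re: a.re + b.re, im: a.im + b.im)
    }
}

/// Applies deep filtering to produce one output spectrum frame.
/// - frames: `taps` spectrum frames, each holding `bins` complex values
///   (which frames are included, and in what order, is an assumption).
/// - coefficients: `taps` x `bins` complex coefficients, as a DF decoder might emit.
/// Each output bin is the sum over taps of coefficient * spectrum value.
func deepFilter(frames: [[Complex]], coefficients: [[Complex]]) -> [Complex] {
    precondition(frames.count == coefficients.count, "tap count mismatch")
    let bins = frames.first?.count ?? 0
    var output = [Complex](repeating: Complex(re: 0, im: 0), count: bins)
    for tap in 0..<frames.count {
        precondition(frames[tap].count == bins && coefficients[tap].count == bins,
                     "bin count mismatch")
        for bin in 0..<bins {
            output[bin] = output[bin] + coefficients[tap][bin] * frames[tap][bin]
        }
    }
    return output
}
```

With 5 taps and 96 bins, each output frame costs 480 complex multiply-adds, which is one reason this part fits comfortably on the CPU.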
+ ## Usage with qwen3-asr-swift
+
+ ```swift
+ import SpeechEnhancement
+
+ let enhancer = try await SpeechEnhancer.fromPretrained()
+ let cleanAudio = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)
+ ```
+
+ CLI:
+
+ ```bash
+ audio denoise input.wav --output clean.wav
+ ```
+
+ ## Performance
+
+ | Metric | Value |
+ |---|---|
+ | RTF (M2 Max) | 0.34 (about 3x faster than real time) |
+ | 20 s audio | ~7 s processing |
+
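The two rows are related by the usual definition of real-time factor: processing time divided by audio duration. As a quick check using only the table's numbers (the function name `processingTime` is hypothetical):

```swift
/// Real-time factor (RTF) is seconds of processing per second of audio; lower is faster.
/// Hypothetical helper that turns an RTF into an expected processing time.
func processingTime(audioSeconds: Double, rtf: Double) -> Double {
    return audioSeconds * rtf
}

let rtf = 0.34  // M2 Max figure from the table
let seconds = processingTime(audioSeconds: 20, rtf: rtf)
let speedup = 1.0 / rtf

print(seconds)  // roughly 6.8 s, matching the "~7 s" row
print(speedup)  // roughly 2.9, i.e. about 3x faster than real time
```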
67
+
68
+ - `DeepFilterNet3.mlpackage/` - Core ML model (Neural Engine)
69
+ - `auxiliary.npz` - Signal processing data (ERB filterbank, Vorbis window, normalization states)
70
+
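For readers who want to load the `.mlpackage` directly rather than through `SpeechEnhancer`, a minimal sketch using the public Core ML API might look like the following. It is an illustration under stated assumptions, not how qwen3-asr-swift necessarily does it: the function names and the `packageURL` parameter are hypothetical, and the package is assumed to have been downloaded to a local path already. `computeUnits` is a preference, so Core ML still decides which operations actually run on the Neural Engine.

```swift
import CoreML
import Foundation

/// Builds a configuration that asks Core ML to use the CPU and Neural Engine.
/// (Hypothetical helper name; `.cpuAndNeuralEngine` needs macOS 13+ / iOS 16+.)
func makeNeuralEngineConfiguration() -> MLModelConfiguration {
    let configuration = MLModelConfiguration()
    configuration.computeUnits = .cpuAndNeuralEngine
    return configuration
}

/// Compiles a local .mlpackage and loads the compiled model.
/// `packageURL` is assumed to point at a local copy of DeepFilterNet3.mlpackage.
func loadModel(at packageURL: URL) async throws -> MLModel {
    // An .mlpackage must be compiled to .mlmodelc before it can be loaded.
    let compiledURL = try await MLModel.compileModel(at: packageURL)
    return try MLModel(contentsOf: compiledURL,
                       configuration: makeNeuralEngineConfiguration())
}
```

Compiling once and caching the resulting `.mlmodelc` directory avoids repeating the compile step on every launch.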
+ ## Conversion
+
+ Converted from the PyTorch checkpoint using `scripts/convert_deepfilternet3.py` in [qwen3-asr-swift](https://github.com/AufKlworworworworworkwork/qwen3-asr-swift).
+
+ ## License
+
+ MIT, following the original DeepFilterNet3 license.
+
+ ## Citation
+
+ ```bibtex
+ @inproceedings{schroeter2023deepfilternet3,
+ title={DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement},
+ author={Schroeter, Hendrik and Maier, Andreas and Escalante-B, Alberto N and Rosenkranz, Tobias},
+ booktitle={Interspeech},
+ year={2023}
+ }
+ ```