aufklarer committed on
Commit a9ec0b6 · verified · 1 Parent(s): 16cb41c

Update model card with FP16 precision and latency report

  1. README.md +22 -61
README.md CHANGED
@@ -1,88 +1,49 @@
  ---
- license: mit
  tags:
  - speech-enhancement
- - noise-reduction
  - coreml
- - apple-neural-engine
  - deepfilternet
- language:
- - en
- - multilingual
  library_name: qwen3-asr-swift
- pipeline_tag: audio-to-audio
  ---

- # DeepFilterNet3 - Core ML

- Speech enhancement (noise removal) model converted to Core ML for Apple Neural Engine inference.

- Based on [DeepFilterNet3](https://github.com/Rikorose/DeepFilterNet) (Interspeech 2023).

- ## Model Details

- | Property | Value |
- |---|---|
- | Parameters | 2.1M |
- | Model size | 4.2 MB |
- | Sample rate | 48 kHz |
- | Latency | ~40ms (20ms frame + lookahead) |
- | PESQ (DNS4) | 3.17 |
- | Compute target | Apple Neural Engine |
- | Framework | Core ML (mlprogram) |
- | Min deployment | macOS 14+ / iOS 17+ |

- ## Architecture
-
- Signal processing (STFT, ERB filterbank, deep filtering) runs on CPU via Accelerate/vDSP.
- Neural network inference runs on the Neural Engine via Core ML.
-
- - Encoder: 4x SepConv2d + SqueezedGRU (256-dim, 3 layers)
- - ERB Decoder: SqueezedGRU + skip convs + sigmoid mask (32 bands)
- - DF Decoder: SqueezedGRU + deep filter coefficients (96 bins x 5 taps)
-
- ## Usage with qwen3-asr-swift

  ```swift
  import SpeechEnhancement

  let enhancer = try await SpeechEnhancer.fromPretrained()
- let cleanAudio = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)
  ```

- CLI:
-
  ```bash
- audio denoise input.wav --output clean.wav
  ```

- ## Performance
-
- | Metric | Value |
- |---|---|
- | RTF (M2 Max) | 0.34 (3x real-time) |
- | 20s audio | ~7s processing |
-
  ## Files

- - `DeepFilterNet3.mlpackage/` - Core ML model (Neural Engine)
- - `auxiliary.npz` - Signal processing data (ERB filterbank, Vorbis window, normalization states)

- ## Conversion

- Converted from PyTorch checkpoint using `scripts/convert_deepfilternet3.py` in [qwen3-asr-swift](https://github.com/AufKlworworworworworkwork/qwen3-asr-swift).
-
- ## License
-
- MIT (following DeepFilterNet3 original license)
-
- ## Citation
-
- ```bibtex
- @inproceedings{schroeter2023deepfilternet3,
- title={DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement},
- author={Schroeter, Hendrik and Maier, Andreas and Escalante-B, Alberto N and Rosenkranz, Tobias},
- booktitle={Interspeech},
- year={2023}
- }
- ```

  ---
+ license: apache-2.0
  tags:
  - speech-enhancement
+ - denoising
  - coreml
+ - apple-silicon
  - deepfilternet
  library_name: qwen3-asr-swift
  ---

+ # DeepFilterNet3 Core ML (FP16)

+ Real-time speech enhancement model for Apple Silicon. Removes background noise from speech audio.

+ - **2.1M params**, FP16, ~4.2 MB
+ - Runs on **Neural Engine** via Core ML
+ - 48kHz native, 10ms frames

+ ## Latency (M2 Max)

+ | Duration | Time | RTF |
+ |----------|------|-----|
+ | 5s | 0.65s | 0.13 |
+ | 10s | 1.2s | 0.12 |
+ | 20s | 4.8s | 0.24 |

+ ## Usage

  ```swift
  import SpeechEnhancement

  let enhancer = try await SpeechEnhancer.fromPretrained()
+ let clean = try enhancer.enhance(audio: noisyAudio, sampleRate: 48000)
  ```

  ```bash
+ swift run audio denoise noisy.wav --output clean.wav
  ```

  ## Files

+ - `DeepFilterNet3.mlpackage` Core ML FP16 model (Neural Engine)
+ - `auxiliary.npz` ERB filterbank, Vorbis window, normalization states

+ ## Reference

+ - [DeepFilterNet3](https://arxiv.org/abs/2305.08227) (Interspeech 2023)
+ - Part of [qwen3-asr-swift](https://github.com/ivan-digital/qwen3-asr-swift)
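
For readers checking the new latency table: the RTF column appears to be processing time divided by audio duration, so values below 1.0 mean faster than real time. The sketch below recomputes the three rows under that assumption. `realTimeFactor` and `measurements` are hypothetical names used for illustration only and are not part of the `SpeechEnhancement` API.

```swift
import Foundation

/// Real-time factor: processing time divided by audio duration.
/// Hypothetical helper for illustration; not part of SpeechEnhancement.
func realTimeFactor(processingSeconds: Double, audioSeconds: Double) -> Double {
    precondition(audioSeconds > 0, "audio duration must be positive")
    return processingSeconds / audioSeconds
}

// (audio duration, processing time) pairs copied from the M2 Max latency table.
let measurements: [(audio: Double, processing: Double)] = [
    (5, 0.65),
    (10, 1.2),
    (20, 4.8),
]

for m in measurements {
    let rtf = realTimeFactor(processingSeconds: m.processing, audioSeconds: m.audio)
    // Print the RTF and its reciprocal, the speed-up over real time.
    print(String(format: "%.0fs audio: RTF %.2f (%.1fx real time)", m.audio, rtf, 1 / rtf))
}
```

Under this definition the 20s row (RTF 0.24) is roughly twice as slow per second of audio as the 5s and 10s rows; the commit does not say why.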