danielr-ceva committed
Commit 3985ab8 · verified · 1 Parent(s): 912784b

Update README.md

Files changed (1)
  1. README.md +61 -82
README.md CHANGED
@@ -1,3 +1,4 @@
  ---
  license: apache-2.0
  pipeline_tag: audio-to-audio
@@ -5,125 +6,103 @@ tags:
  - speech_enhancement
  - noise_suppression
  - real_time
  - fullband
  ---


- # DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN
-
- DPDFNet is a family of causal, single-channel speech enhancement models for real-time noise suppression in challenging everyday environments. It extends the DeepFilterNet2 enhancement framework by inserting Dual-Path RNN (DPRNN) blocks into the encoder, strengthening long-range temporal and cross-band modeling while preserving a compact, streaming-friendly design.
-
- This repository provides TensorFlow Lite (TFLite) models optimized for mobile and edge deployment:
-
- **16kHz models**
- * `baseline.tflite`
- * `dpdfnet2.tflite`
- * `dpdfnet4.tflite`
- * `dpdfnet8.tflite`

- **48kHz model**
- * `dpdfnet2_48khz_hr.tflite`

  ---

- ## Key Features

- * Causal and low-latency: Designed for streaming use cases such as telephony, conferencing, and embedded devices.
- * Dual-Path RNN integration: Improves temporal context and frequency-domain interactions for more robust enhancement in difficult noise conditions.
- * Scalable family: Choose baseline or dpdfnet2/4/8 to balance quality vs. compute.
- * Edge deployment focus: Demonstrated on Ceva NeuPro Nano NPUs in the accompanying work.
- * Fullband option: A dedicated 48kHz model is provided for fullband enhancement.

  ---

- ## Model Variants and Footprint
-
- ### 16kHz models
-
- | Model | Params [M] | MACs [G] | TFLite Size [MB] |
- | --------- | :--------: | :------: | :--------------: |
- | Baseline | 2.31 | 0.36 | 8.5 |
- | DPDFNet-2 | 2.49 | 1.35 | 10.7 |
- | DPDFNet-4 | 2.84 | 2.36 | 12.9 |
- | DPDFNet-8 | 3.54 | 4.37 | 17.2 |
-
- ### 48kHz model
-
- | Model | Params [M] | MACs [G] | TFLite Size [MB] |
- | ------------ | :--------: | :------: | :--------------: |
- | DPDFNet-2 HR | 2.58 | 2.42 | 11.6 |
-
- ---
-
- ## Intended Use
-
- Primary task: Real-time, single-channel speech enhancement (noise suppression).
-
- Deployment targets: Mobile devices, embedded NPUs, and edge platforms.

- Input and Output:

- * **16kHz models**
-   * Input: 16kHz mono noisy speech waveform
-   * Output: 16kHz mono enhanced speech waveform
- * **48kHz model**
-   * Input: 48kHz mono noisy speech waveform
-   * Output: 48kHz mono enhanced speech waveform

- Typical applications:

- * Voice calls and VoIP
- * Video conferencing
- * Always-on voice interfaces
- * Wearables, earbuds, and embedded audio devices

  ---

- ## Inference
-
- This repo includes an inference script for running the TFLite models on WAV files using streaming-style, frame-by-frame inference: `run_tflite.py`.
-
- > **Note:** When using `dpdfnet2_48khz_hr`, the inference script automatically switches to the 48kHz processing pipeline.
-
- ### Setup
-
- Install dependencies:

  ```bash
- pip install numpy soundfile librosa tqdm
- pip install tflite-runtime
  ```

- ### Model placement
-
- By default, the script loads models from:

- * `./<model_name>.tflite`

- Create the folder and place the `.tflite` files there (or edit `TFLITE_DIR` in the script to match your layout).

- ### Run enhancement on a folder of WAVs

- The script processes `*.wav` files non-recursively and writes enhanced outputs as 16-bit PCM WAVs:

- ```bash
- python run_tflite.py --noisy_dir /path/to/noisy_wavs --enhanced_dir /path/to/out --model_name dpdfnet8
- ```

- Available `--model_name` options: `baseline`, `dpdfnet2`, `dpdfnet4`, `dpdfnet8`, `dpdfnet2_48khz_hr`.

- ---

- ## Training Data

- The models were trained using a mixture of public speech and noise datasets, including DNS4 (downsampled), MLS, MUSAN, and FSD50K.

  ---

  ## Citation

- If you use these models, please cite:
-
  ```bibtex
  @article{rika2025dpdfnet,
  title = {DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN},
 
@@ -1,3 +1,4 @@
+
  ---
  license: apache-2.0
  pipeline_tag: audio-to-audio
@@ -5,125 +6,103 @@ tags:
  - speech_enhancement
  - noise_suppression
  - real_time
+ - streaming
+ - causal
+ - onnx
+ - tflite
  - fullband
  ---

+ # DPDFNet

+ DPDFNet is a family of **causal, single‑channel** speech enhancement models for **real‑time noise suppression**.\
+ It builds on **DeepFilterNet2** by adding **Dual‑Path RNN (DPRNN)** blocks in the encoder for stronger long‑range modeling while staying streaming‑friendly.

+ **Links**
+ - Project page (audio samples + architecture): https://ceva-ip.github.io/DPDFNet/
+ - Paper (arXiv): https://arxiv.org/abs/2512.16420
+ - Code (GitHub): https://github.com/ceva-ip/DPDFNet
+ - Demo Space: https://huggingface.co/spaces/Ceva-IP/DPDFNetDemo
+ - Evaluation set: https://huggingface.co/datasets/Ceva-IP/DPDFNet_EvalSet

  ---

+ ## What’s in this repo

+ - **TFLite**: `*.tflite` (root)
+ - **ONNX**: `onnx/*.onnx`
+ - **PyTorch checkpoints**: `checkpoints/*.pth`

  ---

+ ## Model variants

+ ### 16 kHz models

+ | Model | DPRNN blocks | Params (M) | MACs (G) |
+ |---|:---:|:---:|:---:|
+ | `baseline` | 0 | 2.31 | 0.36 |
+ | `dpdfnet2` | 2 | 2.49 | 1.35 |
+ | `dpdfnet4` | 4 | 2.84 | 2.36 |
+ | `dpdfnet8` | 8 | 3.54 | 4.37 |

+ ### 48 kHz fullband model

+ | Model | DPRNN blocks | Params (M) | MACs (G) |
+ |---|:---:|:---:|:---:|
+ | `dpdfnet2_48khz_hr` | 2 | 2.58 | 2.42 |
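
As a rough illustration of how the variant tables above might be used, the sketch below picks the heaviest variant that still fits a compute budget at a given sample rate. The `VARIANTS` dictionary only copies the table values. `pick_model` and its `max_gmacs` parameter are hypothetical helpers, not part of the `dpdfnet` package, and treating a higher MAC count as a proxy for better quality is an assumption the tables alone do not confirm.

```python
from typing import Optional

# Values copied from the tables above:
# name -> (sample_rate_hz, dprnn_blocks, params_m, macs_g)
VARIANTS = {
    "baseline": (16000, 0, 2.31, 0.36),
    "dpdfnet2": (16000, 2, 2.49, 1.35),
    "dpdfnet4": (16000, 4, 2.84, 2.36),
    "dpdfnet8": (16000, 8, 3.54, 4.37),
    "dpdfnet2_48khz_hr": (48000, 2, 2.58, 2.42),
}


def pick_model(sample_rate: int, max_gmacs: float) -> Optional[str]:
    """Return the variant with the most MACs that still fits the budget.

    Hypothetical helper: assumes more MACs means better enhancement quality.
    Returns None when no variant matches the sample rate and budget.
    """
    candidates = [
        (macs, name)
        for name, (rate, _blocks, _params, macs) in VARIANTS.items()
        if rate == sample_rate and macs <= max_gmacs
    ]
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(pick_model(16000, 2.5))  # dpdfnet4
    print(pick_model(48000, 1.0))  # None
```

The returned name matches the `--model` / `model=` identifiers used elsewhere in this README, so it could be passed straight through to the CLI or Python API.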
 
  ---

+ ## Recommended inference (CPU-only, ONNX)

  ```bash
+ pip install dpdfnet
  ```

+ ### CLI

+ ```bash
+ # Enhance one file
+ dpdfnet enhance noisy.wav enhanced.wav --model dpdfnet4

+ # Enhance a directory
+ dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2

+ # Download models
+ dpdfnet download
+ dpdfnet download dpdfnet8
+ dpdfnet download dpdfnet4 --force
+ ```
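
For readers who want the same directory pass from Python, here is a minimal sketch of what a loop like `dpdfnet enhance-dir` could look like. It is not the CLI's actual implementation. It assumes non-recursive `*.wav` matching and same-named output files, and `plan_outputs`, `enhance_dir`, and the injected `enhance_one` callable are hypothetical names standing in for whatever performs the real enhancement.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def plan_outputs(noisy_dir: Path, enhanced_dir: Path) -> List[Tuple[Path, Path]]:
    """Pair each top-level *.wav in noisy_dir with a same-named output path."""
    return [(src, enhanced_dir / src.name) for src in sorted(noisy_dir.glob("*.wav"))]


def enhance_dir(
    noisy_dir: Path,
    enhanced_dir: Path,
    enhance_one: Callable[[Path, Path], None],
) -> int:
    """Call enhance_one(src, dst) for every planned pair; return the file count.

    enhance_one is a stand-in (assumption) for the real enhancement step.
    """
    enhanced_dir.mkdir(parents=True, exist_ok=True)
    pairs = plan_outputs(noisy_dir, enhanced_dir)
    for src, dst in pairs:
        enhance_one(src, dst)
    return len(pairs)
```

Passing the enhancement step in as a callable keeps the file handling testable without any model installed.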

+ ### Python API

+ ```python
+ import soundfile as sf
+ import dpdfnet

+ # In-memory enhancement:
+ audio, sr = sf.read("noisy.wav")
+ enhanced = dpdfnet.enhance(audio, sample_rate=sr, model="dpdfnet4")
+ sf.write("enhanced.wav", enhanced, sr)

+ # Enhance one file:
+ out_path = dpdfnet.enhance_file("noisy.wav", model="dpdfnet2")
+ print(out_path)

+ # Model listing:
+ for row in dpdfnet.available_models():
+     print(row["name"], row["ready"], row["cached"])

+ # Download models:
+ dpdfnet.download()  # All models
+ dpdfnet.download("dpdfnet4")  # Specific model
+ ```
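
The model listing could also drive a selective download. The sketch below assumes each row is a mapping with a `name` string and a truthy or falsy `cached` flag, as the loop above suggests. The exact meaning of `ready` and `cached` is not documented here, so this reading is an assumption. `missing_models` is a hypothetical helper, and the rows are hand-written stand-ins so the snippet runs without the package installed.

```python
from typing import Iterable, List, Mapping


def missing_models(rows: Iterable[Mapping[str, object]]) -> List[str]:
    """Return the names of rows whose 'cached' flag is falsy or absent."""
    return [str(row["name"]) for row in rows if not row.get("cached")]


if __name__ == "__main__":
    # Stand-in rows shaped like the ones printed above (assumed structure).
    rows = [
        {"name": "dpdfnet2", "ready": True, "cached": True},
        {"name": "dpdfnet4", "ready": False, "cached": False},
    ]
    print(missing_models(rows))  # ['dpdfnet4']
```

With the real package, each returned name could then be passed to `dpdfnet.download(name)`, following the download call shown above.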

  ---

  ## Citation

  ```bibtex
  @article{rika2025dpdfnet,
  title = {DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN},