---

license: mit
library_name: mlx
tags:
  - mlx
  - audio
  - speech-enhancement
  - noise-suppression
  - deepfilternet
  - apple-silicon
base_model: DeepFilterNet/DeepFilterNet3
pipeline_tag: audio-to-audio
---
# DeepFilterNet3 — MLX

MLX-compatible weights for [DeepFilterNet3](https://github.com/Rikorose/DeepFilterNet), a real-time speech enhancement model that suppresses background noise from audio.

This is a direct conversion of the original PyTorch weights to `safetensors` format for use with [MLX](https://github.com/ml-explore/mlx) on Apple Silicon.

## Origin

- **Original model:** [DeepFilterNet3](https://github.com/Rikorose/DeepFilterNet) by Hendrik Schröter
- **Paper:** [DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement](https://arxiv.org/abs/2305.08227)
- **License:** MIT (same as the original)
- **Conversion:** PyTorch → `safetensors` via the included `convert_deepfilternet.py` script

No fine-tuning or quantisation was applied — the weights are numerically identical to the original checkpoint.

## Files

| File | Description |
|---|---|
| `config.json` | Model architecture configuration |
| `model.safetensors` | Pre-converted weights (8.3 MB, float32) |
| `convert_deepfilternet.py` | Conversion script (PyTorch → MLX safetensors) |
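
Because `safetensors` is a simple, documented container (an 8-byte little-endian header length followed by a JSON header), the tensor names, dtypes, and shapes in `model.safetensors` can be inspected with the standard library alone, without installing MLX or PyTorch. A minimal sketch (the helper name is illustrative, not part of this repo):

```python
import json
import struct

def read_safetensors_header(path):
    """Return {tensor_name: (dtype, shape)} from a .safetensors file.

    The format starts with an 8-byte little-endian header length,
    followed by a JSON header describing every tensor.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional metadata block, if present
    return {name: (t["dtype"], t["shape"]) for name, t in header.items()}
```

This is handy for confirming that the converted checkpoint contains the layers you expect before wiring it into a model.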

## Model Details

| Parameter | Value |
|---|---|
| Sample rate | 48 kHz |
| FFT size | 960 |
| Hop size | 480 |
| ERB bands | 32 |
| DF bins | 96 |
| DF order | 5 |
| Parameters | ~2M |
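
The table's STFT parameters imply the model's frame timing. The relations below are standard DSP arithmetic derived from those values, not figures taken from the repository:

```python
# Frame timing implied by the parameters above (48 kHz, FFT 960, hop 480).
sample_rate = 48_000
fft_size = 960
hop_size = 480
df_bins = 96

window_ms = 1000 * fft_size / sample_rate  # analysis window length: 20.0 ms
hop_ms = 1000 * hop_size / sample_rate     # stride between frames: 10.0 ms
overlap = 1 - hop_size / fft_size          # frame overlap: 0.5 (50%)
freq_bins = fft_size // 2 + 1              # one-sided spectrum size: 481 bins
df_cutoff_hz = df_bins * sample_rate / fft_size  # deep-filtering range: 4800.0 Hz
```

So the deep-filtering stage operates on the lowest 96 of 481 frequency bins, i.e. up to roughly 4.8 kHz, while the ERB-band gains cover the full band.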

## Usage

### Swift (mlx-audio-swift)

```swift
import MLXAudioSTS

let model = try await DeepFilterNetModel.fromPretrained("iky1e/DeepFilterNet3-MLX")
let enhanced = try model.enhance(audioArray)
```

### Python (mlx-audio)

```python
from mlx_audio.sts.models.deepfilternet import DeepFilterNetModel

model = DeepFilterNetModel.from_pretrained(version=3, model_dir="path/to/local/dir")
enhanced = model.enhance("noisy.wav")
```

## Converting from PyTorch

To re-create this conversion from the original DeepFilterNet checkpoint:

```bash
python convert_deepfilternet.py \
  --input /path/to/DeepFilterNet3 \
  --output ./DeepFilterNet3-MLX \
  --name DeepFilterNet3
```

The input directory should contain a `config.ini` and a `checkpoints/` folder from the original repo.
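
As a quick sanity check before running the converter, that expected layout can be validated with the standard library. A minimal sketch (the function name is illustrative, not part of the conversion script):

```python
from pathlib import Path

def check_conversion_input(input_dir: str) -> None:
    """Verify the layout the conversion expects:
    a config.ini file plus a checkpoints/ directory."""
    root = Path(input_dir)
    if not (root / "config.ini").is_file():
        raise FileNotFoundError(f"missing {root / 'config.ini'}")
    if not (root / "checkpoints").is_dir():
        raise FileNotFoundError(f"missing {root / 'checkpoints'} directory")
```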

## Citation

```bibtex
@inproceedings{schroeter2023deepfilternet3,
  title={DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement},
  author={Schr{\"o}ter, Hendrik and Rosenkranz, Tobias and Escalante-B., Alberto N. and Maier, Andreas},
  booktitle={INTERSPEECH},
  year={2023}
}
```