---
license: mit
library_name: mlx
tags:
- mlx
- audio
- speech-enhancement
- noise-suppression
- deepfilternet
- apple-silicon
base_model:
- DeepFilterNet/DeepFilterNet
- DeepFilterNet/DeepFilterNet2
- DeepFilterNet/DeepFilterNet3
pipeline_tag: audio-to-audio
---
# DeepFilterNet — MLX
MLX-compatible weights for [DeepFilterNet](https://github.com/Rikorose/DeepFilterNet), a real-time speech enhancement framework that suppresses background noise from full-band 48 kHz audio.
This repository contains all three model versions (v1, v2, v3), converted directly from the original PyTorch checkpoints to `safetensors` format for use with [MLX](https://github.com/ml-explore/mlx) on Apple Silicon. No fine-tuning or quantization was applied — the weights are numerically identical to the originals.
## Models
Each version is stored in its own subfolder:
| Version | Subfolder | Weights | Paper |
|---------|-----------|---------|-------|
| DeepFilterNet v1 | `v1/` | ~7.2 MB (float32) | [arXiv:2110.05588](https://arxiv.org/abs/2110.05588) |
| DeepFilterNet v2 | `v2/` | ~8.9 MB (float32) | [arXiv:2205.05474](https://arxiv.org/abs/2205.05474) |
| DeepFilterNet v3 | `v3/` | ~8.3 MB (float32) | [arXiv:2305.08227](https://arxiv.org/abs/2305.08227) |
## Model Details
All versions share the same audio parameters:
| Parameter | Value |
|-----------|-------|
| Sample rate | 48 kHz |
| FFT size | 960 |
| Hop size | 480 |
| ERB bands | 32 |
| DF bins | 96 |
| DF order | 5 |

The versions differ in the hidden dimension of the embedding:

| Version | Embedding hidden dim |
|---------|---------------------|
| v1 | 512 |
| v2 | 256 |
| v3 | 256 |
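
To make the shared audio parameters more concrete, the short sketch below derives frame timing and frequency resolution from the values in the table. It assumes a one-sided spectrum and that the DF bins are the lowest 96 bins, as the DeepFilterNet papers describe. The variable names are illustrative and are not taken from any config file in this repository.

```python
# Quantities derived from the shared audio parameters listed above.
SAMPLE_RATE = 48_000  # Hz
FFT_SIZE = 960        # samples per analysis window
HOP_SIZE = 480        # samples between successive frames
DF_BINS = 96          # assumed to be the lowest bins of the spectrum

window_ms = 1000 * FFT_SIZE / SAMPLE_RATE  # length of one analysis window
hop_ms = 1000 * HOP_SIZE / SAMPLE_RATE     # time step between frames
overlap = 1 - HOP_SIZE / FFT_SIZE          # fraction shared by adjacent windows
freq_bins = FFT_SIZE // 2 + 1              # bins in a one-sided spectrum
bin_width_hz = SAMPLE_RATE / FFT_SIZE      # spacing between bin centres
df_upper_hz = DF_BINS * bin_width_hz       # approximate upper edge of the DF range

print(f"window {window_ms} ms, hop {hop_ms} ms, overlap {overlap:.0%}")
print(f"{freq_bins} bins of {bin_width_hz} Hz, DF range up to {df_upper_hz} Hz")
```

Under these assumptions each frame advances by 10 ms over a 20 ms window, and deep filtering covers roughly the band below 4.8 kHz.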
## Files
```
convert_deepfilternet.py # PyTorch → MLX conversion script
v1/
config.json # v1 architecture configuration
model.safetensors # v1 weights
v2/
config.json # v2 architecture configuration
model.safetensors # v2 weights
v3/
config.json # v3 architecture configuration
model.safetensors # v3 weights
```
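A `model.safetensors` file can be inspected without installing MLX or PyTorch, because the format begins with an 8-byte little-endian header length followed by a JSON header. The sketch below uses only the standard library. The helper names `read_safetensors_header` and `summarize` are hypothetical and are not part of this repository or of `mlx-audio`.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as an unsigned 64-bit LE int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def summarize(header):
    """Return (tensor_count, parameter_count, payload_bytes) for a header."""
    # "__metadata__" is an optional free-form entry, not a tensor.
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    n_params = sum(prod(t["shape"]) for t in tensors.values())
    n_bytes = sum(
        t["data_offsets"][1] - t["data_offsets"][0] for t in tensors.values()
    )
    return len(tensors), n_params, n_bytes
```

For example, `summarize(read_safetensors_header("v3/model.safetensors"))` should report a payload size in line with the table above if the weights are stored as float32, that is, four bytes per parameter.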
## Usage
### Python (mlx-audio)
```python
from mlx_audio.sts.models.deepfilternet import DeepFilterNetModel
# Load v3 (default)
model = DeepFilterNetModel.from_pretrained("mlx-community/DeepFilterNet-mlx")
# Load a specific version
model = DeepFilterNetModel.from_pretrained("mlx-community/DeepFilterNet-mlx", subfolder="v1")
# Enhance a file
enhanced = model.enhance("noisy.wav")
```
### Swift (mlx-audio-swift)
```swift
import MLXAudioSTS
let model = try await DeepFilterNetModel.fromPretrained("mlx-community/DeepFilterNet-mlx", subfolder: "v3")
let enhanced = try model.enhance(audioArray)
```
## Converting from PyTorch
To re-create these weights from the original DeepFilterNet checkpoints:
```bash
# Clone the original repo to get the pretrained checkpoints
git clone https://github.com/Rikorose/DeepFilterNet
# Convert each version
python convert_deepfilternet.py --input DeepFilterNet/DeepFilterNet --output v1 --name DeepFilterNet
python convert_deepfilternet.py --input DeepFilterNet/DeepFilterNet2 --output v2 --name DeepFilterNet2
python convert_deepfilternet.py --input DeepFilterNet/DeepFilterNet3 --output v3 --name DeepFilterNet3
```
Each input directory should contain a `config.ini` and a `checkpoints/` folder from the original repo.
The conversion script requires `torch` and `mlx` to be installed.
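The three conversion commands can also be driven from one Python script that first checks each input directory for the `config.ini` and `checkpoints/` entries mentioned above. This is a minimal sketch: the `--input`, `--output` and `--name` flags are copied from the commands above, while the helpers `missing_inputs`, `build_command` and `convert_all` are hypothetical names introduced here.

```python
import subprocess
import sys
from pathlib import Path

# (input directory, output subfolder, model name), copied from the commands above.
VERSIONS = [
    ("DeepFilterNet/DeepFilterNet", "v1", "DeepFilterNet"),
    ("DeepFilterNet/DeepFilterNet2", "v2", "DeepFilterNet2"),
    ("DeepFilterNet/DeepFilterNet3", "v3", "DeepFilterNet3"),
]


def missing_inputs(input_dir):
    """Return the required entries that are absent from input_dir."""
    root = Path(input_dir)
    missing = []
    if not (root / "config.ini").is_file():
        missing.append("config.ini")
    if not (root / "checkpoints").is_dir():
        missing.append("checkpoints/")
    return missing


def build_command(input_dir, output_dir, name, script="convert_deepfilternet.py"):
    """Build the argv list for one conversion run."""
    return [sys.executable, script,
            "--input", input_dir, "--output", output_dir, "--name", name]


def convert_all(dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for input_dir, output_dir, name in VERSIONS:
        missing = missing_inputs(input_dir)
        if missing:
            print(f"skip {name}: missing {', '.join(missing)}")
            continue
        cmd = build_command(input_dir, output_dir, name)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `convert_all()` only prints the commands it would run. Pass `dry_run=False` to execute them.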
## Origin
- **Original model:** [DeepFilterNet](https://github.com/Rikorose/DeepFilterNet) by Hendrik Schroeter
- **License:** MIT (same as the original)
- **Conversion:** PyTorch → `safetensors` via `convert_deepfilternet.py`
## Citations
```bibtex
@inproceedings{schroeter2022deepfilternet,
title={{DeepFilterNet}: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering},
author={Schr{\"o}ter, Hendrik and Escalante-B., Alberto N. and Rosenkranz, Tobias and Maier, Andreas},
booktitle={ICASSP 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2022},
organization={IEEE}
}
@inproceedings{schroeter2022deepfilternet2,
title={{DeepFilterNet2}: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio},
author={Schr{\"o}ter, Hendrik and Escalante-B., Alberto N. and Rosenkranz, Tobias and Maier, Andreas},
booktitle={17th International Workshop on Acoustic Signal Enhancement (IWAENC 2022)},
year={2022},
}
@inproceedings{schroeter2023deepfilternet3,
title={DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement},
author={Schr{\"o}ter, Hendrik and Rosenkranz, Tobias and Escalante-B., Alberto N. and Maier, Andreas},
booktitle={INTERSPEECH},
year={2023}
}
```