---
license: mit
library_name: mlx
tags:
- mlx
- audio
- speech-enhancement
- noise-suppression
- deepfilternet
- apple-silicon
base_model: DeepFilterNet/DeepFilterNet3
pipeline_tag: audio-to-audio
---
|
# DeepFilterNet3 — MLX

MLX-compatible weights for [DeepFilterNet3](https://github.com/Rikorose/DeepFilterNet), a real-time speech enhancement model that suppresses background noise from audio.

This is a direct conversion of the original PyTorch weights to `safetensors` format for use with [MLX](https://github.com/ml-explore/mlx) on Apple Silicon.

## Origin

- **Original model:** [DeepFilterNet3](https://github.com/Rikorose/DeepFilterNet) by Hendrik Schröter
- **Paper:** [DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement](https://arxiv.org/abs/2305.08227)
- **License:** MIT (same as the original)
- **Conversion:** PyTorch → `safetensors` via the included `convert_deepfilternet.py` script

No fine-tuning or quantisation was applied — the weights are numerically identical to the original checkpoint.
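The numerical-identity claim can be spot-checked after conversion. The helper below is an illustrative sketch, not part of this repo; it assumes you have loaded both checkpoints into plain NumPy dicts (keyed by parameter name) and compares them key by key:

```python
import numpy as np

def weights_match(a: dict, b: dict, atol: float = 0.0) -> bool:
    """True if two weight dicts have identical keys, shapes, and values.

    With atol=0.0 (and rtol=0.0) this demands exact value equality,
    matching the "numerically identical" claim above.
    """
    if a.keys() != b.keys():
        return False
    return all(
        a[k].shape == b[k].shape and np.allclose(a[k], b[k], atol=atol, rtol=0.0)
        for k in a
    )

# Synthetic example; in real use, load the original PyTorch checkpoint
# and the converted model.safetensors into NumPy dicts first.
w1 = {"enc.weight": np.ones((3, 3), dtype=np.float32)}
w2 = {"enc.weight": np.ones((3, 3), dtype=np.float32)}
print(weights_match(w1, w2))  # → True
```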
|
## Files

| File | Description |
|---|---|
| `config.json` | Model architecture configuration |
| `model.safetensors` | Pre-converted weights (8.3 MB, float32) |
| `convert_deepfilternet.py` | Conversion script (PyTorch → MLX safetensors) |

## Model Details

| Parameter | Value |
|---|---|
| Sample rate | 48 kHz |
| FFT size | 960 |
| Hop size | 480 |
| ERB bands | 32 |
| DF bins | 96 |
| DF order | 5 |
| Parameters | ~2M |
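For intuition, the STFT parameters in the table imply a 20 ms analysis window with a 10 ms hop (50% frame overlap) at 48 kHz; the arithmetic:

```python
# Frame timing implied by the table above (FFT size 960, hop 480, 48 kHz).
sample_rate = 48_000
fft_size = 960
hop_size = 480

window_ms = fft_size / sample_rate * 1000  # 960 / 48000 s = 20 ms per frame
hop_ms = hop_size / sample_rate * 1000     # 480 / 48000 s = 10 ms between frames
overlap = 1 - hop_size / fft_size          # 0.5, i.e. 50% frame overlap

print(window_ms, hop_ms, overlap)  # → 20.0 10.0 0.5
```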
|
## Usage

### Swift (mlx-audio-swift)

```swift
import MLXAudioSTS

let model = try await DeepFilterNetModel.fromPretrained("iky1e/DeepFilterNet3-MLX")
let enhanced = try model.enhance(audioArray)
```

### Python (mlx-audio)

```python
from mlx_audio.sts.models.deepfilternet import DeepFilterNetModel

model = DeepFilterNetModel.from_pretrained(version=3, model_dir="path/to/local/dir")
enhanced = model.enhance("noisy.wav")
```
|
## Converting from PyTorch

To re-create this conversion from the original DeepFilterNet checkpoint:

```bash
python convert_deepfilternet.py \
  --input /path/to/DeepFilterNet3 \
  --output ./DeepFilterNet3-MLX \
  --name DeepFilterNet3
```

The input directory should contain a `config.ini` and a `checkpoints/` folder from the original repo.
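Before running the script, a quick pre-flight check can confirm the input directory has that layout. The function below is an illustrative sketch (not part of this repo), checking only for the two items named above:

```python
from pathlib import Path

def check_input_dir(path: str) -> list:
    """Return a list of problems with a DeepFilterNet checkpoint directory.

    An empty list means the layout matches what the conversion expects:
    a config.ini file plus a checkpoints/ folder.
    """
    root = Path(path)
    problems = []
    if not (root / "config.ini").is_file():
        problems.append("missing config.ini")
    if not (root / "checkpoints").is_dir():
        problems.append("missing checkpoints/ folder")
    return problems

# Example against a scratch directory with the expected layout:
import tempfile
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "config.ini").touch()
    (Path(d) / "checkpoints").mkdir()
    print(check_input_dir(d))  # → []
```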
|
## Citation

```bibtex
@inproceedings{schroeter2023deepfilternet3,
  title={DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement},
  author={Schr{\"o}ter, Hendrik and Rosenkranz, Tobias and Escalante-B., Alberto N. and Maier, Andreas},
  booktitle={INTERSPEECH},
  year={2023}
}
```