---
license: mit
library_name: mlx
tags:
- mlx
- audio
- speech-enhancement
- noise-suppression
- deepfilternet
- apple-silicon
base_model: DeepFilterNet/DeepFilterNet3
pipeline_tag: audio-to-audio
---
|
# DeepFilterNet3 — MLX

MLX-compatible weights for [DeepFilterNet3](https://github.com/Rikorose/DeepFilterNet), a real-time speech enhancement model that suppresses background noise from audio.

This is a direct conversion of the original PyTorch weights to `safetensors` format for use with [MLX](https://github.com/ml-explore/mlx) on Apple Silicon.

## Origin

- **Original model:** [DeepFilterNet3](https://github.com/Rikorose/DeepFilterNet) by Hendrik Schröter
- **Paper:** [DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement](https://arxiv.org/abs/2305.08227)
- **License:** MIT (same as the original)
- **Conversion:** PyTorch → `safetensors` via the included `convert_deepfilternet.py` script

No fine-tuning or quantisation was applied — the weights are numerically identical to the original checkpoint.
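The numerical-identity claim can be spot-checked after conversion. The helper below is an illustrative sketch, not part of this repo; it assumes you have loaded both checkpoints into plain NumPy dicts (keyed by parameter name) and compares them key by key:

```python
import numpy as np

def weights_match(a: dict, b: dict, atol: float = 0.0) -> bool:
    """True if two weight dicts have identical keys, shapes, and values.

    With atol=0.0 (and rtol=0.0) this demands exact value equality,
    matching the "numerically identical" claim above.
    """
    if a.keys() != b.keys():
        return False
    return all(
        a[k].shape == b[k].shape and np.allclose(a[k], b[k], atol=atol, rtol=0.0)
        for k in a
    )

# Synthetic example; in real use, load the original PyTorch checkpoint
# and the converted model.safetensors into NumPy dicts first.
w1 = {"enc.weight": np.ones((3, 3), dtype=np.float32)}
w2 = {"enc.weight": np.ones((3, 3), dtype=np.float32)}
print(weights_match(w1, w2))  # → True
```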
|
## Files

| File | Description |
|---|---|
| `config.json` | Model architecture configuration |
| `model.safetensors` | Pre-converted weights (8.3 MB, float32) |
| `convert_deepfilternet.py` | Conversion script (PyTorch → MLX safetensors) |

## Model Details

| Parameter | Value |
|---|---|
| Sample rate | 48 kHz |
| FFT size | 960 |
| Hop size | 480 |
| ERB bands | 32 |
| DF bins | 96 |
| DF order | 5 |
| Parameters | ~2M |
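For intuition, the STFT parameters in the table imply a 20 ms analysis window with a 10 ms hop (50% frame overlap) at 48 kHz; the arithmetic:

```python
# Frame timing implied by the table above (FFT size 960, hop 480, 48 kHz).
sample_rate = 48_000
fft_size = 960
hop_size = 480

window_ms = fft_size / sample_rate * 1000  # 960 / 48000 s = 20 ms per frame
hop_ms = hop_size / sample_rate * 1000     # 480 / 48000 s = 10 ms between frames
overlap = 1 - hop_size / fft_size          # 0.5, i.e. 50% frame overlap

print(window_ms, hop_ms, overlap)  # → 20.0 10.0 0.5
```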
|
## Usage

### Swift (mlx-audio-swift)

```swift
import MLXAudioSTS

let model = try await DeepFilterNetModel.fromPretrained("iky1e/DeepFilterNet3-MLX")
let enhanced = try model.enhance(audioArray)
```

### Python (mlx-audio)

```python
from mlx_audio.sts.models.deepfilternet import DeepFilterNetModel

model = DeepFilterNetModel.from_pretrained(version=3, model_dir="path/to/local/dir")
enhanced = model.enhance("noisy.wav")
```
|
## Converting from PyTorch

To re-create this conversion from the original DeepFilterNet checkpoint:

```bash
python convert_deepfilternet.py \
  --input /path/to/DeepFilterNet3 \
  --output ./DeepFilterNet3-MLX \
  --name DeepFilterNet3
```

The input directory should contain a `config.ini` and a `checkpoints/` folder from the original repo.
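Before running the script, a quick pre-flight check can confirm the input directory has that layout. The function below is an illustrative sketch (not part of this repo), checking only for the two items named above:

```python
from pathlib import Path

def check_input_dir(path: str) -> list:
    """Return a list of problems with a DeepFilterNet checkpoint directory.

    An empty list means the layout matches what the conversion expects:
    a config.ini file plus a checkpoints/ folder.
    """
    root = Path(path)
    problems = []
    if not (root / "config.ini").is_file():
        problems.append("missing config.ini")
    if not (root / "checkpoints").is_dir():
        problems.append("missing checkpoints/ folder")
    return problems

# Example against a scratch directory with the expected layout:
import tempfile
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "config.ini").touch()
    (Path(d) / "checkpoints").mkdir()
    print(check_input_dir(d))  # → []
```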
|
## Citation

```bibtex
@inproceedings{schroeter2023deepfilternet3,
  title={DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement},
  author={Schr{\"o}ter, Hendrik and Rosenkranz, Tobias and Escalante-B., Alberto N. and Maier, Andreas},
  booktitle={INTERSPEECH},
  year={2023}
}
```