---
license: cc-by-nc-4.0
language:
  - en
tags:
  - audio
  - speech-enhancement
  - denoising
  - dereverberation
  - on-device
  - core-ml
  - onnx
pipeline_tag: audio-to-audio
---

<!--
Canonical source for the Hugging Face model card at
https://huggingface.co/detail-co/clear (the single combined card for both
clear-studio and clear-natural). Edit here, then push to the HF repo's
README.md. Kept in sync manually; the HF repo is not a git remote of this repo.
-->

# Clear — on-device speech enhancement

Clear is a 48 kHz on-device speech enhancement model. It removes background
noise and reverberation and leaves the voice warm and present, closer to a
podcast studio than a phone call. It is fine-tuned on real recordings made by
the Detail team and optimized for a range of microphones. Two premium-tier
variants ship from this repo.

## Try it

- **[Curated previews (iOS)](https://huggingface.co/spaces/detail-co/clear-demo)** — twelve real recordings from boats, hotel rooms, demo days, with before / after for each.
- **[Run in your browser](https://huggingface.co/spaces/detail-co/clear-demo-web)** — drop in your own file, get a clean one back. WebGPU where available, threaded WASM otherwise. Nothing leaves your device.

## Variants

| Variant | Character | When to use |
|---|---|---|
| **`clear-studio`** | Quiet, studio-like — silences near zero | Default. Works across the full range of input quality — phone audio, laptop mic, untreated rooms, USB / XLR podcast captures |
| **`clear-natural`** | Room tone, breath, lip texture preserved | Treated podcast studios, USB / XLR captures, voiceover where the original sound is intentional |

If your source is already clean and you want the model to stay
invisible, pick `clear-natural`. Otherwise, `clear-studio` is the
default.

## Files

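Both variants ship in two formats: Core ML (as a zipped `.mlpackage` and a
zipped, precompiled `.mlmodelc`) and ONNX. The variants share the same
architecture and realtime cost; only the weights differ.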

| Variant | File | Format | Download size |
|---|---|---|---:|
| `clear-studio` | `clear-studio.mlpackage.zip` | Core ML mlpackage (fp16) | ~3.8 MB |
| `clear-studio` | `clear-studio.mlmodelc.zip` | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| `clear-studio` | `clear-studio.onnx` | ONNX (fp32) | ~8.5 MB |
| `clear-natural` | `clear-natural.mlpackage.zip` | Core ML mlpackage (fp16) | ~3.8 MB |
| `clear-natural` | `clear-natural.mlmodelc.zip` | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| `clear-natural` | `clear-natural.onnx` | ONNX (fp32) | ~8.5 MB |
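A minimal Python sketch for picking a file from the table above and building a direct download link. It assumes the files sit at the root of the `main` branch of `detail-co/clear` and uses the Hub's standard `/resolve/<revision>/<filename>` URL pattern; `model_filename` and `download_url` are hypothetical helpers, not part of any published package.

```python
# Hypothetical helpers: map a variant and format to the file names listed in
# the Files table, and build a Hugging Face "resolve" URL for direct download.

REPO_ID = "detail-co/clear"
VARIANTS = ("clear-studio", "clear-natural")
SUFFIXES = {
    "mlpackage": ".mlpackage.zip",  # Core ML mlpackage (fp16), zipped
    "mlmodelc": ".mlmodelc.zip",    # precompiled Core ML (fp16), zipped
    "onnx": ".onnx",                # ONNX (fp32)
}


def model_filename(variant: str, fmt: str) -> str:
    """Return the file name for a variant/format pair, e.g. clear-studio.onnx."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if fmt not in SUFFIXES:
        raise ValueError(f"unknown format: {fmt!r}")
    return variant + SUFFIXES[fmt]


def download_url(variant: str, fmt: str, revision: str = "main") -> str:
    """Direct download URL using the Hub's /resolve/<revision>/<file> path."""
    name = model_filename(variant, fmt)
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{name}"
```

With the `huggingface_hub` library installed, `hf_hub_download(repo_id="detail-co/clear", filename=model_filename("clear-studio", "onnx"))` downloads and caches the same file. The Core ML files are zip archives and need to be extracted before loading.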

## Spec

- Architecture: DeepFilterNet 3 (DFN3-half)
- Audio format: 48 kHz, mono or stereo (stereo is enhanced one channel at a time)
- Inference contract: inputs `spec`, `feat_erb`, and `feat_spec`; output `spec_enhanced`. The STFT, ERB feature computation, and ISTFT run host-side, via vDSP in Swift or in pure Kotlin.
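Per-channel inference means stereo input is split into mono signals that are enhanced independently and then recombined. The Python sketch below shows only that channel handling, assuming interleaved float samples; `enhance_mono` is a hypothetical stand-in for the full host-side pipeline (STFT and feature computation, model call, ISTFT), which is not shown here.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: takes one channel of 48 kHz float samples and
# returns the enhanced channel with the same number of samples.
MonoEnhancer = Callable[[Sequence[float]], List[float]]


def deinterleave(samples: Sequence[float], channels: int) -> List[List[float]]:
    """Split interleaved frames [L0, R0, L1, R1, ...] into one list per channel."""
    if channels < 1 or len(samples) % channels != 0:
        raise ValueError("sample count must be a multiple of the channel count")
    return [list(samples[c::channels]) for c in range(channels)]


def interleave(planar: Sequence[Sequence[float]]) -> List[float]:
    """Inverse of deinterleave: [[L0, L1, ...], [R0, R1, ...]] -> [L0, R0, L1, R1, ...]."""
    out: List[float] = []
    for frame in zip(*planar):
        out.extend(frame)
    return out


def enhance_interleaved(samples: Sequence[float], channels: int,
                        enhance_mono: MonoEnhancer) -> List[float]:
    """Run the mono enhancer on each channel independently, then re-interleave."""
    planar = deinterleave(samples, channels)
    enhanced = [enhance_mono(ch) for ch in planar]
    for before, after in zip(planar, enhanced):
        if len(after) != len(before):
            raise ValueError("enhancer must return as many samples as it received")
    return interleave(enhanced)
```

Keeping the model mono and looping over channels means one set of weights serves both layouts; the cost is that stereo takes longer than mono, as the performance numbers reflect.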

## Performance

Both variants share one architecture and run at the same speed. Time to
enhance a 5-minute clip on the Apple Neural Engine:

| Device | Chip | Mono | Stereo |
|---|---|---:|---:|
| iPhone 15 Pro | A17 Pro | 4.88 s (61× realtime) | 6.53 s (46× realtime) |
| iPhone 17 Pro | A19 Pro | 3.70 s (81× realtime) | 5.16 s (58× realtime) |

Cold model load is ~0.6 s; later loads are ~100 ms via the system ANE cache.
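The realtime factors in the performance table are the clip length (300 s) divided by the processing time. This small Python sketch reproduces them from the measured times and, assuming processing time grows linearly with clip length (an assumption, not a measurement), extrapolates to other durations; the function names are illustrative.

```python
# Realtime factor = audio duration / wall-clock processing time.
# Measured times are copied from the performance table (5-minute clip, ANE).

BENCHMARK_CLIP_S = 5 * 60  # 300 s of audio

# (device, channel layout) -> measured processing time in seconds
MEASURED_S = {
    ("iPhone 15 Pro", "mono"): 4.88,
    ("iPhone 15 Pro", "stereo"): 6.53,
    ("iPhone 17 Pro", "mono"): 3.70,
    ("iPhone 17 Pro", "stereo"): 5.16,
}


def realtime_factor(processing_s: float, clip_s: float = BENCHMARK_CLIP_S) -> float:
    """Seconds of audio processed per second of compute."""
    if processing_s <= 0:
        raise ValueError("processing time must be positive")
    return clip_s / processing_s


def estimate_processing_s(clip_s: float, device: str, layout: str) -> float:
    """Extrapolated processing time, ASSUMING it scales linearly with clip length."""
    return clip_s * MEASURED_S[(device, layout)] / BENCHMARK_CLIP_S


for (device, layout), seconds in MEASURED_S.items():
    print(f"{device} {layout}: {realtime_factor(seconds):.0f}x realtime")
```

For example, at the iPhone 15 Pro mono rate, a one-hour recording would take roughly 3600 × 4.88 / 300 ≈ 59 s under the linear-scaling assumption.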

## Used in

- **[Detail](https://detail.co)** — iOS and macOS video recording.
- **[Subwave](https://subwave.app)** — publish audio and video stories.

## Built on

- [DeepFilterNet 3](https://github.com/Rikorose/DeepFilterNet) by
  Rikorose — MIT. Fine-tuned on Detail's speech corpus.

## License

[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Free
for research, evaluation, and personal use with attribution.
**Commercial use requires a separate license** — contact
`paul@detail.co`.

Made by Detail Technologies B.V.