---
license: cc-by-nc-4.0
language:
- en
tags:
- audio
- speech-enhancement
- denoising
- dereverberation
- on-device
- core-ml
- onnx
pipeline_tag: audio-to-audio
---
<!--
Canonical source for the Hugging Face model card at
https://huggingface.co/detail-co/clear (the single combined card for both
clear-studio and clear-natural). Edit here, then push to the HF repo's
README.md. Kept in sync manually; the HF repo is not a git remote of this repo.
-->
# Clear: on-device speech enhancement
Clear is a 48 kHz on-device speech enhancement model. It removes background
noise and reverberation and leaves the voice warm and present, closer to a
podcast studio than a phone call. It is trained on real recordings made by the
Detail team and tuned to work across a range of microphones. Two premium-tier
variants ship from this repo.
## Try it
- **[Curated previews (iOS)](https://huggingface.co/spaces/detail-co/clear-demo)**: twelve real recordings from boats, hotel rooms, and demo days, each with a before/after comparison.
- **[Run in your browser](https://huggingface.co/spaces/detail-co/clear-demo-web)**: drop in your own file and get a clean one back. Runs on WebGPU where available and threaded WASM otherwise; nothing leaves your device.
## Variants
| Variant | Character | When to use |
|---|---|---|
| **`clear-studio`** | Quiet, studio-like; silences pushed near zero | Default. Works across the full range of input quality: phone audio, laptop mics, untreated rooms, USB / XLR podcast captures |
| **`clear-natural`** | Room tone, breath, and lip texture preserved | Treated podcast studios, USB / XLR captures, voiceover where the original sound is intentional |
If your source is already clean and you want the model to stay
invisible, pick `clear-natural`. Otherwise, `clear-studio` is the
default.
## Files
Both variants ship in two formats: Core ML (as a source `.mlpackage` and a
precompiled `.mlmodelc`) and ONNX. The two variants share the same
architecture and realtime cost; only the weights differ.
| Variant | File | Format | Download size |
|---|---|---|---|
| `clear-studio` | `clear-studio.mlpackage.zip` | Core ML mlpackage (fp16) | ~3.8 MB |
| `clear-studio` | `clear-studio.mlmodelc.zip` | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| `clear-studio` | `clear-studio.onnx` | ONNX (fp32) | ~8.5 MB |
| `clear-natural` | `clear-natural.mlpackage.zip` | Core ML mlpackage (fp16) | ~3.8 MB |
| `clear-natural` | `clear-natural.mlmodelc.zip` | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| `clear-natural` | `clear-natural.onnx` | ONNX (fp32) | ~8.5 MB |
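A minimal Python sketch for fetching one of these files. The repo id and file names come from this card; the `resolve/main` URL pattern and the `file_url` / `download` helpers are illustrative assumptions, not an official client. If you already use the `huggingface_hub` package, `hf_hub_download(repo_id="detail-co/clear", filename=...)` does the same job with local caching.

```python
import urllib.request

# Repo id and file names are from the Files table; the "main" revision and
# the resolve/ URL pattern are assumptions about how you fetch them.
REPO_ID = "detail-co/clear"
VARIANTS = ("clear-studio", "clear-natural")
SUFFIXES = {
    "mlpackage": ".mlpackage.zip",  # Core ML source package, fp16
    "mlmodelc": ".mlmodelc.zip",    # Core ML precompiled, fp16
    "onnx": ".onnx",                # ONNX, fp32
}


def file_url(variant, fmt, revision="main"):
    """Build the download URL for one variant/format pair (hypothetical helper)."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if fmt not in SUFFIXES:
        raise ValueError(f"unknown format: {fmt!r}")
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{variant}{SUFFIXES[fmt]}"


def download(variant, fmt, dest=None):
    """Fetch the file to `dest` (defaults to its repo file name); returns the path."""
    url = file_url(variant, fmt)
    path, _headers = urllib.request.urlretrieve(url, dest or url.rsplit("/", 1)[-1])
    return path
```

For example, `download("clear-studio", "onnx")` saves `clear-studio.onnx` in the current directory. The `.zip` files still need unpacking before Core ML can load them.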
## Spec
- Architecture: DeepFilterNet 3 (DFN3-half)
- Sample rate: 48 kHz, mono or stereo (per-channel inference)
- Inference contract: `spec` / `feat_erb` / `feat_spec` → `spec_enhanced`. STFT, ERB features, and ISTFT are computed host-side, via vDSP (Swift) or pure Kotlin.
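Because the network only sees spectra, the host owns framing, the forward transform, and overlap-add resynthesis, and stereo is handled by running each channel separately. The pure-Python sketch below shows that host-side loop under stated assumptions: 960-sample frames with a 480-sample hop (20 ms / 10 ms at 48 kHz) and a Vorbis window, none of which this card specifies. `model` is a hypothetical stand-in for the network (`spec` → `spec_enhanced`); ERB feature extraction and the exact tensor layout of `spec` are omitted because the card does not document them.

```python
import cmath
import math

# Assumptions, not taken from this card: 960-sample frames, 480-sample hop,
# Vorbis window. Check them against the exported model before relying on them.
N_FFT = 960
HOP = 480


def vorbis_window(n):
    """Vorbis window: w[i]**2 + w[i + n//2]**2 == 1, so 50% overlap-add is exact."""
    return [math.sin(0.5 * math.pi * math.sin(math.pi * (i + 0.5) / n) ** 2)
            for i in range(n)]


def stft(x, n_fft=N_FFT, hop=HOP):
    """Windowed frames -> one-sided spectra with n_fft // 2 + 1 bins each."""
    w = vorbis_window(n_fft)
    spectra = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [x[start + i] * w[i] for i in range(n_fft)]
        # Naive DFT for readability only; a real host uses an FFT (e.g. vDSP).
        spectra.append([sum(seg[i] * cmath.exp(-2j * math.pi * k * i / n_fft)
                            for i in range(n_fft))
                        for k in range(n_fft // 2 + 1)])
    return spectra


def istft(spectra, length, n_fft=N_FFT, hop=HOP):
    """Inverse of stft(): real inverse DFT, synthesis window, overlap-add."""
    w = vorbis_window(n_fft)
    half = n_fft // 2  # n_fft must be even
    out = [0.0] * length
    for f, spec in enumerate(spectra):
        for i in range(n_fft):
            # Rebuild the real frame from the one-sided (Hermitian) spectrum.
            v = spec[0].real + (spec[half] * (-1) ** i).real
            v += 2 * sum((spec[k] * cmath.exp(2j * math.pi * k * i / n_fft)).real
                         for k in range(1, half))
            out[f * hop + i] += w[i] * v / n_fft
    return out


def enhance_interleaved(samples, channels, model, n_fft=N_FFT, hop=HOP):
    """Per-channel inference: split, enhance each channel alone, re-interleave.

    `model` is hypothetical: it takes and returns a list of spectra.
    """
    out = [0.0] * len(samples)
    for c in range(channels):
        mono = samples[c::channels]
        out[c::channels] = istft(model(stft(mono, n_fft, hop)), len(mono), n_fft, hop)
    return out
```

Only samples covered by two overlapping frames are reconstructed exactly; in practice, pad each channel by one hop at both ends and trim afterwards. With an identity `model`, the interior of the output matches the input to floating-point precision.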
## Performance
Both variants share the architecture and run at the same speed. Enhancing a
5-minute clip on the Apple Neural Engine:
| Device | Chip | Mono | Stereo |
|---|---|---:|---:|
| iPhone 15 Pro | A17 Pro | 4.88 s (61× realtime) | 6.53 s (46×) |
| iPhone 17 Pro | A19 Pro | 3.70 s (81× realtime) | 5.16 s (58×) |
Cold model load is ~0.6 s; later loads are ~100 ms via the system ANE cache.
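The realtime multiples in the table are the clip length (5 minutes, i.e. 300 s) divided by the processing time. A short Python check that reproduces them from the listed timings; `realtime_factor` is an illustrative helper name, not part of any shipped API.

```python
# Timings copied from the table above: seconds to enhance a 5-minute clip.
CLIP_SECONDS = 5 * 60
TIMINGS = {
    ("iPhone 15 Pro", "mono"): 4.88,
    ("iPhone 15 Pro", "stereo"): 6.53,
    ("iPhone 17 Pro", "mono"): 3.70,
    ("iPhone 17 Pro", "stereo"): 5.16,
}


def realtime_factor(audio_seconds, processing_seconds):
    """Seconds of audio enhanced per second of wall-clock time."""
    return audio_seconds / processing_seconds


for (device, layout), secs in TIMINGS.items():
    print(f"{device} {layout}: {realtime_factor(CLIP_SECONDS, secs):.0f}x realtime")
```

This prints 61x, 46x, 81x, and 58x, matching the table.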
## Used in
- **[Detail](https://detail.co)**: iOS and macOS video recording.
- **[Subwave](https://subwave.app)**: publish audio and video stories.
## Built on
- [DeepFilterNet 3](https://github.com/Rikorose/DeepFilterNet) by
  Rikorose (MIT), fine-tuned on Detail's speech corpus.
## License
[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Free
for research, evaluation, and personal use with attribution.
**Commercial use requires a separate license**; contact
`paul@detail.co`.
Made by Detail Technologies B.V.