| --- |
| license: cc-by-nc-4.0 |
| language: |
| - en |
| tags: |
| - audio |
| - speech-enhancement |
| - denoising |
| - dereverberation |
| - on-device |
| - core-ml |
| - onnx |
| pipeline_tag: audio-to-audio |
| --- |
| |
| <!-- |
| Canonical source for the Hugging Face model card at |
| https://huggingface.co/detail-co/clear (the single combined card for both |
| clear-studio and clear-natural). Edit here, then push to the HF repo's |
| README.md. Kept in sync manually; the HF repo is not a git remote of this repo. |
| --> |
|
|
| # Clear: on-device speech enhancement
|
|
| Clear is a 48 kHz on-device speech enhancement model, trained on real
| Detail team recordings and optimized for a range of microphones. It
| removes background noise and reverberation, leaving the voice warm and
| present, closer to a podcast studio than a phone call. Two premium-tier
| variants ship from this repo.
|
|
| ## Try it |
|
|
| - **[Curated previews (iOS)](https://huggingface.co/spaces/detail-co/clear-demo)**: twelve real recordings from boats, hotel rooms, and demo days, with before / after for each.
| - **[Run in your browser](https://huggingface.co/spaces/detail-co/clear-demo-web)**: drop in your own file, get a clean one back. Uses WebGPU where available, threaded WASM otherwise. Nothing leaves your device.
|
|
| ## Variants |
|
|
| | Variant | Character | When to use | |
| |---|---|---| |
| | **`clear-studio`** | Quiet, studio-like; silences near zero | Default. Works across the full range of input quality: phone audio, laptop mic, untreated rooms, USB / XLR podcast captures | |
| | **`clear-natural`** | Room tone, breath, lip texture preserved | Treated podcast studios, USB / XLR captures, voiceover where the original sound is intentional | |
|
|
| If your source is already clean and you want the model to stay |
| invisible, pick `clear-natural`. Otherwise, `clear-studio` is the |
| default. |
|
|
| ## Files |
|
|
| Both variants ship in two formats: Core ML (as an `.mlpackage` and as a
| precompiled `.mlmodelc`) and ONNX. Same architecture, same realtime
| cost; only the weights differ.
|
|
| | Variant | File | Format | Download size | |
| |---|---|---|---| |
| | `clear-studio` | `clear-studio.mlpackage.zip` | Core ML mlpackage (fp16) | ~3.8 MB | |
| | `clear-studio` | `clear-studio.mlmodelc.zip` | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB | |
| | `clear-studio` | `clear-studio.onnx` | ONNX (fp32) | ~8.5 MB | |
| | `clear-natural` | `clear-natural.mlpackage.zip` | Core ML mlpackage (fp16) | ~3.8 MB | |
| | `clear-natural` | `clear-natural.mlmodelc.zip` | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB | |
| | `clear-natural` | `clear-natural.onnx` | ONNX (fp32) | ~8.5 MB | |
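
As a rough illustration of how the table maps onto download paths, the sketch below builds a URL for a given variant and format. It assumes the files sit at the root of the `detail-co/clear` repository on the `main` branch and that the Hub's usual `resolve` URL layout applies; `artifact_name`, `artifact_url`, and the format keys are hypothetical names chosen for this example, not part of any shipped API.

```python
# Hypothetical helpers: map a variant and format from the table to a file URL.
# Assumption: files live at the root of the detail-co/clear repo on "main".
REPO_ID = "detail-co/clear"

VARIANTS = ("clear-studio", "clear-natural")

# Format key (made up for this sketch) -> file suffix taken from the table.
SUFFIXES = {
    "mlpackage": ".mlpackage.zip",  # Core ML mlpackage (fp16)
    "mlmodelc": ".mlmodelc.zip",    # Core ML mlmodelc (fp16, precompiled)
    "onnx": ".onnx",                # ONNX (fp32)
}


def artifact_name(variant: str, fmt: str) -> str:
    """Return the file name for a variant/format pair, e.g. clear-studio.onnx."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if fmt not in SUFFIXES:
        raise ValueError(f"unknown format: {fmt!r}")
    return variant + SUFFIXES[fmt]


def artifact_url(variant: str, fmt: str, revision: str = "main") -> str:
    """Build a direct-download URL under the assumed repo layout."""
    name = artifact_name(variant, fmt)
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{name}"
```

For example, `artifact_url("clear-natural", "onnx")` points at `clear-natural.onnx` under that assumed layout.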
|
|
| ## Spec |
|
|
| - Architecture: DeepFilterNet 3 (DFN3-half) |
| - Sample rate: 48 kHz, mono or stereo (per-channel inference) |
| - Inference contract: inputs `spec` / `feat_erb` / `feat_spec`, output `spec_enhanced`. STFT, ERB, and ISTFT are done host-side via vDSP (Swift) or pure Kotlin
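
The host-side STFT / ISTFT wrapping and the per-channel handling of stereo can be sketched in NumPy as below. This is only an illustration of the shape of the pipeline, not the shipped Swift or Kotlin code: the 960-sample frame, 480-sample hop, and square-root Hann window are assumptions, the `model` argument is a stand-in that returns its input unchanged, and the ERB and `feat_spec` feature computation is omitted entirely.

```python
import numpy as np

SAMPLE_RATE = 48_000
FRAME = 960  # assumed 20 ms analysis frame; check the real model's value
HOP = 480    # assumed 50 % overlap


def _window() -> np.ndarray:
    # Periodic square-root Hann. Applied at both analysis and synthesis, the
    # squared windows of overlapping frames sum to 1 at 50 % overlap.
    return np.sqrt(np.hanning(FRAME + 1)[:-1])


def stft(x: np.ndarray) -> np.ndarray:
    """Mono signal -> complex spectrogram of shape (frames, FRAME // 2 + 1)."""
    tail = HOP + (-len(x)) % HOP  # pad so every sample sits in two frames
    padded = np.concatenate([np.zeros(HOP), x, np.zeros(tail)])
    n_frames = (len(padded) - FRAME) // HOP + 1
    frames = np.stack([padded[i * HOP : i * HOP + FRAME] for i in range(n_frames)])
    return np.fft.rfft(frames * _window(), axis=1)


def istft(spec: np.ndarray, length: int) -> np.ndarray:
    """Overlap-add inverse of `stft`, trimmed back to `length` samples."""
    frames = np.fft.irfft(spec, n=FRAME, axis=1) * _window()
    out = np.zeros((len(frames) - 1) * HOP + FRAME)
    for i, frame in enumerate(frames):
        out[i * HOP : i * HOP + FRAME] += frame
    return out[HOP : HOP + length]


def enhance(audio: np.ndarray, model=lambda spec: spec) -> np.ndarray:
    """Run `model` on each channel independently.

    `audio` has shape (samples,) or (samples, channels). `model` is a
    placeholder for the real network call and defaults to the identity.
    """
    channels = audio[:, None] if audio.ndim == 1 else audio
    out = np.stack(
        [istft(model(stft(ch)), len(ch)) for ch in channels.T], axis=1
    )
    return out[:, 0] if audio.ndim == 1 else out
```

With the identity stand-in, `enhance` returns the input signal up to floating-point error, which is a quick way to check the analysis and synthesis windows before wiring in a real model.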
|
|
| ## Performance |
|
|
| Both variants share the architecture and run at the same speed. Enhancing a |
| 5-minute clip on the Apple Neural Engine: |
|
|
| | Device | Chip | Mono | Stereo | |
| |---|---|---:|---:| |
| | iPhone 15 Pro | A17 Pro | 4.88 s (61× realtime) | 6.53 s (46× realtime) | |
| | iPhone 17 Pro | A19 Pro | 3.70 s (81× realtime) | 5.16 s (58× realtime) | |
|
|
| Cold model load is ~0.6 s; later loads are ~100 ms via the system ANE cache. |
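
The realtime figures in the table follow from dividing the clip length by the processing time. A small sketch of that arithmetic, using the timings from the table (the function and variable names are made up for this example):

```python
CLIP_SECONDS = 5 * 60  # the 5-minute test clip

# Processing times in seconds, copied from the table.
timings = {
    ("iPhone 15 Pro", "mono"): 4.88,
    ("iPhone 15 Pro", "stereo"): 6.53,
    ("iPhone 17 Pro", "mono"): 3.70,
    ("iPhone 17 Pro", "stereo"): 5.16,
}


def realtime_factor(processing_seconds: float, clip_seconds: float = CLIP_SECONDS) -> float:
    """How many seconds of audio are enhanced per second of compute."""
    return clip_seconds / processing_seconds


for (device, layout), seconds in timings.items():
    print(f"{device} {layout}: {realtime_factor(seconds):.0f}x realtime")
```

Stereo costs less than twice the mono time in these measurements, even though inference runs per channel.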
|
|
| ## Used in |
|
|
| - **[Detail](https://detail.co)**: iOS and macOS video recording.
| - **[Subwave](https://subwave.app)**: publish audio and video stories.
|
|
| ## Built on |
|
|
| - [DeepFilterNet 3](https://github.com/Rikorose/DeepFilterNet) by |
| Rikorose (MIT license). Fine-tuned on Detail's speech corpus.
|
|
| ## License |
|
|
| [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Free |
| for research, evaluation, and personal use with attribution. |
| **Commercial use requires a separate license**; contact
| `paul@detail.co`.
|
|
| Made by Detail Technologies B.V. |
|
|