---
license: cc-by-nc-4.0
language:
- en
tags:
- audio
- speech-enhancement
- denoising
- dereverberation
- on-device
- core-ml
- onnx
pipeline_tag: audio-to-audio
---

# Clear: on-device speech enhancement

Clear is a 48 kHz on-device speech enhancement model, trained on real Detail team recordings and optimized for a range of microphones. It removes background noise and reverberation, leaving the voice warm and present, closer to a podcast studio than a phone call. Two premium-tier variants ship from this repo.

## Try it

- **[Curated previews (iOS)](https://huggingface.co/spaces/detail-co/clear-demo)**: twelve real recordings from boats, hotel rooms, and demo days, with before / after for each.
- **[Run in your browser](https://huggingface.co/spaces/detail-co/clear-demo-web)**: drop in your own file, get a clean one back. WebGPU where available, threaded WASM otherwise. Nothing leaves your device.

## Variants

| Variant | Character | When to use |
|---|---|---|
| **`clear-studio`** | Quiet, studio-like, with silences near zero | Default. Works across the full range of input quality: phone audio, laptop mic, untreated rooms, USB / XLR podcast captures |
| **`clear-natural`** | Room tone, breath, lip texture preserved | Treated podcast studios, USB / XLR captures, voiceover where the original sound is intentional |

If your source is already clean and you want the model to stay invisible, pick `clear-natural`. Otherwise, `clear-studio` is the default.

## Files

Both variants ship in two formats: Core ML (as an `mlpackage` and as a precompiled `mlmodelc`) and ONNX. Same architecture, same realtime cost; only the weights differ.


| Variant | File | Format | Download size |
|---|---|---|---|
| `clear-studio` | `clear-studio.mlpackage.zip` | Core ML mlpackage (fp16) | ~3.8 MB |
| `clear-studio` | `clear-studio.mlmodelc.zip` | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| `clear-studio` | `clear-studio.onnx` | ONNX (fp32) | ~8.5 MB |
| `clear-natural` | `clear-natural.mlpackage.zip` | Core ML mlpackage (fp16) | ~3.8 MB |
| `clear-natural` | `clear-natural.mlmodelc.zip` | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| `clear-natural` | `clear-natural.onnx` | ONNX (fp32) | ~8.5 MB |

## Spec

- Architecture: DeepFilterNet 3 (DFN3-half)
- Sample rate: 48 kHz, mono or stereo (per-channel inference)
- Inference contract: `spec` / `feat_erb` / `feat_spec` → `spec_enhanced`. STFT, ERB, and ISTFT are done host-side via vDSP (Swift) or pure Kotlin

## Performance

Both variants share the architecture and run at the same speed. Enhancing a 5-minute clip on the Apple Neural Engine:

| Device | Chip | Mono | Stereo |
|---|---|---:|---:|
| iPhone 15 Pro | A17 Pro | 4.88 s (61× realtime) | 6.53 s (46× realtime) |
| iPhone 17 Pro | A19 Pro | 3.70 s (81× realtime) | 5.16 s (58× realtime) |

Cold model load is ~0.6 s; later loads are ~100 ms via the system ANE cache.

## Used in

- **[Detail](https://detail.co)**: iOS and macOS video recording.
- **[Subwave](https://subwave.app)**: publish audio and video stories.

## Built on

- [DeepFilterNet 3](https://github.com/Rikorose/DeepFilterNet) by Rikorose, MIT-licensed. Fine-tuned on Detail's speech corpus.

## License

[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Free for research, evaluation, and personal use with attribution. **Commercial use requires a separate license**; contact `paul@detail.co`.

Made by Detail Technologies B.V.
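
## Worked example: realtime factors

The realtime multipliers in the Performance table follow from dividing the clip length (5 minutes, so 300 s) by the measured processing time. The sketch below reproduces that arithmetic from the table's own timings. The names `TIMINGS`, `realtime_factor`, and `stereo_overhead` are illustrative only and are not part of any Clear API.

```python
# Timings copied from the Performance table: seconds needed to enhance
# a 5-minute clip on the Apple Neural Engine.
CLIP_SECONDS = 5 * 60

# Hypothetical lookup structure, introduced only for this example.
TIMINGS = {
    ("iPhone 15 Pro", "mono"): 4.88,
    ("iPhone 15 Pro", "stereo"): 6.53,
    ("iPhone 17 Pro", "mono"): 3.70,
    ("iPhone 17 Pro", "stereo"): 5.16,
}


def realtime_factor(processing_seconds: float, clip_seconds: float = CLIP_SECONDS) -> float:
    """Seconds of audio enhanced per second of processing time."""
    return clip_seconds / processing_seconds


def stereo_overhead(device: str) -> float:
    """Ratio of stereo to mono processing time for one device."""
    return TIMINGS[(device, "stereo")] / TIMINGS[(device, "mono")]


if __name__ == "__main__":
    for (device, layout), seconds in TIMINGS.items():
        print(f"{device} {layout}: {realtime_factor(seconds):.1f}x realtime")
```

Rounded to the nearest integer, the factors match the table (61, 46, 81, and 58). By the same figures, stereo takes roughly 1.3 to 1.4 times as long as mono on both devices, not twice as long.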