---
license: cc-by-nc-4.0
language:
- en
tags:
- audio
- speech-enhancement
- denoising
- dereverberation
- on-device
- core-ml
- onnx
pipeline_tag: audio-to-audio
---

<!--
Canonical source for the Hugging Face model card at
https://huggingface.co/detail-co/clear (the single combined card for both
clear-studio and clear-natural). Edit here, then push to the HF repo's
README.md. Kept in sync manually; the HF repo is not a git remote of this repo.
-->

# Clear: on-device speech enhancement

Clear is a 48 kHz on-device speech enhancement model, trained on real
recordings from the Detail team and optimized for a range of microphones.
It removes background noise and reverberation and leaves the voice warm and
present, closer to a podcast studio than a phone call. Two premium-tier
variants ship from this repo.

## Try it

- [Curated previews (iOS)](https://huggingface.co/spaces/detail-co/clear-demo): twelve real recordings from boats, hotel rooms, and demo days, each with a before / after comparison.
- [Run in your browser](https://huggingface.co/spaces/detail-co/clear-demo-web): drop in your own file and get a clean one back. Uses WebGPU where available and threaded WASM otherwise; nothing leaves your device.

## Variants

| Variant | Character | When to use |
|---|---|---|
| `clear-studio` | Quiet, studio-like; silences near zero | Default. Works across the full range of input quality: phone audio, laptop mics, untreated rooms, USB / XLR podcast captures |
| `clear-natural` | Preserves room tone, breath, and lip texture | Treated podcast studios, USB / XLR captures, voiceover where the original sound is intentional |

If your source is already clean and you want the model to stay
invisible, pick `clear-natural`. Otherwise, stick with the default,
`clear-studio`.

## Files

Both variants ship in two formats: Core ML (as an `.mlpackage` and as a
precompiled `.mlmodelc`) and ONNX. The two variants share the same
architecture and realtime cost; only the weights differ.

| Variant | File | Format | Download size |
|---|---|---|---|
| `clear-studio` | `clear-studio.mlpackage.zip` | Core ML mlpackage (fp16) | ~3.8 MB |
| `clear-studio` | `clear-studio.mlmodelc.zip` | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| `clear-studio` | `clear-studio.onnx` | ONNX (fp32) | ~8.5 MB |
| `clear-natural` | `clear-natural.mlpackage.zip` | Core ML mlpackage (fp16) | ~3.8 MB |
| `clear-natural` | `clear-natural.mlmodelc.zip` | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| `clear-natural` | `clear-natural.onnx` | ONNX (fp32) | ~8.5 MB |
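File names follow a fixed `<variant><suffix>` pattern, so picking one is a simple lookup. A minimal sketch, where `model_file` and its format keys are hypothetical names for illustration; only the file names themselves come from the table above:

```python
# Hypothetical helper; the file names are taken from the Files table on this card.
VARIANTS = ("clear-studio", "clear-natural")
SUFFIXES = {
    "mlpackage": ".mlpackage.zip",  # Core ML, fp16, not precompiled
    "mlmodelc": ".mlmodelc.zip",    # Core ML, fp16, precompiled
    "onnx": ".onnx",                # ONNX, fp32
}


def model_file(fmt: str, variant: str = "clear-studio") -> str:
    """Return the file name for a format/variant pair; clear-studio is the card's default."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}, expected one of {VARIANTS}")
    if fmt not in SUFFIXES:
        raise ValueError(f"unknown format {fmt!r}, expected one of {tuple(SUFFIXES)}")
    return variant + SUFFIXES[fmt]


print(model_file("onnx"))                       # clear-studio.onnx
print(model_file("mlmodelc", "clear-natural"))  # clear-natural.mlmodelc.zip
```

The returned name could then be passed as `filename` to `huggingface_hub.hf_hub_download(repo_id="detail-co/clear", filename=...)`, assuming the files sit at the repo root as the table suggests.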

## Spec

- Architecture: DeepFilterNet 3 (DFN3-half)
- Sample rate: 48 kHz
- Channels: mono or stereo; stereo is processed one channel at a time
- Inference contract: inputs `spec`, `feat_erb`, and `feat_spec`; output `spec_enhanced`. The STFT, ERB feature extraction, and inverse STFT (ISTFT) run host-side, via vDSP in Swift or in pure Kotlin.
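The host-side work around the network can be sketched in pure Python. This is a rough illustration, not Detail's Swift or Kotlin code: the 960-sample window, 480-sample hop, 32 ERB bands, and Vorbis window are assumed from DeepFilterNet 3's usual defaults; the ERB features are simplified (plain dB band power on a Glasberg & Moore ERB-rate layout, with no normalization); and `model` is a hypothetical stand-in for the Core ML or ONNX call. `feat_spec` is omitted.

```python
import cmath
import math

# Assumed DeepFilterNet 3 defaults; not confirmed for Clear's DFN3-half weights.
SAMPLE_RATE = 48_000
FFT_SIZE = 960  # 20 ms window at 48 kHz
HOP_SIZE = 480  # 10 ms hop (50% overlap)
NB_ERB = 32     # bands in feat_erb


def vorbis_window(n):
    """Power-complementary window: w[i]**2 + w[i + n // 2]**2 == 1."""
    return [math.sin(0.5 * math.pi * math.sin(math.pi * (i + 0.5) / n) ** 2) for i in range(n)]


def rfft(frame):
    """Naive O(n^2) DFT of a real frame, bins 0..n/2 (a real build would use an FFT)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n // 2 + 1)]


def irfft(bins, n):
    """Inverse of rfft() for even n, assuming the time signal is real."""
    out = []
    for t in range(n):
        acc = bins[0].real + bins[n // 2].real * (-1) ** t
        for k in range(1, n // 2):
            acc += 2 * (bins[k] * cmath.exp(2j * math.pi * k * t / n)).real
        out.append(acc / n)
    return out


def stft(signal, n=FFT_SIZE, hop=HOP_SIZE):
    """One list of n // 2 + 1 complex bins per hop: the `spec` input."""
    win = vorbis_window(n)
    # With hop == n // 2, this padding puts every input sample in exactly two frames.
    padded = [0.0] * hop + list(signal) + [0.0] * n
    return [rfft([padded[s + i] * win[i] for i in range(n)])
            for s in range(0, len(padded) - n + 1, hop)]


def istft(frames, length, n=FFT_SIZE, hop=HOP_SIZE):
    """Windowed overlap-add; the exact inverse of stft() when hop == n // 2."""
    win = vorbis_window(n)
    out = [0.0] * (hop * (len(frames) - 1) + n)
    for f, bins in enumerate(frames):
        for i, x in enumerate(irfft(bins, n)):
            out[f * hop + i] += x * win[i]
    return out[hop:hop + length]


def erb_band_edges(n_bins, nb_bands=NB_ERB, sr=SAMPLE_RATE):
    """Split bins 0..n_bins into nb_bands bands evenly spaced on the ERB-rate scale."""
    nyquist = sr / 2
    erb_max = 21.4 * math.log10(1 + 0.00437 * nyquist)  # Glasberg & Moore ERB-rate
    edges = [0]
    for b in range(1, nb_bands):
        hz = (10 ** (erb_max * b / nb_bands / 21.4) - 1) / 0.00437
        edges.append(max(round(hz / nyquist * (n_bins - 1)), edges[-1] + 1))  # >= 1 bin per band
    edges.append(n_bins)
    if any(hi <= lo for lo, hi in zip(edges, edges[1:])):
        raise ValueError("too many ERB bands for this FFT size")
    return edges


def erb_features(bins, edges):
    """Mean band power in dB for one frame: a simplified stand-in for feat_erb."""
    return [10 * math.log10(sum(abs(c) ** 2 for c in bins[lo:hi]) / (hi - lo) + 1e-10)
            for lo, hi in zip(edges, edges[1:])]


def process_channel(signal, model, n=FFT_SIZE, hop=HOP_SIZE, nb_bands=NB_ERB):
    """STFT -> features -> model -> ISTFT for one channel.

    `model(spec, feat_erb)` is a hypothetical stand-in for the Core ML / ONNX call;
    the real network also takes `feat_spec`, which this sketch leaves out.
    """
    spec = stft(signal, n, hop)
    edges = erb_band_edges(n // 2 + 1, nb_bands)
    feat_erb = [erb_features(frame, edges) for frame in spec]
    return istft(model(spec, feat_erb), len(signal), n, hop)


def process_stereo(left, right, model, **kwargs):
    """Stereo is enhanced per channel: run the mono pipeline on each side."""
    return process_channel(left, model, **kwargs), process_channel(right, model, **kwargs)
```

With an identity `model`, `process_channel` returns its input up to rounding error, because the Vorbis window is power-complementary at 50% overlap; that round trip is a useful check before wiring in the real network. The naive DFT keeps the sketch dependency-free but is far too slow for 48 kHz audio; a production version would use an FFT such as vDSP's.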

## Performance

Both variants share the architecture and run at the same speed. Time to
enhance a 5-minute clip on the Apple Neural Engine:

| Device | Chip | Mono | Stereo |
|---|---|---:|---:|
| iPhone 15 Pro | A17 Pro | 4.88 s (61× realtime) | 6.53 s (46×) |
| iPhone 17 Pro | A19 Pro | 3.70 s (81× realtime) | 5.16 s (58×) |

Cold model load is ~0.6 s; later loads are ~100 ms via the system ANE cache.
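The "× realtime" figures are the clip length divided by the processing time. A small sketch of that arithmetic; `realtime_factor` and `estimated_seconds` are illustrative names, and the estimate for other clip lengths assumes throughput scales linearly, which the card does not state:

```python
CLIP_SECONDS = 5 * 60  # the benchmark clip is 5 minutes long

# Apple Neural Engine timings in seconds, from the Performance table.
TIMINGS = {
    ("iPhone 15 Pro", "mono"): 4.88,
    ("iPhone 15 Pro", "stereo"): 6.53,
    ("iPhone 17 Pro", "mono"): 3.70,
    ("iPhone 17 Pro", "stereo"): 5.16,
}


def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio enhanced per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def estimated_seconds(audio_seconds: float, factor: float) -> float:
    """Rough processing time for another clip, assuming linear scaling."""
    return audio_seconds / factor


for (device, layout), secs in TIMINGS.items():
    print(f"{device} {layout}: {realtime_factor(CLIP_SECONDS, secs):.0f}x")
# iPhone 15 Pro mono: 61x
# iPhone 15 Pro stereo: 46x
# iPhone 17 Pro mono: 81x
# iPhone 17 Pro stereo: 58x
```

Under the linear-scaling assumption, a 60-minute mono recording on the iPhone 15 Pro would take about 3600 / 61.5 ≈ 59 s, plus the one-off model load.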

## Used in

- [Detail](https://detail.co): iOS and macOS video recording.
- [Subwave](https://subwave.app): publish audio and video stories.

## Built on

- [DeepFilterNet 3](https://github.com/Rikorose/DeepFilterNet) by
  Rikorose (MIT license), fine-tuned on Detail's speech corpus.

## License

[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). Free
for research, evaluation, and personal use with attribution.
Commercial use requires a separate license. Contact
`paul@detail.co`.

Made by Detail Technologies B.V.