Clear: on-device speech enhancement

48 kHz on-device speech enhancement. Takes noisy mono or stereo audio (phone mic, untreated room, traffic), returns a podcast-ready file: denoised, dereverbed, voice warm and present.

Try it

  • iOS / macOS: clear-swift, the Swift SDK with a built-in demo app.

Variants

Variant | Character | When to use
clear-studio | Quiet, studio-like; silences near zero | Default. Works across the full range of input quality: phone audio, laptop mic, untreated rooms, USB / XLR podcast captures.
clear-natural | Room tone, breath, lip texture preserved | Treated podcast studios, USB / XLR captures, voiceover where the original sound is intentional.

If the source is already clean and you want the model to stay invisible, pick clear-natural. Otherwise clear-studio is the default.

Files

Both variants share the same architecture and realtime cost; only the weights differ.

Variant | File | Format | Size
clear-studio | clear-studio.mlpackage.zip | Core ML mlpackage (fp16) | ~3.8 MB
clear-studio | clear-studio.mlmodelc.zip | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB
clear-studio | clear-studio.onnx | ONNX (fp32) | ~8.5 MB
clear-natural | clear-natural.mlpackage.zip | Core ML mlpackage (fp16) | ~3.8 MB
clear-natural | clear-natural.mlmodelc.zip | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB
clear-natural | clear-natural.onnx | ONNX (fp32) | ~8.5 MB
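
The file names follow a simple variant-plus-suffix pattern, which can be handy when a script needs to pick an artifact programmatically. The sketch below only encodes the names from the table; the helper name `artifact_filename` and the short format keys are hypothetical and not part of any published API.

```python
# Variants and suffixes copied from the Files table.
# The helper name and the format keys are illustrative assumptions.
VARIANTS = ("clear-studio", "clear-natural")
SUFFIXES = {
    "mlpackage": ".mlpackage.zip",  # Core ML mlpackage (fp16)
    "mlmodelc": ".mlmodelc.zip",    # Core ML mlmodelc (fp16, precompiled)
    "onnx": ".onnx",                # ONNX (fp32)
}


def artifact_filename(variant: str, fmt: str) -> str:
    """Return the repository file name for a variant/format pair."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if fmt not in SUFFIXES:
        raise ValueError(f"unknown format: {fmt!r}")
    return variant + SUFFIXES[fmt]
```

For example, `artifact_filename("clear-natural", "onnx")` yields the name to pass to a download call.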

Use

Swift (iOS / macOS)

import Clear

// inURL and outURL are file URLs you supply: the noisy input and the enhanced output.
let clear = try await Clear()
try await clear.enhance(audioURL: inURL, outputURL: outURL)

See clear-swift for the full API and loudness presets (Apple Podcasts, Spotify, YouTube, EBU R128).

ONNX

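from huggingface_hub import hf_hub_download
import onnxruntime as ort

# Download the fp32 ONNX weights (cached locally after the first call).
path    = hf_hub_download("desert-ant-labs/clear", "clear-studio.onnx")
# Create a CPU inference session for the downloaded model.
session = ort.InferenceSession(path, providers=["CPUExecutionProvider"])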
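
Before wiring up host-side feature extraction, it can help to confirm which tensors the exported graph actually expects. The sketch below is a minimal helper built on ONNX Runtime's `get_inputs()` and `get_outputs()`; the helper name `describe_io` is hypothetical, and the shapes it reports depend on the exported model, so none are assumed here.

```python
def describe_io(session):
    """Return ({input name: shape}, {output name: shape}) for an ONNX Runtime session.

    `session` is expected to be an onnxruntime.InferenceSession; only its
    get_inputs() / get_outputs() methods are used.
    """
    inputs = {arg.name: arg.shape for arg in session.get_inputs()}
    outputs = {arg.name: arg.shape for arg in session.get_outputs()}
    return inputs, outputs


# Usage with the session created above:
#   inputs, outputs = describe_io(session)
#   print(inputs)
#   print(outputs)
```

Dynamic dimensions typically show up as strings or `None` rather than integers.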

Inputs and outputs

  • Architecture: DeepFilterNet 3 (DFN3-half).
  • Sample rate: 48 kHz, mono or stereo (per-channel inference).
  • Inference contract: spec / feat_erb / feat_spec → spec_enhanced. STFT, ERB, and ISTFT are host-side via vDSP (Swift) or pure Kotlin.
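
Per-channel inference means a stereo file is handled as two independent mono passes. The sketch below illustrates that idea on interleaved samples, as stored in WAV frames. It is an illustration rather than the SDK's implementation: `enhance_per_channel` is a hypothetical helper, and `enhance_mono` stands in for the full STFT → model → ISTFT pipeline, which is not shown.

```python
from typing import Callable, List, Sequence


def enhance_per_channel(
    interleaved: Sequence[float],
    channels: int,
    enhance_mono: Callable[[List[float]], Sequence[float]],
) -> List[float]:
    """Run a mono enhancer on each channel of interleaved audio, then re-interleave.

    `enhance_mono` is a placeholder for the real mono pipeline and must
    return as many samples as it receives.
    """
    if channels < 1 or len(interleaved) % channels != 0:
        raise ValueError("sample count must be a multiple of the channel count")
    frames = len(interleaved) // channels
    out = [0.0] * len(interleaved)
    for c in range(channels):
        plane = list(interleaved[c::channels])   # de-interleave channel c
        enhanced = list(enhance_mono(plane))     # one mono pass per channel
        if len(enhanced) != frames:
            raise ValueError("enhancer must preserve the number of samples")
        out[c::channels] = enhanced              # write the channel back in place
    return out
```

With `channels=1` the function reduces to a single call to `enhance_mono`, so mono and stereo inputs share one code path.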

Performance

Both variants run at the same speed. Enhancing a 5-minute clip on the Apple Neural Engine:

Device | Chip | Mono | Stereo
iPhone 15 Pro | A17 Pro | 4.88 s (61× realtime) | 6.53 s (46×)
iPhone 17 Pro | A19 Pro | 3.70 s (81× realtime) | 5.16 s (58×)

Cold model load is ~0.6 s; later loads ~100 ms via the system ANE cache.
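
The realtime factors in the table are the clip duration divided by the processing time. The sketch below reproduces that arithmetic from the published timings; the helper name `realtime_factor` is an assumption for illustration.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio enhanced per second of processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


CLIP_SECONDS = 5 * 60  # the 5-minute benchmark clip

# Processing times in seconds, taken from the table above.
TIMINGS = {
    "iPhone 15 Pro mono": 4.88,
    "iPhone 15 Pro stereo": 6.53,
    "iPhone 17 Pro mono": 3.70,
    "iPhone 17 Pro stereo": 5.16,
}

for label, seconds in TIMINGS.items():
    print(f"{label}: {realtime_factor(CLIP_SECONDS, seconds):.1f}x realtime")
```

Rounded to whole numbers, these match the factors quoted in the table.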

Limitations

  • Trained on English speech; non-English speech still benefits but has not been measured against per-language ground truth.
  • Heavy background music or multi-speaker overlap degrades quality.
  • Mastering is informational only; verify against the platform's actual loudness target before publishing.

Built on

  • DeepFilterNet 3 by Rikorose (MIT). Fine-tuned on the Desert Ant Labs speech corpus.

License

Released under the Desert Ant Labs Source-Available License v1.0 (see LICENSE.md).

  • Free for commercial use up to 100,000 Monthly Active Users (MAU).
  • Above 100,000 MAU a commercial license is required. Contact licensing@desertant.ai.

Citation

@software{clear_2026,
  title  = {Clear: on-device speech enhancement},
  author = {Desert Ant Labs},
  year   = {2026},
  url    = {https://huggingface.co/desert-ant-labs/clear},
}

© 2026 Desert Ant Labs · https://desertant.ai
