Wren β Crisp filler detector
Wren is the fast, lightweight model in Crisp's filler-detection family (named after birds β keen ears, clean song; the larger, higher-accuracy sibling is Kestrel). A wren is the tiniest bird yet punches far above its size β fitting for a ~75k-param CNN that hits 0.94 precision at ~600Γ real-time.
It detects filler words ("uh", "um") in speech for the Crisp macOS app β a fast, on-device alternative to running full speech-to-text just to find fillers.
- Input: one log-mel chunk,
[1, 1, 64, 25](250 ms of 16 kHz mono audio, standardized with fixed constantsMEL_MEAN=-18.5658,MEL_STD=17.9252). - Output:
filler_probβ P(this chunk is a filler), 0β¦1. - Format: a single-file Core ML
.mlmodel(download directly, no unzip). Runs on-device β Core ML schedules it across the Neural Engine, GPU, or CPU.
Feature extraction (mel + chunking + the slide/merge into time ranges) lives in
the host app; the model itself is a pure chunk β probability function.
Training
- Data: PodcastFillers β ~35k human-labeled
"uh"/"um" events, 350+ speakers. Trained on the
trainsplit. - Task: binary filler-vs-speech on 250 ms chunks (silence/pauses are handled
separately by the app via ffmpeg
silencedetect).
Evaluation (held-out, episode-disjoint test split)
| Threshold | Precision | Recall | F1 |
|---|---|---|---|
| 0.5 | 0.914 | 0.933 | 0.924 |
| 0.7 | 0.944 | 0.894 | 0.918 |
| 0.8 | 0.959 | 0.863 | 0.908 |
Speed: ~600Γ real-time on CPU. Recommended operating threshold 0.7 (Crisp favors precision β a false positive cuts real speech). Main error mode: occasional confusion of short real words; breaths/laughter/music are rarely misfired.
Limitations
English only. Trained on podcast audio; very different domains (heavy accents, noisy rooms) may need a higher threshold or a fine-tune. Not a transcriber β it only flags fillers.
Files & versions (open weights)
Everything is downloadable β pick whichever you need:
| File | What |
|---|---|
Wren.mlmodel |
Core ML model (Apple Neural Engine). A single file the app downloads directly. |
Wren.pt |
Raw PyTorch weights (state_dict) β for your own inference/fine-tuning. |
Wren.config.json |
Audio framing + normalization + input/output names + recommended threshold. |
Versioning: the version is the repo's commit count, v0.0.N (mirrors
Crisp's 0.<commits> scheme); each release is one commit + one tag. Pin an exact
version in the URL:
https://huggingface.co/rafay99-epic/crisp-models/resolve/v0.0.5/Wren.mlmodel
main always points at the latest.
License
Code: GPL-3.0 (Crisp). Model weights derive from PodcastFillers (CC). Credited to Syntax Lab Technology / Abdul Rafay.
- Downloads last month
- 29