---
license: apache-2.0
language:
- en
library_name: onnxruntime
pipeline_tag: audio-classification
tags:
- keyword-spotting
- wake-word
- edge-ai
- tinyml
- onnx
- microcontroller
- speech
- mlperf-tiny
- dscnn
datasets:
- speech_commands
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: constant-wake-0.5
  results:
  - task:
      type: audio-classification
      name: Keyword Spotting
    dataset:
      type: speech_commands
      name: Google Speech Commands v0.02
      split: test
    metrics:
    - type: accuracy
      value: 99.83
    - type: f1
      value: 0.95
    - type: precision
      value: 0.978
    - type: recall
      value: 0.923
---
Constant Wake 0.5: 180 KB Spoken Wake Word Detection
A 180 KB keyword spotting model that detects the wake word "marvin" with 99.83% accuracy on the static test set and zero false positives in a 200-second streaming evaluation. Built for microcontrollers.
| Metric | Value |
|---|---|
| Test Accuracy | 99.83% |
| Precision | 97.83% |
| Recall | 92.31% |
| F1 | 0.950 |
| Model Size | 180 KB (ONNX) |
| Parameters | 45,570 |
| Streaming FP | 0 (target: ≤8) |
| Streaming FN | 1 (target: ≤8) |
| MLPerf Tiny Target | ≥95% accuracy, exceeded by 4.83 points |
Architecture
1D Depthwise Separable CNN (DS-CNN) with energy-gated cascade:
Audio Input
→ Energy Gate (silence rejection)
→ FFT Feature Extraction
→ 1D DS-CNN (64 channels)
→ Classification (wake / not-wake)
- Stage 1: Energy-based silence gating using short-time energy (STE), which rejects silent frames before any CNN computation
- Stage 2: FFT feature extraction, producing MFCC-like spectral features
- Stage 3: 1D Depthwise Separable CNN with 64 channels, highly parameter-efficient
- Total: 45,570 parameters in 180 KB
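To illustrate why depthwise separable layers are parameter-efficient, the sketch below compares parameter counts for a standard 1D convolution and its depthwise separable counterpart. The kernel width of 9 and the 64-to-64 channel layer are hypothetical values chosen for illustration, not the layer configuration of this model. The final line checks the card's own numbers under the assumption that weights are stored as float32.

```python
def conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Parameter count of a standard 1D convolution."""
    return c_in * c_out * kernel + (c_out if bias else 0)


def ds_conv1d_params(c_in: int, c_out: int, kernel: int, bias: bool = True) -> int:
    """Parameter count of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = c_in * kernel + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer: 64 -> 64 channels, kernel width 9 (not taken from the model).
standard = conv1d_params(64, 64, 9)      # 64*64*9 + 64 = 36,928
separable = ds_conv1d_params(64, 64, 9)  # (64*9 + 64) + (64*64 + 64) = 4,800

# Size check, assuming float32 weights: 45,570 * 4 bytes = 182,280 bytes (~178 KiB).
weight_bytes = 45_570 * 4
print(standard, separable, weight_bytes)
```

For this hypothetical layer the separable form needs roughly one eighth of the parameters, and the float32 size estimate is in line with the stated 180 KB file.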
Because of the cascade, the CNN runs only on non-silent frames, which reduces compute and power consumption on always-listening devices.
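A minimal sketch of how a short-time-energy gate of this kind could work is shown below. It is not the model's actual gate: the 16 kHz sample rate matches Speech Commands, but the 25 ms frame (400 samples), the 10 ms hop (160 samples), and the threshold of 1e-4 are assumptions made for illustration.

```python
import numpy as np


def short_time_energy(audio: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Mean squared amplitude of each full frame of the signal."""
    if len(audio) < frame_len:
        return np.empty(0, dtype=np.float64)
    n_frames = 1 + (len(audio) - frame_len) // hop
    energies = np.empty(n_frames, dtype=np.float64)
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + frame_len].astype(np.float64)
        energies[i] = np.mean(frame ** 2)
    return energies


def gated_frames(audio, frame_len=400, hop=160, threshold=1e-4):
    """Indices of frames whose energy exceeds the (hypothetical) threshold."""
    energy = short_time_energy(np.asarray(audio), frame_len, hop)
    return np.flatnonzero(energy > threshold)


# One second of near-silence followed by one second of a 440 Hz tone at 16 kHz.
rng = np.random.default_rng(0)
silence = 1e-4 * rng.standard_normal(16000)
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
active = gated_frames(np.concatenate([silence, tone]))
print(f"{active.size} frames passed the gate, first at index {active.min()}")
```

Only the frames that overlap the tone pass the gate; in a cascade, feature extraction and the CNN would be invoked for those frames alone.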
Benchmark Results
Classification (Static Test Set)
| Outcome | Count |
|---|---|
| True Positives | 180 |
| False Positives | 4 |
| True Negatives | 10,806 |
| False Negatives | 15 |
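The headline metrics can be reproduced from these four counts. The snippet below is a plain arithmetic check using the standard definitions of accuracy, precision, recall, and F1; nothing in it is specific to this model.

```python
tp, fp, tn, fn = 180, 4, 10_806, 15

accuracy = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy  = {accuracy:.2%}")   # 99.83%
print(f"precision = {precision:.2%}")  # 97.83%
print(f"recall    = {recall:.2%}")     # 92.31%
print(f"f1        = {f1:.3f}")         # 0.950
```

Note that the test set is heavily imbalanced (195 wake-word clips out of 11,005), so a classifier that always predicts not-wake would already reach about 98.2% accuracy. Precision and recall are the more informative figures here.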
Streaming Evaluation (200s continuous audio)
| Metric | Result | Target | Status |
|---|---|---|---|
| False Positives | 0 | ≤8 | PASS |
| False Negatives | 1 | ≤8 | PASS |
| CNN Activations | 3 | n/a | Ultra-low power |
Only 3 CNN activations in 200 seconds of streaming; the energy gate rejects 98.5% of frames before they reach the CNN.
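The 98.5% figure is consistent with one gate decision per second of audio. The card does not state the window length, so the 1-second window below is an assumption used only to show the arithmetic.

```python
stream_seconds = 200
cnn_activations = 3
window_seconds = 1.0  # assumption: one gate decision per 1-second window

n_windows = int(stream_seconds / window_seconds)
rejection_rate = 1 - cnn_activations / n_windows
print(f"{rejection_rate:.1%} of {n_windows} windows rejected")
```

With a shorter window, the same 3 activations would correspond to an even higher rejection rate.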
Quick Start
import numpy as np
import onnxruntime as ort
# Load the model
session = ort.InferenceSession("sww_dscnn.onnx")
# Inspect the expected input. The model takes MFCC-like features; the exact
# shape depends on your audio preprocessing (typically [batch, time_steps, n_mfcc]).
model_input = session.get_inputs()[0]
input_name = model_input.name
input_shape = model_input.shape
print(f"Expected input: {input_name}, shape: {input_shape}")
# Build a random placeholder input. Dynamic dimensions are reported as
# strings or None, so replace anything that is not an int with 1.
dummy_shape = [d if isinstance(d, int) else 1 for d in input_shape]
features = np.random.randn(*dummy_shape).astype(np.float32)
# Run inference
output = session.run(None, {input_name: features})[0]
print(f"Output shape: {output.shape}")
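The raw output still has to be turned into a wake / not-wake decision. The sketch below assumes the model returns two logits per example with the wake class at index 1; neither the output layout nor the 0.5 threshold is documented on this card, so check `output.shape` from the snippet above and adjust `wake_index` and `threshold` (both hypothetical names) accordingly.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def is_wake(logits, wake_index: int = 1, threshold: float = 0.5) -> np.ndarray:
    """Boolean decision per example; assumes logits of shape [batch, 2]."""
    probs = softmax(np.asarray(logits, dtype=np.float32))
    return probs[..., wake_index] > threshold


# Stand-in logits; in practice pass the `output` array returned by session.run.
example_logits = np.array([[2.0, -1.0], [-0.5, 3.0]])
decisions = is_wake(example_logits)
print(decisions)
```

If the exported model already ends in a softmax or sigmoid layer, skip the `softmax` call and threshold the probabilities directly.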
Hardware Targets
| Platform | Expected Latency | Power |
|---|---|---|
| ARM Cortex-M4 (STM32L4) | <15 ms | <1 mW (with energy gate) |
| ARM Cortex-M7 (STM32H7) | <5 ms | <2 mW |
| ESP32-S3 | <10 ms | <5 mW |
| Raspberry Pi Pico | <20 ms | <0.5 mW |
The energy-gated cascade ensures the CNN runs only when speech energy is detected, enabling always-on listening at sub-milliwatt power budgets.
MLPerf Tiny Compliance
This model targets the Keyword Spotting (KWS) benchmark from MLPerf Tiny:
- Dataset: Google Speech Commands v0.02
- Task: Streaming keyword detection
- Target: ≥95% accuracy with ≤8 FP and ≤8 FN in streaming
- Result: 99.83% accuracy, 0 FP, 1 FN; all targets exceeded
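Streaming FP and FN counts come from matching detection timestamps against the intervals where the wake word is actually spoken. The sketch below shows one simple way to do that matching. It is not the official MLPerf Tiny scoring script: the 0.5-second tolerance, the rule that a second detection inside an already matched interval counts as a false positive, and the example timestamps are all assumptions for illustration.

```python
def score_stream(detections, events, tolerance=0.5):
    """Count (true positives, false positives, false negatives).

    detections: detection timestamps in seconds.
    events: (start, end) intervals in seconds where the wake word is spoken.
    tolerance: slack in seconds allowed on either side of an interval.
    """
    matched = set()
    false_pos = 0
    for t in sorted(detections):
        hit = None
        for i, (start, end) in enumerate(events):
            if i not in matched and start - tolerance <= t <= end + tolerance:
                hit = i
                break
        if hit is None:
            false_pos += 1  # no unmatched event nearby
        else:
            matched.add(hit)
    false_neg = len(events) - len(matched)
    return len(matched), false_pos, false_neg


# Made-up timestamps, only to exercise the function.
events = [(10.0, 11.0), (50.0, 51.0), (120.0, 121.0)]
detections = [10.4, 50.9, 80.0]
tp, fp, fn = score_stream(detections, events)
passes = fp <= 8 and fn <= 8
print(tp, fp, fn, passes)
```

Here the detection at 80.0 s matches no event and the event at 120 s is missed, giving one false positive and one false negative, which is still within the ≤8 / ≤8 limits.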
Training Details
- Dataset: Google Speech Commands v0.02 (65,000+ 1-second audio clips)
- Wake word: "marvin"
- Architecture: Energy-Gated 1D DS-CNN, 64 channels
- Epochs: 30
- Hardware: NVIDIA RTX 4090
Use Cases
- Smart home devices: always-on wake word detection at <1 mW
- Wearables: hearing aids, fitness bands, smartwatches
- Industrial IoT: voice-activated controls in noisy environments
- Automotive: in-cabin voice trigger without cloud connectivity
- Medical devices: hands-free activation for clinical tools
Citation
@misc{constantone2026wake,
title={Constant Wake: Energy-Gated Keyword Spotting for Microcontrollers},
author={ConstantOne AI},
year={2026},
url={https://huggingface.co/ConstantQJ/constant-wake-0.5}
}
License
Apache 2.0. Use freely in commercial and non-commercial projects.
Links
- ConstantOne AI
- Constant Edge 0.5 (Sentiment): 1.46 MB sentiment analysis
- API Documentation