LokaalHub/nl-wake-klein-base-clean
TL;DR. A ~50 KB Dutch wake-word base model for the phrase "Hé computer". Trained with microWakeWord (MixedNet architecture, 22K parameters, 40-bin TFLM microfrontend features). This is the shared base used by the LokaalHub on-device personalization pipeline.
Important: this base contains NO user-specific voice
This package is the generic starting point. To use it as a wake word
that recognises your voice, run personalize.py with 3 of your own
recordings; the personalised TFLite model is yours and never leaves your
device.
Reference integration: wake-on-edge → reson8 ASR
A complete reference integration pairing this wake-word with the reson8 Dutch streaming ASR engine lives at:
It includes a fully reproducible synthetic-voice demo (Piper TTS for enrollment + commands, no microphone needed) and a live-mic mode. Privacy posture: nothing leaves the device until the wake phrase fires.
Install
personalize.py (shipped with this repo) requires Python ≥ 3.10, TensorFlow,
and microwakeword. Upstream's pip install is currently incomplete (missing
__init__.py files in the audio/ and layers/ subpackages), so install
from a pinned source clone:
```bash
# 1. Clone microwakeword at the SHA the base was trained against
git clone https://github.com/kahrendt/microWakeWord.git
cd microWakeWord
git checkout a70bd740d4e79ee8a8bb3db843fe862b88d5d6b0
# Patch the missing __init__.py files so pip install -e ships the subpackages
touch microwakeword/audio/__init__.py microwakeword/layers/__init__.py
pip install -e .
cd ..
# 2. Install personalize.py runtime deps
pip install tensorflow audiomentations librosa soundfile click rich pyyaml
```
On Apple Silicon use tensorflow-macos>=2.16,<2.17 plus tensorflow-metal>=1.1
in place of tensorflow.
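Before running personalize.py it can help to confirm the environment is complete. The sketch below is a hypothetical pre-flight check, not part of this bundle: the module list is our reading of the install steps above (pyyaml imports as `yaml`, and tensorflow-macos still provides the `tensorflow` module). It only checks that each module can be located, so it does not replace the `__init__.py` patch.

```python
import importlib.util
import sys

# Import names for the dependencies installed above (assumed list; pyyaml -> "yaml").
REQUIRED_MODULES = [
    "tensorflow", "audiomentations", "librosa", "soundfile",
    "click", "rich", "yaml",
    "microwakeword", "microwakeword.audio", "microwakeword.layers",
]


def missing_modules(names):
    """Return the subset of module names that cannot be located."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises when the parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


def check_environment():
    """Collect human-readable problems: Python version and unlocatable modules."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python >= 3.10 required, found {sys.version.split()[0]}")
    problems += [f"cannot locate module {m!r}" for m in missing_modules(REQUIRED_MODULES)]
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "environment looks complete")
```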
Quickstart
```bash
# 1. Download the bundle (includes personalize.py)
huggingface-cli download LokaalHub/nl-wake-klein-base-clean --local-dir nl-wake-klein-base
# 2. Record three clean takes of "Hé computer" (~1.5 s each, 16 kHz mono WAV)
# into a directory:
mkdir my_takes
arecord -f S16_LE -r 16000 -c 1 my_takes/take_1.wav  # Ctrl-C to stop; repeat for take_2, take_3
# 3. Personalise (trains a private head; ~1-2 min on a CPU)
python nl-wake-klein-base/personalize.py \
  --base-package nl-wake-klein-base \
  --recordings my_takes \
  --phrase "Hé computer" \
  --output my_wake.tflite
```
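Step 3 expects the takes to be 16 kHz mono WAV of roughly 1.5 s. A quick check of the recordings before personalising can be sketched with the standard-library `wave` module; the function names and the 1.0 to 2.5 s duration window are our own assumptions, not limits enforced by personalize.py.

```python
import wave
from pathlib import Path


def check_take(path, rate=16_000, min_s=1.0, max_s=2.5):
    """Return a list of problems with one enrollment WAV (empty list = looks OK)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate {wav.getframerate()} Hz, expected {rate}")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
        duration = wav.getnframes() / wav.getframerate()
    if not min_s <= duration <= max_s:
        problems.append(f"duration {duration:.2f} s outside [{min_s}, {max_s}] s")
    return problems


def check_takes(directory, expected=3):
    """Check every *.wav in `directory`; also flag an unexpected number of takes."""
    takes = sorted(Path(directory).glob("*.wav"))
    report = {str(p): check_take(p) for p in takes}
    if len(takes) != expected:
        report["<directory>"] = [f"found {len(takes)} WAV files, expected {expected}"]
    return report


if __name__ == "__main__":
    for name, problems in check_takes("my_takes").items():
        print(f"{name}: {'OK' if not problems else '; '.join(problems)}")
```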
The result my_wake.tflite is a streaming int8 model usable on ESP32-S3
or any TFLite-Micro target. It also runs in desktop tflite-runtime for
laptop / Pi deployments.
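The card does not document the output tensor of the streaming model. If it is quantized (int8 or uint8), the raw value has to be mapped back to a [0, 1] score with TFLite's affine scheme, real = scale × (q − zero_point), before it can be compared with the 0.5 threshold used under Evaluation. With tflite-runtime, `scale` and `zero_point` can be read from `interpreter.get_output_details()[0]["quantization"]`. The sliding-window smoother below is a hypothetical addition showing one common way to turn per-chunk scores into a single trigger; its window size is an assumed tuning knob, not a value from this model card.

```python
from collections import deque


def dequantize(q, scale, zero_point):
    """TFLite affine dequantization: real = scale * (q - zero_point)."""
    return scale * (q - zero_point)


class ScoreSmoother:
    """Fire only when the mean of the last `window` scores reaches `threshold`.

    Hypothetical post-processing; `window` is an assumed knob, not a model setting.
    """

    def __init__(self, threshold=0.5, window=5):
        self.threshold = threshold
        self.scores = deque(maxlen=window)

    def update(self, score):
        self.scores.append(score)
        full = len(self.scores) == self.scores.maxlen  # wait until the window fills
        return full and sum(self.scores) / len(self.scores) >= self.threshold


# For an int8 sigmoid output, TFLite's quantization spec fixes scale=1/256, zero_point=-128.
smoother = ScoreSmoother(threshold=0.5, window=3)
for raw in (100, 110, 120):
    fired = smoother.update(dequantize(raw, 1 / 256, -128))
```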
Architecture
- Model: MixedNet (depthwise-separable mix-conv blocks) from microWakeWord.
- Parameters: 22K (88 KB float, ~25 KB int8 TFLite).
- Features: 40-bin TFLM audio_microfrontend (PCAN/AGC), 30 ms window, 10 ms hop, 16 kHz mono.
- Input shape: [194, 40] (T frames × 40 mel bins).
- Output: single sigmoid score in [0, 1].
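The feature settings above determine how much audio one input window covers. The calculation below uses only the numbers in this list and assumes standard framing without padding: 194 frames at a 10 ms hop with 30 ms windows span 1.96 s of 16 kHz audio, and 22K float32 parameters occupy 88 KB. The last line relates the 20 ms chunk in the Evaluation latency row to the 10 ms hop, assuming the streaming model advances at the hop rate.

```python
SAMPLE_RATE = 16_000          # Hz, mono
WINDOW_MS, HOP_MS = 30, 10    # microfrontend window and hop
N_FRAMES, N_BINS = 194, 40    # model input shape [T, mel bins]
PARAMS = 22_000               # approximate parameter count

window_samples = SAMPLE_RATE * WINDOW_MS // 1000  # 480
hop_samples = SAMPLE_RATE * HOP_MS // 1000        # 160

# Samples needed for N_FRAMES full windows: (T - 1) hops plus one window.
span_samples = (N_FRAMES - 1) * hop_samples + window_samples  # 31360
span_seconds = span_samples / SAMPLE_RATE                     # 1.96

float_kb = PARAMS * 4 / 1000  # float32 weights: 88.0 KB

# New feature frames per 20 ms streaming chunk (assumes one frame per hop).
frames_per_chunk = 20 // HOP_MS  # 2

print(f"{N_FRAMES} frames span {span_seconds:.2f} s of audio")  # 194 frames span 1.96 s of audio
```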
Training data (Apache-2.0 clean)
- Positives: Piper Dutch TTS voices alex/pim/ronnie (all CC0) saying "Hé computer".
- Negatives:
  - VoxPopuli NL: formal Dutch parliamentary speech (CC0 1.0).
  - Piper-synthesised hard-negative phrases (phonetically similar Dutch).
- Augmentation: none bundled; impulse-response (IR) and background-noise (bg) sets under uncertain or NC (non-commercial) licenses were excluded.
The base model contains no real user-recorded positives β only TTS.
Files in this bundle
| Path | What it is |
|---|---|
| personalize.py | On-device fine-tuning script (Apache-2.0). |
| base_keras.weights.h5 | Keras weights; entry point for personalize.py. |
| base_streaming.tflite | Pre-quantized int8 streaming TFLite (drop-in). |
| base_metadata.yaml | Architecture / feature config / personalize knobs. |
| negatives.npz | Pre-extracted negative spectrograms for fine-tuning. |
| reference/irs/ | 20 IR WAVs (16 kHz mono) for on-device augmentation. |
| reference/bg/ | 10 background-noise samples (3–8 s, 16 kHz mono). |
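The card does not say how personalize.py applies the files in reference/irs/ and reference/bg/. A common recipe, sketched here as an assumption rather than the script's actual code, is to convolve a take with a room impulse response and then add background noise at a chosen signal-to-noise ratio (SNR). The function names and the 10 dB SNR are hypothetical; since audiomentations is among the listed dependencies, the real script may use that library instead.

```python
import numpy as np


def apply_ir(speech, ir):
    """Convolve with a room impulse response, keep the original length, restore peak level."""
    wet = np.convolve(speech, ir)[: len(speech)]
    peak = np.max(np.abs(wet)) + 1e-12  # guard against an all-zero result
    return wet * (np.max(np.abs(speech)) / peak)


def mix_at_snr(speech, noise, snr_db, rng):
    """Add a random slice of `noise` to `speech` so the result has the given SNR in dB."""
    if len(noise) < len(speech):
        # Loop short background clips until they cover the take.
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    start = rng.integers(0, len(noise) - len(speech) + 1)
    noise = noise[start:start + len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale noise so speech_power / (gain**2 * noise_power) == 10 ** (snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


# Hypothetical usage on synthetic signals (real code would load the bundled WAVs).
rng = np.random.default_rng(0)
take = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16_000) / 16_000)
ir = np.exp(-np.arange(800) / 100.0)      # toy decaying impulse response
bg = rng.standard_normal(8_000)           # toy background noise
augmented = mix_at_snr(apply_ir(take, ir), bg, snr_db=10.0, rng=rng)
```

To use the bundled files, each WAV can be loaded with `soundfile.read(path)`, which returns `(data, samplerate)`; checking that `samplerate == 16000` before mixing avoids silently combining mismatched rates.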
License
Apache-2.0. Trained exclusively on permissively-licensed inputs (Piper voices CC0, VoxPopuli NL CC0 1.0, microWakeWord Apache-2.0). See notes/04_clean_retrain.md for the audit.
Evaluation
The base model is intentionally not tuned to any one user. The
numbers below are for the base TFLite as-shipped; downstream users
should run personalize.py against their own 3 enrollment recordings.
| Metric | Value |
|---|---|
| Held-out user recall @ 0.5 (base) | 0.000 (0/30, 95% CI [0.0%, 11.4%]) |
| Held-out user recall @ 0.5 (after personalize) | 0.367 (11/30, from 3 enrollment recordings) |
| FA/h on VoxPopuli-NL test (2.63 h, threshold 0.5) | 0.000 (95% CI [0.000, 1.402]) |
| Latency (M1, per 20 ms chunk) | ~0.024 ms |
| TFLite size (int8 streaming) | 55 KB |
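The table does not name its interval methods. As our own inference, two standard choices reproduce the reported bounds: the Wilson score interval for 0/30 has an upper limit of about 11.4%, and the exact (Garwood) Poisson bound for zero false alarms in 2.63 h is about 1.40 FA/h. The sketch below uses only the standard library and solves for the Poisson bound by bisection.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (default: 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def poisson_rate_upper(events, hours, alpha=0.05):
    """Exact two-sided (Garwood) upper bound on an event rate per hour.

    Solves P(X <= events; lam) = alpha / 2 for lam by bisection, then divides by hours.
    """
    def cdf(lam):
        return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(events + 1))

    lo, hi = 0.0, 10.0 * (events + 1) + 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if cdf(mid) > alpha / 2:
            lo = mid  # CDF still above alpha/2: the bound lies at a higher rate
        else:
            hi = mid
    return hi / hours


if __name__ == "__main__":
    low, high = wilson_interval(0, 30)
    print(f"recall 0/30: 95% CI [{low:.1%}, {high:.1%}]")
    print(f"FA/h, 0 alarms in 2.63 h: upper bound {poisson_rate_upper(0, 2.63):.3f}")
```

With only 30 held-out utterances the intervals stay wide: the same Wilson formula puts the personalised 11/30 result at roughly 22% to 54%.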
Citation
Built on microWakeWord by Kevin Ahrendt (Apache-2.0).