LokaalHub/nl-wake-klein-base-clean

TL;DR. A ~50 KB Dutch wake-word base model for the phrase "Hé computer". Trained with microWakeWord (MixedNet architecture, 22K parameters, 40-bin TFLM microfrontend features). This is the shared base used by the LokaalHub on-device personalization pipeline.

Important: this base contains NO user-specific voice data

This package is the generic starting point. To use it as a wake word that recognises your voice, run personalize.py with 3 of your own recordings. The resulting personalised TFLite model is yours and never leaves your device.

Reference integration: wake-on-edge → reson8 ASR

A complete reference integration pairing this wake-word with the reson8 Dutch streaming ASR engine lives at:

https://github.com/LokaalHub/wake-nl-reson8-demo

It includes a fully reproducible synthetic-voice demo (Piper TTS for enrollment + commands, no microphone needed) and a live-mic mode. Privacy posture: nothing leaves the device until the wake phrase fires.
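The privacy posture can be illustrated with a small gate. This is a hypothetical sketch, not code from the wake-nl-reson8-demo repository. The `(audio_chunk, wake_score)` stream, the `send` callback and the fixed 150-chunk command window (3 s at 20 ms per chunk) are all assumptions. The 0.5 threshold matches the one used in the evaluation table.

```python
from collections.abc import Callable, Iterable


def gate_audio(
    stream: Iterable[tuple[bytes, float]],
    send: Callable[[bytes], None],
    threshold: float = 0.5,
    forward_chunks: int = 150,
) -> int:
    """Forward audio to `send` only after a wake detection (hypothetical sketch).

    `stream` yields (audio_chunk, wake_score) pairs. Chunks are dropped locally
    until a score reaches `threshold`; the next `forward_chunks` chunks are then
    passed to `send`. Returns the number of wake triggers.
    """
    remaining = 0
    triggers = 0
    for audio, score in stream:
        if remaining > 0:
            send(audio)  # inside the command window: hand audio to the ASR
            remaining -= 1
        elif score >= threshold:
            triggers += 1  # wake phrase fired: open the command window
            remaining = forward_chunks
        # otherwise the chunk is discarded and never leaves the device
    return triggers


# Usage: only the two chunks after the trigger are forwarded.
sent: list[bytes] = []
stream = [(b"a", 0.1), (b"b", 0.9), (b"c", 0.2), (b"d", 0.1)]
n = gate_audio(stream, sent.append, forward_chunks=2)
print(n, sent)  # prints: 1 [b'c', b'd']
```

A fixed window is the simplest way to close the gate. A real pipeline would more likely stop forwarding when the ASR reports an endpoint.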

Install

personalize.py (shipped with this repo) requires Python ≥ 3.10, TensorFlow, and microwakeword. The upstream pip package is currently incomplete because the audio/ and layers/ subpackages lack __init__.py files. Install from a pinned source clone instead:

# 1. Clone microwakeword at the SHA the base was trained against
git clone https://github.com/kahrendt/microWakeWord.git
cd microWakeWord
git checkout a70bd740d4e79ee8a8bb3db843fe862b88d5d6b0
# Patch the missing __init__.py files so pip install -e ships the subpackages
touch microwakeword/audio/__init__.py microwakeword/layers/__init__.py
pip install -e .
cd ..

# 2. Install personalize.py runtime deps
pip install tensorflow audiomentations librosa soundfile click rich pyyaml

On Apple Silicon use tensorflow-macos>=2.16,<2.17 plus tensorflow-metal>=1.1 in place of tensorflow.
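To confirm that both install steps worked, the sketch below checks that each module can be located. It assumes the import names listed (pyyaml installs as `yaml`; tensorflow-macos still imports as `tensorflow`). It uses `importlib.util.find_spec`, which imports only the parent package of a dotted name. It flags subpackages that are missing from the installed tree, which is the failure the `touch ... __init__.py` patch works around. `missing_modules` is an illustrative helper, not part of this repo.

```python
import importlib.util

# Import names for the dependencies installed above (assumed; pyyaml -> "yaml").
REQUIRED = (
    "microwakeword",
    "microwakeword.audio",
    "microwakeword.layers",
    "tensorflow",
    "audiomentations",
    "librosa",
    "soundfile",
    "click",
    "rich",
    "yaml",
)


def missing_modules(names=REQUIRED):
    """Return the names that importlib cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False  # parent package is absent or broken
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_modules()
    print("all modules found" if not gaps else "missing: " + ", ".join(gaps))
```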

Quickstart

# 1. Download the bundle (includes personalize.py)
huggingface-cli download LokaalHub/nl-wake-klein-base-clean --local-dir nl-wake-klein-base

# 2. Record three clean takes of "Hé computer" (~1.5 s each, 16 kHz mono WAV)
#    into a directory:
mkdir my_takes
arecord -f S16_LE -r 16000 -c 1 -d 2 my_takes/take_1.wav   # stops after 2 s; repeat for take_2, take_3

# 3. Personalise (trains a private head; ~1-2 min on a CPU)
python nl-wake-klein-base/personalize.py \
    --base-package nl-wake-klein-base \
    --recordings   my_takes \
    --phrase       "Hé computer" \
    --output       my_wake.tflite
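Before step 3, you may want to confirm that each take matches the format above. This stdlib-only sketch uses Python's `wave` module to check sample rate, channel count, 16-bit sample width and duration. The 1 to 3 s duration window and the `check_take` / `check_dir` names are assumptions chosen around the card's ~1.5 s guidance. They are not limits enforced by personalize.py.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 16_000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPWIDTH = 2  # bytes per sample, i.e. 16-bit (S16_LE)
MIN_S, MAX_S = 1.0, 3.0  # assumed tolerance around "~1.5 s"


def check_take(path):
    """Return a list of format problems for one WAV file (empty if it looks fine)."""
    with wave.open(str(path), "rb") as wf:
        rate, channels, width = wf.getframerate(), wf.getnchannels(), wf.getsampwidth()
        duration = wf.getnframes() / rate
    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate {rate} Hz, expected {EXPECTED_RATE}")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if width != EXPECTED_SAMPWIDTH:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    if not MIN_S <= duration <= MAX_S:
        problems.append(f"duration {duration:.2f} s outside {MIN_S}-{MAX_S} s")
    return problems


def check_dir(directory, expected_count=3):
    """Return (name, problems) pairs for every WAV file in `directory`."""
    takes = sorted(Path(directory).glob("*.wav"))
    results = [(t.name, check_take(t)) for t in takes]
    if len(takes) != expected_count:
        results.insert(0, (str(directory), [f"found {len(takes)} WAV files, expected {expected_count}"]))
    return results


if __name__ == "__main__" and Path("my_takes").is_dir():
    for name, problems in check_dir("my_takes"):
        print(f"{name}: {'OK' if not problems else '; '.join(problems)}")
```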

The resulting my_wake.tflite from step 3 is a streaming int8 model that runs on the ESP32-S3 or any other TFLite Micro target. It also runs under the desktop tflite-runtime for laptop or Raspberry Pi deployments.
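For a minimal desktop smoke test, the sketch below assumes either the `tflite-runtime` package or full TensorFlow is installed. It loads the model and reads the input and output tensor details from the file, so shapes, dtypes and quantization parameters are not hard-coded. It then runs one inference on an all-zero input. It does not compute real microfrontend features, so the score only shows that the model loads and executes. `run_once`, `quantize` and `dequantize` are illustrative helpers, not part of personalize.py.

```python
def quantize(values, scale, zero_point, lo, hi):
    """Affine-quantize floats: q = round(x / scale) + zero_point, clipped to [lo, hi]."""
    return [min(hi, max(lo, round(v / scale) + zero_point)) for v in values]


def dequantize(values, scale, zero_point):
    """Inverse affine mapping: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in values]


def run_once(model_path):
    """Feed one all-zero feature chunk and return the first output value as a float."""
    import numpy as np

    try:
        from tflite_runtime.interpreter import Interpreter  # lightweight runtime
    except ImportError:
        import tensorflow as tf

        Interpreter = tf.lite.Interpreter  # fallback: full TensorFlow
    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    print("input:", inp["shape"], inp["dtype"], "output:", out["shape"], out["dtype"])

    features = np.zeros(inp["shape"], dtype=np.float32)  # placeholder, not real features
    scale, zp = inp["quantization"]
    if scale:  # quantized input tensor
        info = np.iinfo(inp["dtype"])
        q = quantize(features.ravel().tolist(), scale, zp, info.min, info.max)
        data = np.array(q, dtype=inp["dtype"]).reshape(inp["shape"])
    else:
        data = features.astype(inp["dtype"])
    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()

    raw = interpreter.get_tensor(out["index"]).ravel().tolist()
    o_scale, o_zp = out["quantization"]
    return dequantize(raw, o_scale, o_zp)[0] if o_scale else raw[0]
```

For example, call `print(run_once("my_wake.tflite"))` after step 3. In a streaming model the internal state is presumably carried across `invoke()` calls. A real detection loop would therefore feed consecutive feature chunks to the same interpreter instance rather than rebuilding it.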

Architecture

Model: MixedNet (depthwise-separable mix-conv blocks) from microWakeWord.
Parameters: ~22K (~88 KB float, ~25 KB int8 TFLite).
Features: 40-bin TFLM audio_microfrontend (PCAN/AGC), 30 ms window, 10 ms hop, 16 kHz mono.
Input shape: [194, 40] (T frames × 40 mel bins).
Output: single sigmoid score in [0, 1].
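The figures above fit together as a short calculation. It uses only numbers from this card: 30 ms window, 10 ms hop, 16 kHz, 194 frames, ~22K parameters, and the 20 ms chunk from the latency row. Treating ~22K as exactly 22,000 is an approximation. The raw int8 weight bytes (~22 KB) sit a little below the card's ~25 KB figure, and this sketch does not break down the difference.

```python
# Worked arithmetic for the feature front end and model size (sketch).
SAMPLE_RATE_HZ = 16_000
WINDOW_MS, HOP_MS = 30, 10  # microfrontend window and hop
N_FRAMES, N_BINS = 194, 40  # input shape [194, 40]
APPROX_PARAMS = 22_000  # "~22K" parameters; approximate
CHUNK_MS = 20  # streaming chunk size quoted in the latency row

window_samples = SAMPLE_RATE_HZ * WINDOW_MS // 1000  # 480 samples per frame
hop_samples = SAMPLE_RATE_HZ * HOP_MS // 1000  # 160 samples between frame starts
# The first frame covers one full window; each further frame adds one hop.
span_samples = window_samples + (N_FRAMES - 1) * hop_samples  # 31360
span_seconds = span_samples / SAMPLE_RATE_HZ  # 1.96 s of audio per input
frames_per_chunk = CHUNK_MS // HOP_MS  # 2 new feature frames per 20 ms chunk
float32_kb = APPROX_PARAMS * 4 / 1000  # 4 bytes per weight -> 88.0 KB
int8_kb = APPROX_PARAMS * 1 / 1000  # 1 byte per weight -> 22.0 KB (weights only)

print(f"window={window_samples} hop={hop_samples} span={span_samples} samples ({span_seconds:.2f} s)")
print(f"{frames_per_chunk} frames per {CHUNK_MS} ms chunk; {N_FRAMES * N_BINS} input values")
print(f"weights: ~{float32_kb:.0f} KB float32, ~{int8_kb:.0f} KB int8")
```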

Training data (Apache-2.0 clean)

Positives: Piper Dutch TTS voices alex/pim/ronnie (all CC0) saying "Hé computer".
Negatives:
- VoxPopuli NL — formal Dutch parliamentary speech (CC0 1.0).
- Piper-synthesised hard-negative phrases (phonetically similar Dutch).
Augmentation: none bundled from sources under uncertain or non-commercial (NC) licenses; such impulse-response (IR) and background-noise (bg) sets were excluded.

The base model contains no real user-recorded positives — only TTS.

Files in this bundle

| Path | What it is |
|---|---|
| `personalize.py` | On-device fine-tuning script (Apache-2.0). |
| `base_keras.weights.h5` | Keras weights; entry point for `personalize.py`. |
| `base_streaming.tflite` | Pre-quantized int8 streaming TFLite (drop-in). |
| `base_metadata.yaml` | Architecture / feature config / personalize knobs. |
| `negatives.npz` | Pre-extracted negative spectrograms for fine-tuning. |
| `reference/irs/` | 20 IR WAVs (16 kHz mono) for on-device augmentation. |
| `reference/bg/` | 10 background-noise samples (3–8 s, 16 kHz mono). |
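To see what `negatives.npz` contains before relying on it, the sketch below lists each stored array's name, shape and dtype. It assumes only that the file is a standard NumPy `.npz` archive, as the extension suggests, and makes no assumption about key names. The path in the `__main__` guard mirrors the Quickstart's `--local-dir` and is otherwise hypothetical. With NumPy's default `allow_pickle=False`, object arrays would raise an error rather than load.

```python
import os

import numpy as np


def describe_npz(path):
    """Map each array name in an .npz archive to (shape, dtype string)."""
    with np.load(path) as archive:  # NpzFile; arrays are read lazily per key
        return {name: (archive[name].shape, str(archive[name].dtype)) for name in archive.files}


if __name__ == "__main__":
    path = "nl-wake-klein-base/negatives.npz"  # hypothetical location
    if os.path.exists(path):
        for name, (shape, dtype) in describe_npz(path).items():
            print(f"{name}: shape={shape} dtype={dtype}")
```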

License

Apache-2.0 — trained exclusively on permissively-licensed inputs (Piper voices CC0, VoxPopuli NL CC0 1.0, microWakeWord Apache-2.0). See notes/04_clean_retrain.md for the audit.

Evaluation

The base model is intentionally not tuned to any one user. Unless marked otherwise, the numbers below are for the base TFLite as shipped; downstream users should run personalize.py against their own 3 enrollment recordings.

| Metric | Value |
|---|---|
| Held-out user recall @ 0.5 (base) | 0.000 (0/30, 95% CI [0.0%, 11.4%]) |
| Held-out user recall @ 0.5 (after personalize) | 0.367 (11/30, from 3 enrollment recordings) |
| False accepts per hour (FA/h) on VoxPopuli-NL test set (2.63 h, threshold 0.5) | 0.000 (95% CI [0.000, 1.402]) |
| Latency (Apple M1, per 20 ms chunk) | ~0.024 ms |
| TFLite size (int8 streaming) | 55 KB |
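The reported intervals appear consistent with a Wilson score interval for recall (0/30) and an exact (Garwood) Poisson bound for zero false accepts in 2.63 h. That is our reading; the card does not name the methods. This stdlib-only sketch reproduces the bounds to within rounding and can be reused on your own personalised evaluation.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 -> ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def fa_per_hour_upper(false_accepts, hours, alpha=0.05):
    """Exact upper bound on the FA rate: solve P(X <= k; lam) = alpha/2 by bisection."""
    lo, hi = 0.0, 10.0 * (false_accepts + 10)
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(false_accepts, mid) > alpha / 2:
            lo = mid  # CDF still too large: the bound lies at a higher rate
        else:
            hi = mid
    return hi / hours


print(wilson_interval(0, 30))  # upper bound ≈ 0.114, i.e. 11.4%
print(fa_per_hour_upper(0, 2.63))  # ≈ 1.40 false accepts per hour
print(wilson_interval(11, 30))  # interval for the after-personalize recall
```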

Citation

Built on microWakeWord by Kevin Ahrendt — Apache-2.0.
