LokaalHub/nl-wake-klein-base-clean

TL;DR. A ~55 KB (int8 TFLite) Dutch wake-word base model for the phrase "Hé computer". Trained with microWakeWord (MixedNet architecture, 22K parameters, 40-bin TFLM microfrontend features). This is the shared base used by the LokaalHub on-device personalization pipeline.

Important: this base contains NO user-specific voice data

This package is the generic starting point. To use it as a wake word that recognises your voice, run personalize.py with 3 of your own recordings; the personalised TFLite is yours and never leaves your device.

Reference integration: wake-on-edge → reson8 ASR

A complete reference integration pairing this wake-word with the reson8 Dutch streaming ASR engine lives at:

https://github.com/LokaalHub/wake-nl-reson8-demo

It includes a fully reproducible synthetic-voice demo (Piper TTS generates both the enrollment takes and the commands, so no microphone is needed) and a live-mic mode. Privacy posture: nothing leaves the device until the wake phrase fires.

Install

personalize.py (shipped with this repo) requires Python ≥ 3.10, TensorFlow, and microwakeword. A plain pip install of upstream microwakeword is currently incomplete: the audio/ and layers/ subpackages lack __init__.py files and are left out of the install. Install from a pinned source clone instead:

# 1. Clone microwakeword at the SHA the base was trained against
git clone https://github.com/kahrendt/microWakeWord.git
cd microWakeWord
git checkout a70bd740d4e79ee8a8bb3db843fe862b88d5d6b0
# Patch the missing __init__.py files so pip install -e ships the subpackages
touch microwakeword/audio/__init__.py microwakeword/layers/__init__.py
pip install -e .
cd ..

# 2. Install personalize.py runtime deps
pip install tensorflow audiomentations librosa soundfile click rich pyyaml

On Apple Silicon use tensorflow-macos>=2.16,<2.17 plus tensorflow-metal>=1.1 in place of tensorflow.
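Before running personalize.py, a quick check can confirm that the patched install actually exposes the audio/ and layers/ subpackages. The sketch below is an illustrative helper, not part of this repo: check_environment and the module list are assumptions drawn from the install steps above. It locates modules with importlib.util.find_spec instead of importing them, although checking a subpackage does import its parent package.

```python
import importlib.util
import sys

# Import names for the packages installed above. microwakeword.audio and
# microwakeword.layers are the subpackages a plain upstream pip install can omit.
REQUIRED_MODULES = (
    "tensorflow",
    "microwakeword",
    "microwakeword.audio",
    "microwakeword.layers",
    "audiomentations",
    "librosa",
    "soundfile",
    "click",
    "rich",
    "yaml",  # provided by the pyyaml package
)


def module_available(name: str) -> bool:
    """Return True if `name` can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec("pkg.sub") imports "pkg" first and raises if that fails.
        return False


def missing_modules(names=REQUIRED_MODULES) -> list[str]:
    return [name for name in names if not module_available(name)]


def check_environment() -> list[str]:
    """Collect human-readable problems; an empty list means the basics are in place."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python >= 3.10 required, found {sys.version.split()[0]}")
    problems += [f"cannot locate module {name!r}" for name in missing_modules()]
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK for personalize.py")
```

If microwakeword.audio or microwakeword.layers shows up as missing while microwakeword itself is found, the __init__.py patch step most likely did not take effect before pip install -e ran.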

Quickstart

# 1. Download the bundle (includes personalize.py)
huggingface-cli download LokaalHub/nl-wake-klein-base-clean --local-dir nl-wake-klein-base

# 2. Record three clean takes of "Hé computer" (~1.5 s each, 16 kHz mono WAV)
#    into a directory:
mkdir my_takes
arecord -f S16_LE -r 16000 -c 1 my_takes/take_1.wav   # Linux/ALSA; Ctrl-C stops. Repeat for take_2, take_3

# 3. Personalise (trains a private head; ~1-2 min on a CPU)
python nl-wake-klein-base/personalize.py \
    --base-package nl-wake-klein-base \
    --recordings   my_takes \
    --phrase       "Hé computer" \
    --output       my_wake.tflite
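Step 3 assumes the takes really are 16 kHz, mono, 16-bit WAV. A minimal sketch using Python's standard wave module to check that before personalizing; the helper names and the 0.5 to 3.0 s duration window are illustrative assumptions, not limits enforced by personalize.py.

```python
import wave
from pathlib import Path

EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1        # mono
EXPECTED_SAMPLE_WIDTH = 2    # bytes per sample: 16-bit PCM, as recorded with -f S16_LE
MIN_SECONDS, MAX_SECONDS = 0.5, 3.0  # illustrative bounds around the ~1.5 s target


def check_take(path) -> list[str]:
    """Return format problems for one WAV file (an empty list means it looks fine)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        seconds = wav.getnframes() / rate
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"{seconds:.2f} s long, expected roughly 1.5 s")
    return problems


def check_takes(directory, expected_count: int = 3) -> dict[str, list[str]]:
    """Check every *.wav in `directory`; the '<directory>' key reports a wrong file count."""
    takes = sorted(Path(directory).glob("*.wav"))
    report = {str(take): check_take(take) for take in takes}
    if len(takes) != expected_count:
        report["<directory>"] = [f"found {len(takes)} WAV files, expected {expected_count}"]
    return report
```

For example, check_takes("my_takes") returns a dict mapping each file to its list of problems; an empty list means that take matches the format asked for in step 2.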

The output of step 3, my_wake.tflite, is a streaming int8 model usable on the ESP32-S3 or any other TFLite Micro target. It also runs under the desktop tflite-runtime for laptop and Raspberry Pi deployments.
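A minimal sketch of driving the streaming model from Python with the tflite-runtime (or full TensorFlow) Interpreter. It assumes you already have 40-bin microfrontend feature frames (computing them is out of scope here), that the model takes as many frames per invoke() as its input shape declares, and that it carries its streaming state between calls. quantize_frames, load_interpreter and score_chunk are hypothetical helpers, not part of personalize.py.

```python
def quantize_frames(frames, scale: float, zero_point: int, lo: int = -128, hi: int = 127):
    """Affine TFLite quantization q = round(x / scale) + zero_point, clamped to [lo, hi]."""
    return [[max(lo, min(hi, round(x / scale) + zero_point)) for x in row] for row in frames]


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Inverse mapping x = scale * (q - zero_point)."""
    return scale * (q - zero_point)


def load_interpreter(model_path: str):
    """Prefer the lightweight tflite-runtime package; fall back to full TensorFlow."""
    try:
        from tflite_runtime.interpreter import Interpreter
    except ImportError:
        import tensorflow as tf
        Interpreter = tf.lite.Interpreter
    interpreter = Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    return interpreter


def score_chunk(interpreter, frames) -> float:
    """Run one streaming step on `frames` (rows of 40 float features); return the score.

    Assumes `frames` holds exactly the number of rows the input tensor expects and
    that the model keeps its streaming state between invoke() calls.
    """
    import numpy as np  # imported here so the pure-Python helpers work without numpy

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    in_scale, in_zero_point = inp["quantization"]
    if in_scale:  # quantized input: map float features onto the integer input type
        limits = np.iinfo(inp["dtype"])
        frames = quantize_frames(frames, in_scale, in_zero_point, int(limits.min), int(limits.max))
    data = np.asarray(frames, dtype=inp["dtype"]).reshape(inp["shape"])
    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()
    raw = interpreter.get_tensor(out["index"]).reshape(-1)[0]
    out_scale, out_zero_point = out["quantization"]
    return dequantize(int(raw), out_scale, out_zero_point) if out_scale else float(raw)
```

Call score_chunk once per incoming block of frames and compare the returned score with a threshold such as the 0.5 used in the Evaluation section. The quantization parameters are read from the model rather than hard-coded, so the same code handles int8, uint8 or float input and output tensors.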

Architecture

  • Model: MixedNet (depthwise-separable mix-conv blocks) from microWakeWord.
  • Parameters: 22K (≈88 KB as float32 weights, ≈22 KB as int8 weights; the int8 streaming TFLite file is ~55 KB).
  • Features: 40-bin TFLM audio_microfrontend (PCAN/AGC), 30 ms window, 10 ms hop, 16 kHz mono.
  • Input shape: [194, 40] (194 time frames × 40 mel bins).
  • Output: single sigmoid score in [0, 1].
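The numbers in the list above fit together as simple arithmetic, sketched below. The 22K parameter count is rounded, so the byte counts are approximate; the int8 streaming TFLite file (55 KB in the Evaluation table) is larger than the raw int8 weights, presumably because the flatbuffer also stores the graph, quantization parameters and streaming state.

```python
SAMPLE_RATE_HZ = 16_000
WINDOW_MS = 30
HOP_MS = 10
INPUT_FRAMES = 194
PARAMS = 22_000  # "22K" from the list above; the exact count is not stated


def samples(ms: int) -> int:
    """Number of audio samples in `ms` milliseconds at 16 kHz."""
    return SAMPLE_RATE_HZ * ms // 1000


def audio_span_ms(frames: int) -> int:
    """Audio covered by overlapping windows: one full window plus (frames - 1) hops."""
    return WINDOW_MS + (frames - 1) * HOP_MS


def weight_bytes(params: int, bytes_per_weight: int) -> int:
    return params * bytes_per_weight


if __name__ == "__main__":
    print(f"window = {samples(WINDOW_MS)} samples, hop = {samples(HOP_MS)} samples")
    print(f"{INPUT_FRAMES} frames span {audio_span_ms(INPUT_FRAMES)} ms of audio")
    print(f"weights: {weight_bytes(PARAMS, 4) / 1000:.0f} KB float32, "
          f"{weight_bytes(PARAMS, 1) / 1000:.0f} KB int8")
```

So the [194, 40] input covers about 1.96 s of audio, and with a 10 ms hop the 20 ms chunk used for the latency figure corresponds to two new feature frames (2 × 160 samples).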

Training data (Apache-2.0 clean)

  • Positives: Piper Dutch TTS voices alex/pim/ronnie (all CC0) saying "Hé computer".
  • Negatives:
    • VoxPopuli NL: formal Dutch parliamentary speech (CC0 1.0).
    • Piper-synthesised hard-negative phrases (phonetically similar Dutch).
  • Augmentation: none bundled; impulse-response (IR) and background-noise (bg) sets under uncertain or non-commercial (NC) licenses were excluded.

The base model contains no real user-recorded positives β€” only TTS.

Files in this bundle

Path                     What it is
personalize.py           On-device fine-tuning script (Apache-2.0).
base_keras.weights.h5    Keras weights; entry point for personalize.py.
base_streaming.tflite    Pre-quantized int8 streaming TFLite model (drop-in).
base_metadata.yaml       Architecture, feature config and personalize.py knobs.
negatives.npz            Pre-extracted negative spectrograms for fine-tuning.
reference/irs/           20 impulse-response (IR) WAVs (16 kHz mono) for on-device augmentation.
reference/bg/            10 background-noise samples (3–8 s, 16 kHz mono).
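A small sketch for sanity-checking a download against the table above. The function names are hypothetical; the file list is copied from the table, count_wavs assumes the background samples are WAV files like the IRs, and the array names inside negatives.npz are not documented here, so describe_negatives just reports whatever it finds.

```python
from pathlib import Path

# File and directory names taken from the table above.
EXPECTED_FILES = (
    "personalize.py",
    "base_keras.weights.h5",
    "base_streaming.tflite",
    "base_metadata.yaml",
    "negatives.npz",
)
EXPECTED_DIRS = ("reference/irs", "reference/bg")


def missing_bundle_items(bundle_dir) -> list[str]:
    """List expected files and directories that are absent from `bundle_dir`."""
    root = Path(bundle_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


def count_wavs(bundle_dir, subdir: str) -> int:
    """Count *.wav files directly inside `subdir` (0 if it does not exist)."""
    return sum(1 for _ in (Path(bundle_dir) / subdir).glob("*.wav"))


def describe_negatives(bundle_dir) -> dict:
    """Map each array stored in negatives.npz to its shape (requires numpy)."""
    import numpy as np

    with np.load(Path(bundle_dir) / "negatives.npz") as data:
        return {name: data[name].shape for name in data.files}
```

After the huggingface-cli download in the Quickstart, missing_bundle_items("nl-wake-klein-base") should return an empty list and count_wavs("nl-wake-klein-base", "reference/irs") should report 20.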

License

Apache-2.0. Trained exclusively on permissively licensed inputs (Piper voices CC0, VoxPopuli NL CC0 1.0, microWakeWord Apache-2.0). See notes/04_clean_retrain.md for the audit.

Evaluation

The base model is intentionally not tuned to any one user. The numbers below are for the base TFLite as-shipped; downstream users should run personalize.py against their own 3 enrollment recordings.

Metric                                                        Value
Held-out user recall @ 0.5, base                              0.000 (0/30; 95% CI [0.000, 0.114])
Held-out user recall @ 0.5, after personalize.py              0.367 (11/30; from 3 enrollment recordings)
False accepts/hour (FA/h), VoxPopuli-NL test, 2.63 h @ 0.5    0.000 (95% CI [0.000, 1.402])
Latency per 20 ms chunk (Apple M1)                            ~0.024 ms
TFLite size (int8 streaming)                                  55 KB
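The confidence intervals in the table can be reproduced with two standard methods: the base recall bound (0/30 gives an upper limit of 0.114) matches a Wilson score interval, and the false-accept bound matches an exact (Garwood) two-sided Poisson interval, where zero events give an upper mean of ln(40) ≈ 3.689, i.e. ≈1.40 per hour over 2.63 h (the small difference from 1.402 is consistent with the 2.63 h figure being rounded). That these are the methods actually used is an inference from the numbers, not something this card states; the sketch uses only the standard library.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def wilson_interval(successes: int, trials: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def poisson_upper(events: int, alpha: float = 0.05) -> float:
    """Exact (Garwood) upper limit: the mean lam with P(X <= events; lam) = alpha / 2."""
    lo, hi = 0.0, 10.0 + 10.0 * events
    for _ in range(200):  # bisection; the CDF decreases as lam grows
        mid = (lo + hi) / 2
        if poisson_cdf(events, mid) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def fa_per_hour_upper(events: int, hours: float, alpha: float = 0.05) -> float:
    """Upper confidence limit on false accepts per hour."""
    return poisson_upper(events, alpha) / hours


if __name__ == "__main__":
    for hits in (0, 11):
        low, high = wilson_interval(hits, 30)
        print(f"recall {hits}/30 = {hits / 30:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
    print(f"FA/h upper limit, 0 false accepts in 2.63 h: {fa_per_hour_upper(0, 2.63):.3f}")
```

The same wilson_interval call also gives an interval for the personalised 11/30 result, which the table does not report; with only 30 held-out takes it is wide (roughly 0.22 to 0.54).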

Citation

Built on microWakeWord by Kevin Ahrendt (Apache-2.0).
