---
license: apache-2.0
library_name: nemo
tags:
- onnx
- nemo
- speech-commands
- wake-word-spotting
datasets:
- HashNuke/tincan-wakewords-data
metrics:
- accuracy
model-index:
- name: TinCan Speech Commands Model
  results:
  - task:
      type: audio-classification
      name: Speech command recognition
    dataset:
      name: TinCan Speech Commands validation set
      type: tincan-speech-commands-validation
    metrics:
    - type: loss
      name: Validation loss
      value: 0.1493
    - type: accuracy
      name: Validation micro top-1 accuracy
      value: 95.28
    - type: accuracy
      name: Validation macro accuracy
      value: 94.61
---
TinCan Speech Commands Model
A compact English speech-command recognition model for the TinCan app.
This model recognizes 47 short command classes and is designed for small-footprint command recognition where cloud ASR is unnecessary or undesirable. The exported ONNX artifact is under 400 KB, making it practical for local-first applications, prototypes, and edge deployments.
The 47 classes are:
- 12 custom words
- 35 words from the Google Speech Commands dataset v2
Highlights
- 47-class English command recognizer
- ONNX export for portable inference
- Small model artifact: model.onnx is approximately 378 KB
- Based on NVIDIA NeMo's MatchboxNet command-recognition model family
Base Model
This model uses NVIDIA NeMo's commandrecognition_en_matchboxnet3x2x64_v2 MatchboxNet command-recognition architecture.
Base model reference: commandrecognition_en_matchboxnet3x2x64_v2
Metrics
These metrics describe the currently exported model.onnx artifact.
| Metric | Value |
|---|---|
| Validation loss | 0.1493 |
| Validation micro top-1 accuracy | 95.28% |
| Validation macro accuracy | 94.61% |
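The two accuracy figures differ in how they average. Assuming the usual definitions (this card does not spell them out), micro accuracy is the fraction of all validation clips classified correctly, while macro accuracy is the unweighted mean of the per-class accuracies, so a rare class counts as much as a common one. The sketch below illustrates the difference with made-up toy labels, not the model's validation data; the function names are hypothetical.

```python
from collections import defaultdict


def micro_accuracy(y_true, y_pred):
    """Fraction of all samples whose prediction matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy over the classes in y_true."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy data: four "yes" clips, all correct, and one "oslo" clip, misclassified.
y_true = ["yes", "yes", "yes", "yes", "oslo"]
y_pred = ["yes", "yes", "yes", "yes", "paris"]

print(micro_accuracy(y_true, y_pred))  # 0.8
print(macro_accuracy(y_true, y_pred))  # 0.5
```

A macro figure below the micro figure, as in the table above, is consistent with some less frequent classes being harder for the model than the frequent ones.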
Supported Commands
Custom TinCan commands:
astra, bali, boston, capri, delhi, dublin, frisco, monaco, oslo, paris, seatown, tokyo
Google Speech Commands labels:
yes, no, up, down, left, right, on, off, stop, go, zero, one, two, three, four, five, six, seven, eight, nine, bed, bird, cat, dog, happy, house, marvin, sheila, tree, wow, backward, forward, follow, learn, visual
Inference Notes
The model outputs logits over the 47 labels listed in labels.json. Use the output index to look up the predicted command label.
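The lookup described above can be sketched in plain Python. This is a minimal sketch rather than the app's actual inference code: it assumes labels.json is a JSON array of label strings ordered by output index (the file's real layout is not documented here), and `softmax`, `predict_label`, and `load_labels` are hypothetical helper names. Running the ONNX model to obtain the logits is omitted.

```python
import json
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(labels):
        raise ValueError("logit count does not match label count")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def load_labels(path="labels.json"):
    # Assumption: the file holds a JSON array such as ["astra", "bali", ...],
    # where the position of each entry is its output index.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With the 47 logits from one inference run in `logits`, `predict_label(logits, load_labels())` returns the predicted command and its softmax probability. An application could reject predictions whose probability falls below a threshold of its own choosing; this card does not specify one.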
Training Provenance
| Field | Value |
|---|---|
| Model name | commandrecognition_en_matchboxnet3x2x64_v2 |
| Export format | ONNX |
| Epochs | 10 |
| Batch size | 32 |
Limitations
- This is a closed-vocabulary command recognizer, not a general speech-to-text model.
- The model is intended for English short-command recognition.
- Validation metrics may not fully predict performance with every microphone, speaker, accent, room, or noise condition.