Akash Manohar
---
license: apache-2.0
library_name: nemo
tags:
- onnx
- nemo
- speech-commands
- wake-word-spotting
datasets:
- HashNuke/tincan-wakewords-data
metrics:
- accuracy
model-index:
- name: TinCan Speech Commands Model
  results:
  - task:
      type: audio-classification
      name: Speech command recognition
    dataset:
      name: TinCan Speech Commands validation set
      type: tincan-speech-commands-validation
    metrics:
    - type: loss
      name: Validation loss
      value: 0.1493
    - type: accuracy
      name: Validation micro top-1 accuracy
      value: 95.28
    - type: accuracy
      name: Validation macro accuracy
      value: 94.61
---
# TinCan Speech Commands Model
A compact English speech-command recognition model for the TinCan app.
This model recognizes 47 short command classes and is designed for small-footprint command recognition where cloud ASR is unnecessary or undesirable. The exported ONNX artifact is under 400 KB, making it practical for local-first applications, prototypes, and edge deployments.
The 47 classes comprise:
* 12 custom words
* 35 words from the Google Speech Commands v2 dataset
## Highlights
- 47-class English command recognizer
- ONNX export for portable inference
- Small model artifact: `model.onnx` is approximately 378 KB
- Based on NVIDIA NeMo's MatchboxNet command-recognition model family
## Base Model
This model uses NVIDIA NeMo's `commandrecognition_en_matchboxnet3x2x64_v2` MatchboxNet command-recognition architecture.
Base model reference: [`commandrecognition_en_matchboxnet3x2x64_v2`](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/commandrecognition_en_matchboxnet3x2x64_v2)
## Metrics
These metrics describe the currently exported `model.onnx` artifact.
| Metric | Value |
|---|---:|
| Validation loss | 0.1493 |
| Validation micro top-1 accuracy | 95.28% |
| Validation macro accuracy | 94.61% |
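
The card does not spell out how the micro and macro figures are computed. The sketch below assumes the usual definitions: micro accuracy is the fraction of all validation clips classified correctly, and macro accuracy is the unweighted mean of per-class accuracy. The function name `micro_macro_accuracy` is illustrative and not part of this repository.

```python
from collections import defaultdict


def micro_macro_accuracy(y_true, y_pred):
    """Return (micro, macro) accuracy for parallel lists of labels.

    Assumed definitions (not confirmed by the model card):
      micro = correct predictions / all predictions
      macro = mean over classes of (correct in class / samples in class)
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    total = defaultdict(int)    # samples seen per true class
    correct = defaultdict(int)  # correct predictions per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)

    micro = sum(correct.values()) / len(y_true)
    macro = sum(correct[c] / total[c] for c in total) / len(total)
    return micro, macro
```

Under these definitions, a macro value below the micro value (94.61% versus 95.28% here) would suggest that some less frequent classes are recognized less reliably than the more frequent ones.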
## Supported Commands
Custom TinCan commands:
`astra`, `bali`, `boston`, `capri`, `delhi`, `dublin`, `frisco`, `monaco`, `oslo`, `paris`, `seatown`, `tokyo`
Google Speech Commands labels:
`yes`, `no`, `up`, `down`, `left`, `right`, `on`, `off`, `stop`, `go`, `zero`, `one`, `two`, `three`, `four`, `five`, `six`, `seven`, `eight`, `nine`, `bed`, `bird`, `cat`, `dog`, `happy`, `house`, `marvin`, `sheila`, `tree`, `wow`, `backward`, `forward`, `follow`, `learn`, `visual`
## Inference Notes
The model outputs logits over the 47 labels listed in `labels.json`. Use the output index to look up the predicted command label.
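
The lookup step can be sketched as follows. This is a minimal example, assuming `labels.json` is a JSON array of label strings ordered by output index (the file layout is not documented here) and that the logits for one clip are already available as a flat list of floats. Audio preprocessing and the ONNX session call are deliberately left out; the helper names are illustrative.

```python
import json
import math


def load_labels(path="labels.json"):
    """Load class labels. Assumes a JSON array ordered by output index."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, labels, top_k=3):
    """Return the top_k (label, probability) pairs, most likely first."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:top_k]]
```

For example, `decode(logits, load_labels(), top_k=1)[0][0]` would give the predicted command for one clip, where `logits` is the model's 47-element output for that clip.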
## Training Provenance
| Field | Value |
|---|---|
| Model name | `commandrecognition_en_matchboxnet3x2x64_v2` |
| Export format | ONNX |
| Epochs | 10 |
| Batch size | 32 |
## Limitations
- This is a closed-vocabulary command recognizer, not a general speech-to-text model.
- The model is intended for English short-command recognition.
- Validation metrics may not fully predict performance with every microphone, speaker, accent, room, or noise condition.
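
Because the recognizer is closed-vocabulary, a plain argmax assigns one of the 47 labels to every input, including speech outside the vocabulary. One way an application might handle this is to reject low-confidence predictions. The sketch below is an assumption about downstream usage, not part of the model: `accept_or_reject` is a hypothetical helper, and the default threshold of 0.8 is an arbitrary placeholder that would need tuning on real recordings.

```python
import math


def accept_or_reject(logits, labels, min_confidence=0.8):
    """Return (label, confidence), or (None, confidence) when below threshold.

    confidence is the softmax probability of the top-scoring class.
    min_confidence=0.8 is an illustrative value, not a tuned one.
    """
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    best = max(range(len(logits)), key=logits.__getitem__)
    top = logits[best]
    # Softmax probability of the top class: exp(0) / sum(exp(x - top)).
    confidence = 1.0 / sum(math.exp(x - top) for x in logits)
    if confidence < min_confidence:
        return None, confidence
    return labels[best], confidence
```

Softmax confidence is only a rough proxy for correctness, so a threshold like this reduces but does not eliminate false triggers.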