---
license: apache-2.0
library_name: nemo
tags:
- onnx
- nemo
- speech-commands
- wake-word-spotting
datasets:
- HashNuke/tincan-wakewords-data
metrics:
- accuracy
model-index:
- name: TinCan Speech Commands Model
  results:
  - task:
      type: audio-classification
      name: Speech command recognition
    dataset:
      name: TinCan Speech Commands validation set
      type: tincan-speech-commands-validation
    metrics:
    - type: loss
      name: Validation loss
      value: 0.1493
    - type: accuracy
      name: Validation micro top-1 accuracy
      value: 95.28
    - type: accuracy
      name: Validation macro accuracy
      value: 94.61
---

# TinCan Speech Commands Model

A compact English speech-command recognition model for the TinCan app.

This model recognizes 47 short command classes and is designed for small-footprint command recognition where cloud ASR is unnecessary or undesirable. The exported ONNX artifact is under 400 KB, making it practical for local-first applications, prototypes, and edge deployments. The 47 classes consist of:

- 12 custom TinCan words
- 35 words from the Google Speech Commands dataset v2

## Highlights

- 47-class English command recognizer
- ONNX export for portable inference
- Small model artifact: `model.onnx` is approximately 378 KB
- Based on NVIDIA NeMo's MatchboxNet command-recognition model family

## Base Model

This model uses NVIDIA NeMo's `commandrecognition_en_matchboxnet3x2x64_v2` MatchboxNet command-recognition architecture.

Base model reference: [`commandrecognition_en_matchboxnet3x2x64_v2`](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/commandrecognition_en_matchboxnet3x2x64_v2)

## Metrics

These metrics describe the currently exported `model.onnx` artifact.

| Metric | Value |
|---|---:|
| Validation loss | 0.1493 |
| Validation micro top-1 accuracy | 95.28% |
| Validation macro accuracy | 94.61% |
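
The card reports both micro and macro accuracy. The sketch below shows the usual definitions of the two, assuming "micro top-1" means correct predictions divided by all validation clips and "macro" means the unweighted mean of per-class accuracy (recall). The exact computation behind the numbers above is not documented here, and the toy labels and predictions are made up for illustration.

```python
from collections import defaultdict


def micro_top1_accuracy(y_true, y_pred):
    """Fraction of all clips whose top-1 prediction matches the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy (recall): every class counts equally.

    Assumption: this mirrors what the card calls "macro accuracy".
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    per_class = [hits[c] / totals[c] for c in totals]
    return sum(per_class) / len(per_class)


# Toy example with made-up predictions (not TinCan results).
y_true = ["yes", "yes", "yes", "no", "tokyo"]
y_pred = ["yes", "yes", "yes", "yes", "tokyo"]
print(micro_top1_accuracy(y_true, y_pred))  # 0.8
print(macro_accuracy(y_true, y_pred))       # about 0.667
```

Because micro accuracy effectively weights each class by its number of clips while macro accuracy weights every class equally, a macro figure slightly below the micro figure (94.61% vs 95.28% here) is consistent with some less frequent classes being recognized a little less reliably.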

## Supported Commands

Custom TinCan commands:

`astra`, `bali`, `boston`, `capri`, `delhi`, `dublin`, `frisco`, `monaco`, `oslo`, `paris`, `seatown`, `tokyo`

Google Speech Commands labels:

`yes`, `no`, `up`, `down`, `left`, `right`, `on`, `off`, `stop`, `go`, `zero`, `one`, `two`, `three`, `four`, `five`, `six`, `seven`, `eight`, `nine`, `bed`, `bird`, `cat`, `dog`, `happy`, `house`, `marvin`, `sheila`, `tree`, `wow`, `backward`, `forward`, `follow`, `learn`, `visual`
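
Because the card documents the full vocabulary, a downloaded `labels.json` can be sanity-checked against it before deployment. This is a minimal sketch, assuming `labels.json` has already been loaded into a Python list of strings; `check_labels` is a hypothetical helper, and it checks set membership and duplicates only, not the index order the model actually uses.

```python
# Vocabulary as documented in this card: 12 custom TinCan words + 35 Speech Commands v2 words.
CUSTOM_COMMANDS = [
    "astra", "bali", "boston", "capri", "delhi", "dublin",
    "frisco", "monaco", "oslo", "paris", "seatown", "tokyo",
]
SPEECH_COMMANDS_V2 = [
    "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go",
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "bed", "bird", "cat", "dog", "happy", "house", "marvin", "sheila", "tree", "wow",
    "backward", "forward", "follow", "learn", "visual",
]


def check_labels(labels):
    """Compare a loaded label list with the documented vocabulary (hypothetical helper).

    Returns a list of human-readable problems; an empty list means the labels
    match the documented set. Index order is NOT checked.
    """
    expected = set(CUSTOM_COMMANDS) | set(SPEECH_COMMANDS_V2)
    found = set(labels)
    problems = []
    if len(labels) != len(found):
        problems.append("duplicate labels")
    missing = expected - found
    extra = found - expected
    if missing:
        problems.append(f"missing: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected: {sorted(extra)}")
    return problems


print(len(CUSTOM_COMMANDS) + len(SPEECH_COMMANDS_V2))      # 47
print(check_labels(CUSTOM_COMMANDS + SPEECH_COMMANDS_V2))  # []
```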

## Inference Notes

The model outputs logits over the 47 labels listed in `labels.json`. The index of the largest logit (the argmax) is the position of the predicted command label in `labels.json`.
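
A minimal decoding sketch in plain Python, assuming `labels.json` is a JSON array of label strings in output-index order and that the model's output for one clip has already been obtained as a list of 47 logits (audio preprocessing and the ONNX Runtime call are not shown). The softmax and the optional `min_confidence` cutoff are additions, not part of the model: because the vocabulary is closed, every input maps to some label, and a confidence threshold is one common way to reject out-of-vocabulary speech. `load_labels` and `decode_logits` are hypothetical names, and any threshold value is application-specific.

```python
import json
import math


def load_labels(path="labels.json"):
    # Assumption: labels.json is a JSON array whose i-th entry names output index i.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_logits(logits, labels, min_confidence=None):
    """Return (label, probability) for the highest-scoring class.

    If min_confidence is set (a hypothetical, application-chosen cutoff) and the
    top probability falls below it, return (None, probability) instead.
    """
    if len(logits) != len(labels):
        raise ValueError(f"expected {len(labels)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # argmax
    if min_confidence is not None and probs[best] < min_confidence:
        return None, probs[best]
    return labels[best], probs[best]


# Toy 3-label example; the real model produces 47 logits.
toy_labels = ["yes", "no", "tokyo"]
print(decode_logits([0.1, 0.2, 3.0], toy_labels))        # ('tokyo', ~0.896)
print(decode_logits([0.1, 0.2, 3.0], toy_labels, 0.95))  # (None, ~0.896)
```

The argmax alone is enough to pick a label; the softmax is only needed when a probability-like score is wanted, for example to apply a rejection threshold.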

## Training Provenance

| Field | Value |
|---|---|
| Model name | `commandrecognition_en_matchboxnet3x2x64_v2` |
| Export format | ONNX |
| Epochs | 10 |
| Batch size | 32 |

## Limitations

- This is a closed-vocabulary command recognizer, not a general speech-to-text model.
- The model is intended for English short-command recognition.
- Validation metrics may not fully predict performance with every microphone, speaker, accent, room, or noise condition.