Akash Manohar committed
Commit 0795bcf · Parent(s): 6ef900d
add readme
README.md CHANGED
@@ -1,3 +1,64 @@
----
-license: apache-2.0
----
+---
+license: apache-2.0
+---
+
+# TinCan Speech Commands Model
+
+A compact English speech-command recognition model for the TinCan app.
+
+This model recognizes 47 short command classes and is designed for small-footprint command recognition where cloud ASR is unnecessary or undesirable. The exported ONNX artifact is under 400 KB, making it practical for local-first applications, prototypes, and edge deployments.
+
+* 12 custom words
+* 35 words from the Google Speech Commands dataset v2
+
+## Highlights
+
+- 47-class English command recognizer
+- ONNX export for portable inference
+- Small model artifact: `model.onnx` is approximately 378 KB
+- Based on NVIDIA NeMo's MatchboxNet command-recognition model family
+
+## Base Model
+
+This model uses NVIDIA NeMo's `commandrecognition_en_matchboxnet3x2x64_v2` MatchboxNet command-recognition architecture.
+
+Base model reference: [`commandrecognition_en_matchboxnet3x2x64_v2`](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/commandrecognition_en_matchboxnet3x2x64_v2)
+
+## Metrics
+
+These metrics describe the currently exported `model.onnx` artifact.
+
+| Metric | Value |
+|---|---:|
+| Validation loss | 0.1493 |
+| Validation micro top-1 accuracy | 95.28% |
+| Validation macro accuracy | 94.61% |
+
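The README does not define its two accuracy figures. The sketch below assumes the usual convention: micro accuracy is the fraction of all samples classified correctly, and macro accuracy is the unweighted mean of per-class accuracy. The function name and the toy data are hypothetical and are not taken from the model's evaluation code.

```python
from collections import defaultdict

def micro_and_macro_accuracy(y_true, y_pred):
    """Return (micro, macro) accuracy under the assumed definitions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    # Micro: pool every sample, so frequent classes dominate the score.
    micro = sum(correct.values()) / sum(total.values())
    # Macro: average the per-class scores, so every class counts equally.
    macro = sum(correct[c] / total[c] for c in total) / len(total)
    return micro, macro

# Toy data (hypothetical): a frequent class that is always right,
# and a rare class that is right half of the time.
y_true = ["yes"] * 8 + ["oslo"] * 2
y_pred = ["yes"] * 8 + ["oslo", "yes"]
micro, macro = micro_and_macro_accuracy(y_true, y_pred)
print(f"micro={micro:.2f} macro={macro:.2f}")  # micro=0.90 macro=0.75
```

Under these definitions, a macro value below the micro value (94.61% against 95.28% here) would be consistent with some classes being recognized less reliably than the average sample suggests.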
+## Supported Commands
+
+Custom TinCan commands:
+
+`astra`, `bali`, `boston`, `capri`, `delhi`, `dublin`, `frisco`, `monaco`, `oslo`, `paris`, `seatown`, `tokyo`
+
+Google Speech Commands labels:
+
+`yes`, `no`, `up`, `down`, `left`, `right`, `on`, `off`, `stop`, `go`, `zero`, `one`, `two`, `three`, `four`, `five`, `six`, `seven`, `eight`, `nine`, `bed`, `bird`, `cat`, `dog`, `happy`, `house`, `marvin`, `sheila`, `tree`, `wow`, `backward`, `forward`, `follow`, `learn`, `visual`
+
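A consumer of the model may want to confirm that a label file matches the documented vocabulary before wiring it to the model output. The following is a minimal sketch of such a check. `check_labels` is a hypothetical helper, and because the README does not state the order of entries in `labels.json`, the comparison is done on sets only.

```python
CUSTOM = ["astra", "bali", "boston", "capri", "delhi", "dublin",
          "frisco", "monaco", "oslo", "paris", "seatown", "tokyo"]
GOOGLE = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop",
          "go", "zero", "one", "two", "three", "four", "five", "six",
          "seven", "eight", "nine", "bed", "bird", "cat", "dog", "happy",
          "house", "marvin", "sheila", "tree", "wow", "backward", "forward",
          "follow", "learn", "visual"]

def check_labels(labels):
    """Compare a label list with the documented vocabulary; return problems found."""
    expected = set(CUSTOM) | set(GOOGLE)
    problems = []
    if len(labels) != len(set(labels)):
        problems.append("duplicate labels")
    missing = expected - set(labels)
    extra = set(labels) - expected
    if missing:
        problems.append(f"missing: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected: {sorted(extra)}")
    return problems

# The documented lists themselves: 12 custom + 35 Google labels.
documented = CUSTOM + GOOGLE
print(len(documented), check_labels(documented))  # 47 []
```

In practice, `labels` would be loaded from the repository's `labels.json` rather than built from the constants above.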
+## Inference Notes
+
+The model outputs logits over the 47 labels listed in `labels.json`. Use the output index to look up the predicted command label.
+
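The lookup described in the Inference Notes can be sketched in plain Python. This assumes that `labels.json` is a JSON array of strings ordered by output index, which the README implies but does not state. The four-label list and the logit values are hypothetical stand-ins for the real 47-entry file and a real model output.

```python
import json
import math

def decode_prediction(logits, labels):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    # Argmax over the logits gives the index to look up in the label list.
    best = max(range(len(logits)), key=logits.__getitem__)
    return labels[best], exps[best] / sum(exps)

# Hypothetical stand-in for the contents of labels.json (assumed format).
labels = json.loads('["yes", "no", "stop", "go"]')
logits = [0.2, -1.0, 3.1, 0.5]

label, prob = decode_prediction(logits, labels)
print(label, round(prob, 3))
```

The softmax step is optional for picking the label, since the argmax of the logits is the same as the argmax of the probabilities. It is included here because a probability is convenient for thresholding low-confidence predictions.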
+## Training Provenance
+
+| Field | Value |
+|---|---|
+| Model name | `commandrecognition_en_matchboxnet3x2x64_v2` |
+| Export format | ONNX |
+| Epochs | 10 |
+| Batch size | 32 |
+
+## Limitations
+
+- This is a closed-vocabulary command recognizer, not a general speech-to-text model.
+- The model is intended for English short-command recognition.
+- Validation metrics may not fully predict performance with every microphone, speaker, accent, room, or noise condition.