HashNuke/tincan-wakewords

speech-commands

wake-word-spotting


Akash Manohar committed about 1 month ago

Commit 0795bcf · 1 Parent(s): 6ef900d

add readme

Files changed (1)

README.md +64 -3


@@ -1,3 +1,64 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+---
+# TinCan Speech Commands Model
+A compact English speech-command recognition model for the TinCan app.
+This model recognizes 47 short command classes and is designed for small-footprint command recognition where cloud ASR is unnecessary or undesirable. The exported ONNX artifact is under 400 KB, making it practical for local-first applications, prototypes, and edge deployments.
+The 47 classes comprise:
+* 12 custom words
+* 35 words from the Google Speech Commands dataset v2
+## Highlights
+- 47-class English command recognizer
+- ONNX export for portable inference
+- Small model artifact: `model.onnx` is approximately 378 KB
+- Based on NVIDIA NeMo's MatchboxNet command-recognition model family
+## Base Model
+This model uses NVIDIA NeMo's `commandrecognition_en_matchboxnet3x2x64_v2` MatchboxNet command-recognition architecture.
+Base model reference: [`commandrecognition_en_matchboxnet3x2x64_v2`](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/commandrecognition_en_matchboxnet3x2x64_v2)
+## Metrics
+These metrics describe the currently exported `model.onnx` artifact.
+| Metric | Value |
+|---|---:|
+| Validation loss | 0.1493 |
+| Validation micro top-1 accuracy | 95.28% |
+| Validation macro accuracy | 94.61% |
+## Supported Commands
+Custom TinCan commands:
+`astra`, `bali`, `boston`, `capri`, `delhi`, `dublin`, `frisco`, `monaco`, `oslo`, `paris`, `seatown`, `tokyo`
+Google Speech Commands labels:
+`yes`, `no`, `up`, `down`, `left`, `right`, `on`, `off`, `stop`, `go`, `zero`, `one`, `two`, `three`, `four`, `five`, `six`, `seven`, `eight`, `nine`, `bed`, `bird`, `cat`, `dog`, `happy`, `house`, `marvin`, `sheila`, `tree`, `wow`, `backward`, `forward`, `follow`, `learn`, `visual`
+## Inference Notes
+The model outputs logits over the 47 labels listed in `labels.json`. Take the index of the highest logit and use it to look up the predicted command label.
+## Training Provenance
+| Field | Value |
+|---|---|
+| Model name | `commandrecognition_en_matchboxnet3x2x64_v2` |
+| Export format | ONNX |
+| Epochs | 10 |
+| Batch size | 32 |
+## Limitations
+- This is a closed-vocabulary command recognizer, not a general speech-to-text model.
+- The model is intended for English short-command recognition.
+- Validation metrics may not fully predict performance with every microphone, speaker, accent, room, or noise condition.
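
The lookup described under Inference Notes can be sketched as follows. This is a minimal sketch and not code from the repository: it assumes `labels.json` is a JSON array of label strings ordered by output index (the README does not specify the file's layout), and the helper names `load_labels`, `softmax`, and `predict_label` are hypothetical. Running the ONNX model itself is left out; `logits` stands for the model's raw output vector for one audio clip.

```python
import json
import math


def load_labels(path):
    """Load labels. ASSUMPTION: the file is a JSON array ordered by output index."""
    with open(path, encoding="utf-8") as f:
        labels = json.load(f)
    if not isinstance(labels, list):
        raise ValueError("expected a JSON array of label strings")
    return labels


def softmax(logits):
    """Convert raw logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits, labels):
    """Return (label, probability) for the highest-scoring output index."""
    if len(logits) != len(labels):
        raise ValueError("logit count does not match label count")
    probs = softmax(logits)
    # argmax over the probabilities; the same index is the argmax of the logits
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

With the real model, `labels` would come from `load_labels("labels.json")` and `logits` would hold 47 values. The softmax step is optional for picking the label, since the largest logit already identifies the class, but the resulting probability is convenient if an application wants to reject low-confidence predictions.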