HashNuke/tincan-wakewords

tincan-wakewords / README.md

Akash Manohar

update readme frontmatter

a938cff 20 days ago


	---
	license: apache-2.0
	library_name: nemo
	tags:
	- onnx
	- nemo
	- speech-commands
	- wake-word-spotting
	datasets:
	- HashNuke/tincan-wakewords-data
	metrics:
	- accuracy
	model-index:
	- name: TinCan Speech Commands Model
	  results:
	  - task:
	      type: audio-classification
	      name: Speech command recognition
	    dataset:
	      name: TinCan Speech Commands validation set
	      type: tincan-speech-commands-validation
	    metrics:
	    - type: loss
	      name: Validation loss
	      value: 0.1493
	    - type: accuracy
	      name: Validation micro top-1 accuracy
	      value: 95.28
	    - type: accuracy
	      name: Validation macro accuracy
	      value: 94.61
	---

	# TinCan Speech Commands Model

	A compact English speech-command recognition model for the TinCan app.

	This model recognizes 47 short command classes and is designed for small-footprint command recognition where cloud ASR is unnecessary or undesirable. The exported ONNX artifact is under 400 KB, making it practical for local-first applications, prototypes, and edge deployments.

	The 47 classes consist of:

	- 12 custom words
	- 35 words from the Google Speech Commands dataset v2

	## Highlights

	- 47-class English command recognizer
	- ONNX export for portable inference
	- Small model artifact: `model.onnx` is approximately 378 KB
	- Based on NVIDIA NeMo's MatchboxNet command-recognition model family

	## Base Model

	This model uses NVIDIA NeMo's `commandrecognition_en_matchboxnet3x2x64_v2` MatchboxNet command-recognition architecture.

	Base model reference: [`commandrecognition_en_matchboxnet3x2x64_v2`](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/commandrecognition_en_matchboxnet3x2x64_v2)

	## Metrics

	These metrics describe the currently exported `model.onnx` artifact.

	| Metric | Value |
	|---|---:|
	| Validation loss | 0.1493 |
	| Validation micro top-1 accuracy | 95.28% |
	| Validation macro accuracy | 94.61% |
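
The card does not define the two accuracy figures. The sketch below assumes the common reading: micro accuracy is the fraction of all validation clips classified correctly, and macro accuracy is the unweighted mean of per-class accuracy. The function name `micro_macro_accuracy` and the toy labels are illustrative only, and this is not the evaluation code used to produce the numbers above.

```python
from collections import defaultdict


def micro_macro_accuracy(y_true, y_pred):
    """Return (micro, macro) accuracy for two equal-length label sequences.

    Assumed definitions (not confirmed by the model card):
      micro = correct predictions / all predictions
      macro = mean over classes of (correct in class / total in class)
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == guess)

    micro = sum(correct.values()) / sum(total.values())
    macro = sum(correct[c] / total[c] for c in total) / len(total)
    return micro, macro


# Toy data: a frequent class that is always right and a rare class that is missed.
truth = ["yes", "yes", "yes", "yes", "oslo"]
guess = ["yes", "yes", "yes", "yes", "paris"]
micro, macro = micro_macro_accuracy(truth, guess)
```

With imbalanced classes the two numbers diverge, which is why a macro figure slightly below the micro figure can indicate that some less frequent classes are recognized less reliably.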

	## Supported Commands

	Custom TinCan commands:

	`astra`, `bali`, `boston`, `capri`, `delhi`, `dublin`, `frisco`, `monaco`, `oslo`, `paris`, `seatown`, `tokyo`

	Google Speech Commands labels:

	`yes`, `no`, `up`, `down`, `left`, `right`, `on`, `off`, `stop`, `go`, `zero`, `one`, `two`, `three`, `four`, `five`, `six`, `seven`, `eight`, `nine`, `bed`, `bird`, `cat`, `dog`, `happy`, `house`, `marvin`, `sheila`, `tree`, `wow`, `backward`, `forward`, `follow`, `learn`, `visual`

	## Inference Notes

	The model outputs logits over the 47 labels listed in `labels.json`. Use the output index to look up the predicted command label.
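
The lookup described above can be sketched as follows. This is a minimal example that assumes `labels.json` is a JSON array of strings ordered by output index, which the card does not state explicitly. The helper names `load_labels` and `decode_logits` are hypothetical, and obtaining the logits from an ONNX runtime is left outside the sketch.

```python
import json
import math


def load_labels(path="labels.json"):
    """Load the label list.

    Assumption: the file is a JSON array of strings in output-index order.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def decode_logits(logits, labels):
    """Map one logits vector to (predicted_label, softmax_probability)."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")

    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    probs = [e / norm for e in exps]

    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Toy example with three labels instead of the full 47.
label, prob = decode_logits([0.1, 2.0, -1.0], ["yes", "no", "stop"])
```

Converting logits to probabilities is optional for picking the top label, since the argmax is the same either way, but the probability is convenient if the application wants a confidence value.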

	## Training Provenance

	| Field | Value |
	|---|---|
	| Model name | `commandrecognition_en_matchboxnet3x2x64_v2` |
	| Export format | ONNX |
	| Epochs | 10 |
	| Batch size | 32 |

	## Limitations

	- This is a closed-vocabulary command recognizer, not a general speech-to-text model.
	- The model is intended for English short-command recognition.
	- Validation metrics may not fully predict performance with every microphone, speaker, accent, room, or noise condition.