Akash Manohar

license: apache-2.0
library_name: nemo
tags:
  - onnx
  - nemo
  - speech-commands
  - wake-word-spotting
datasets:
  - HashNuke/tincan-wakewords-data
metrics:
  - accuracy
model-index:
  - name: TinCan Speech Commands Model
    results:
      - task:
          type: audio-classification
          name: Speech command recognition
        dataset:
          name: TinCan Speech Commands validation set
          type: tincan-speech-commands-validation
        metrics:
          - type: loss
            name: Validation loss
            value: 0.1493
          - type: accuracy
            name: Validation micro top-1 accuracy
            value: 95.28
          - type: accuracy
            name: Validation macro accuracy
            value: 94.61

TinCan Speech Commands Model

A compact English speech-command recognition model for the TinCan app.

This model recognizes 47 short command classes and is designed for small-footprint command recognition where cloud ASR is unnecessary or undesirable. The exported ONNX artifact is under 400 KB, making it practical for local-first applications, prototypes, and edge deployments.

The 47 classes comprise 12 custom words and 35 words from the Google Speech Commands v2 dataset.

Highlights

47-class English command recognizer
ONNX export for portable inference
Small model artifact: model.onnx is approximately 378 KB
Based on NVIDIA NeMo's MatchboxNet command-recognition model family

Base Model

This model is based on NVIDIA NeMo's commandrecognition_en_matchboxnet3x2x64_v2, a MatchboxNet command-recognition model.

Base model reference: commandrecognition_en_matchboxnet3x2x64_v2

Metrics

These metrics describe the currently exported model.onnx artifact.

| Metric | Value |
| --- | --- |
| Validation loss | 0.1493 |
| Validation micro top-1 accuracy | 95.28% |
| Validation macro accuracy | 94.61% |
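
The model card does not define how the two accuracy figures are computed. As a rough illustration, assuming the usual definitions (micro accuracy is the fraction of all samples classified correctly; macro accuracy is the unweighted mean of per-class accuracy), the difference can be sketched as follows. The function name and the toy data are hypothetical and are not taken from the training code.

```python
from collections import defaultdict


def micro_and_macro_accuracy(y_true, y_pred):
    """Return (micro, macro) top-1 accuracy as fractions in [0, 1].

    Assumed definitions (not confirmed by the model card):
      micro = correct predictions / all samples
      macro = mean over classes of (correct in class / samples in class)
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    total = defaultdict(int)    # samples seen per true class
    correct = defaultdict(int)  # correct predictions per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    micro = sum(correct.values()) / len(y_true)
    macro = sum(correct[c] / total[c] for c in total) / len(total)
    return micro, macro


# Toy example: a frequent class that is always right, a rare class that is always wrong.
y_true = ["yes", "yes", "yes", "yes", "oslo"]
y_pred = ["yes", "yes", "yes", "yes", "paris"]
micro, macro = micro_and_macro_accuracy(y_true, y_pred)
print(f"micro={micro:.2f} macro={macro:.2f}")  # micro=0.80 macro=0.50
```

Under these definitions, a macro figure below the micro figure, as reported here, would suggest that some less frequent classes are recognized less reliably than the common ones.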

Supported Commands

Custom TinCan commands:

astra, bali, boston, capri, delhi, dublin, frisco, monaco, oslo, paris, seatown, tokyo

Google Speech Commands labels:

yes, no, up, down, left, right, on, off, stop, go, zero, one, two, three, four, five, six, seven, eight, nine, bed, bird, cat, dog, happy, house, marvin, sheila, tree, wow, backward, forward, follow, learn, visual

Inference Notes

The model outputs logits over the 47 labels listed in labels.json. Take the index of the largest logit (argmax) and look it up in labels.json to get the predicted command label.
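
A minimal sketch of that lookup step, in plain Python. It assumes that labels.json is a JSON array of label strings ordered by output index; the model card does not state the file's format, so adjust `load_labels` if it is structured differently. The helper names `load_labels` and `decode_logits` are hypothetical, and running the ONNX model itself (audio preprocessing and the inference call) is outside the scope of this sketch.

```python
import json
import math


def load_labels(path="labels.json"):
    """Load labels, ASSUMING the file is a JSON array ordered by output index."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def decode_logits(logits, labels):
    """Map one logit vector to (predicted_label, softmax_probability)."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")

    # Argmax: index of the largest logit.
    best = max(range(len(logits)), key=logits.__getitem__)

    # Numerically stable softmax, used only to report a confidence score.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    prob = exps[best] / sum(exps)
    return labels[best], prob


# Toy demonstration with three labels instead of the full 47.
demo_labels = ["yes", "no", "oslo"]
demo_logits = [0.1, 0.2, 3.0]
label, prob = decode_logits(demo_logits, demo_labels)
print(label)  # oslo
```

With the real model, `labels` would come from `load_labels()` and `logits` from the model's output for a single audio clip, converted to a flat list of 47 floats.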

Training Provenance

| Field | Value |
| --- | --- |
| Model name | `commandrecognition_en_matchboxnet3x2x64_v2` |
| Export format | ONNX |
| Epochs | 10 |
| Batch size | 32 |

Limitations

This is a closed-vocabulary command recognizer, not a general speech-to-text model.
The model is intended for English short-command recognition.
Validation metrics may not fully predict performance with every microphone, speaker, accent, room, or noise condition.