tincan-wakewords / README.md
Akash Manohar
metadata
license: apache-2.0
library_name: nemo
tags:
  - onnx
  - nemo
  - speech-commands
  - wake-word-spotting
datasets:
  - HashNuke/tincan-wakewords-data
metrics:
  - accuracy
model-index:
  - name: TinCan Speech Commands Model
    results:
      - task:
          type: audio-classification
          name: Speech command recognition
        dataset:
          name: TinCan Speech Commands validation set
          type: tincan-speech-commands-validation
        metrics:
          - type: loss
            name: Validation loss
            value: 0.1493
          - type: accuracy
            name: Validation micro top-1 accuracy
            value: 95.28
          - type: accuracy
            name: Validation macro accuracy
            value: 94.61

TinCan Speech Commands Model

A compact English speech-command recognition model for the TinCan app.

This model recognizes 47 short command classes:

  • 12 custom TinCan words
  • 35 words from the Google Speech Commands dataset v2

It is designed for small-footprint command recognition where cloud ASR is unnecessary or undesirable. The exported ONNX artifact is under 400 KB, which makes it practical for local-first applications, prototypes, and edge deployments.

Highlights

  • 47-class English command recognizer
  • ONNX export for portable inference
  • Small model artifact: model.onnx is approximately 378 KB
  • Based on NVIDIA NeMo's MatchboxNet command-recognition model family

Base Model

This model uses the MatchboxNet command-recognition architecture from NVIDIA NeMo's commandrecognition_en_matchboxnet3x2x64_v2 model.

Base model reference: commandrecognition_en_matchboxnet3x2x64_v2

Metrics

These metrics describe the currently exported model.onnx artifact.

Metric                             Value
Validation loss                    0.1493
Validation micro top-1 accuracy    95.28%
Validation macro accuracy          94.61%
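Micro accuracy is the fraction of all validation clips classified correctly, so frequent commands weigh more; macro accuracy averages the per-class accuracy, so every command counts equally. The exact definition used to produce the numbers above is not stated here, so the sketch below assumes the common one (macro = unweighted mean of per-class recall). The function name `micro_macro_accuracy` is illustrative only.

```python
from collections import defaultdict


def micro_macro_accuracy(y_true, y_pred):
    """Return (micro, macro) accuracy in percent.

    micro: share of all clips whose predicted label matches the true label.
    macro: unweighted mean of per-class accuracy (per-class recall).
    Assumption: this matches how the README's macro metric was computed.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    correct = sum(t == p for t, p in zip(y_true, y_pred))
    micro = 100.0 * correct / len(y_true)

    # Count clips and correct predictions per true class.
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)

    # Each class contributes equally, regardless of how many clips it has.
    macro = 100.0 * sum(hits[c] / totals[c] for c in totals) / len(totals)
    return micro, macro
```

For example, if three "yes" clips are all correct but the single "no" clip is predicted as "yes", micro accuracy is 75% while macro accuracy is 50%, because the "no" class scores 0% and is weighted the same as "yes".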

Supported Commands

Custom TinCan commands:

astra, bali, boston, capri, delhi, dublin, frisco, monaco, oslo, paris, seatown, tokyo

Google Speech Commands labels:

yes, no, up, down, left, right, on, off, stop, go, zero, one, two, three, four, five, six, seven, eight, nine, bed, bird, cat, dog, happy, house, marvin, sheila, tree, wow, backward, forward, follow, learn, visual
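As a sanity check before deployment, the documented vocabulary can be compared with the labels shipped alongside the model. This sketch assumes labels.json is a JSON array of label strings; the helper names `check_labels` and `load_labels` are hypothetical. It compares sets, because the order listed in this README is not necessarily the model's output order.

```python
import json

# Vocabulary as documented in this README (not necessarily output-index order).
TINCAN_COMMANDS = [
    "astra", "bali", "boston", "capri", "delhi", "dublin",
    "frisco", "monaco", "oslo", "paris", "seatown", "tokyo",
]
GOOGLE_COMMANDS = [
    "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go",
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "bed", "bird", "cat", "dog", "happy", "house", "marvin", "sheila", "tree", "wow",
    "backward", "forward", "follow", "learn", "visual",
]
DOCUMENTED_LABELS = set(TINCAN_COMMANDS) | set(GOOGLE_COMMANDS)


def check_labels(labels):
    """Compare a label list with the documented vocabulary.

    Returns (missing, unexpected) as sorted lists; both empty means a match.
    """
    found = set(labels)
    missing = sorted(DOCUMENTED_LABELS - found)
    unexpected = sorted(found - DOCUMENTED_LABELS)
    return missing, unexpected


def load_labels(path="labels.json"):
    # Assumption: the file holds a JSON array of strings in output-index order.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Usage: `missing, unexpected = check_labels(load_labels())`; both lists should be empty for the 47-label model.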

Inference Notes

The model outputs logits over the 47 labels listed in labels.json. Take the index of the highest logit (the argmax) and use it to look up the predicted command label in labels.json.
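The lookup can be sketched in plain Python. This assumes you already have the 47 logits for a single clip as a flat sequence (for example, one row of the first output returned by onnxruntime's `InferenceSession.run`); audio preprocessing and the ONNX input names are not documented here and are not shown. The `min_prob` threshold is a hypothetical add-on, not part of the model: because the recognizer is closed-vocabulary, it always picks one of its 47 labels, and an application may want to reject low-confidence results.

```python
import math


def decode_logits(logits, labels, min_prob=None):
    """Map one clip's logits to (label, probability).

    logits: one float per entry in labels, in the same order.
    labels: label strings, e.g. parsed from labels.json.
    min_prob: optional (hypothetical) confidence threshold; if the winning
        probability is below it, the label is returned as None.
    """
    if len(logits) != len(labels):
        raise ValueError(f"expected {len(labels)} logits, got {len(logits)}")

    # argmax: the index of the highest logit is the predicted class.
    best = max(range(len(logits)), key=lambda i: logits[i])

    # Numerically stable softmax of the winning class: subtracting the max
    # makes its own term exp(0) = 1, so its probability is 1 / denom.
    m = logits[best]
    denom = sum(math.exp(x - m) for x in logits)
    prob = 1.0 / denom

    if min_prob is not None and prob < min_prob:
        return None, prob
    return labels[best], prob
```

Any threshold value is application-specific and should be tuned on your own recordings; nothing in this README suggests a particular number.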

Training Provenance

Field            Value
Model name       commandrecognition_en_matchboxnet3x2x64_v2
Export format    ONNX
Epochs           10
Batch size       32

Limitations

  • This is a closed-vocabulary command recognizer, not a general speech-to-text model.
  • The model is intended for English short-command recognition.
  • Validation metrics may not fully predict performance with every microphone, speaker, accent, room, or noise condition.