Akash Manohar committed on
Commit
0795bcf
·
1 Parent(s): 6ef900d

add readme

Files changed (1)
  1. README.md +64 -3
README.md CHANGED
@@ -1,3 +1,64 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+
5
+ # TinCan Speech Commands Model
6
+
7
+ A compact English speech-command recognition model for the TinCan app.
8
+
9
+ This model recognizes 47 short command classes and is designed for small-footprint command recognition where cloud ASR is unnecessary or undesirable. The exported ONNX artifact is under 400 KB, making it practical for local-first applications, prototypes, and edge deployments.
10
+
11
+ - 12 custom TinCan words
12
+ - 35 words from the Google Speech Commands v2 dataset
13
+
14
+ ## Highlights
15
+
16
+ - 47-class English command recognizer
17
+ - ONNX export for portable inference
18
+ - Small model artifact: `model.onnx` is approximately 378 KB
19
+ - Based on NVIDIA NeMo's MatchboxNet command-recognition model family
20
+
21
+ ## Base Model
22
+
23
+ This model uses NVIDIA NeMo's `commandrecognition_en_matchboxnet3x2x64_v2` MatchboxNet command-recognition architecture.
24
+
25
+ Base model reference: [`commandrecognition_en_matchboxnet3x2x64_v2`](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/commandrecognition_en_matchboxnet3x2x64_v2)
26
+
27
+ ## Metrics
28
+
29
+ These metrics describe the currently exported `model.onnx` artifact.
30
+
31
+ | Metric | Value |
32
+ |---|---:|
33
+ | Validation loss | 0.1493 |
34
+ | Validation micro top-1 accuracy | 95.28% |
35
+ | Validation macro accuracy | 94.61% |
36
+
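The gap between the micro and macro figures can be illustrated with a small sketch. It assumes that "micro" accuracy is the fraction of all samples classified correctly and that "macro" accuracy is the unweighted mean of per-class accuracies; the README does not spell out these definitions, and the function name and toy data below are hypothetical.

```python
from collections import defaultdict


def micro_macro_accuracy(y_true, y_pred):
    """Return (micro, macro) accuracy for two parallel label sequences.

    Assumed definitions (not confirmed by the model card):
      micro = correct samples / all samples
      macro = unweighted mean of per-class accuracy
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    correct = defaultdict(int)  # correct predictions per true class
    total = defaultdict(int)    # sample count per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)

    micro = sum(correct.values()) / len(y_true)
    macro = sum(correct[c] / total[c] for c in total) / len(total)
    return micro, macro


# Toy, made-up data: a frequent class predicted well, a rare class predicted poorly.
y_true = ["yes"] * 8 + ["oslo"] * 2
y_pred = ["yes"] * 8 + ["oslo", "paris"]
micro, macro = micro_macro_accuracy(y_true, y_pred)
print(f"micro={micro:.2f} macro={macro:.2f}")
```

Under these assumptions, a macro value below the micro value suggests that some less frequent classes are recognized less reliably than the common ones.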
37
+ ## Supported Commands
38
+
39
+ Custom TinCan commands:
40
+
41
+ `astra`, `bali`, `boston`, `capri`, `delhi`, `dublin`, `frisco`, `monaco`, `oslo`, `paris`, `seatown`, `tokyo`
42
+
43
+ Google Speech Commands labels:
44
+
45
+ `yes`, `no`, `up`, `down`, `left`, `right`, `on`, `off`, `stop`, `go`, `zero`, `one`, `two`, `three`, `four`, `five`, `six`, `seven`, `eight`, `nine`, `bed`, `bird`, `cat`, `dog`, `happy`, `house`, `marvin`, `sheila`, `tree`, `wow`, `backward`, `forward`, `follow`, `learn`, `visual`
46
+
47
+ ## Inference Notes
48
+
49
+ The model outputs logits over the 47 labels listed in `labels.json`. Use the output index to look up the predicted command label.
50
+
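The index lookup described above might look like the following minimal sketch. It assumes `labels.json` is a JSON array of label strings ordered by output index, which the README does not confirm; the three-label list and the logits are stand-ins for the real 47-entry file and real model output, and `decode` is a hypothetical helper name.

```python
import json
import math


def softmax(logits):
    """Convert raw logits to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def decode(logits, labels):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Assumption: labels.json holds a plain JSON array in output-index order.
# A three-entry stand-in is parsed inline here instead of reading the real file.
labels = json.loads('["yes", "no", "oslo"]')

# Made-up logits standing in for one row of the model's output.
logits = [0.2, -1.0, 3.1]

label, confidence = decode(logits, labels)
print(label, round(confidence, 3))
```

The softmax step is optional when only the predicted label is needed, since the argmax of the logits is the same; it is included here so a confidence threshold can be applied to reject low-scoring inputs.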
51
+ ## Training Provenance
52
+
53
+ | Field | Value |
54
+ |---|---|
55
+ | Model name | `commandrecognition_en_matchboxnet3x2x64_v2` |
56
+ | Export format | ONNX |
57
+ | Epochs | 10 |
58
+ | Batch size | 32 |
59
+
60
+ ## Limitations
61
+
62
+ - This is a closed-vocabulary command recognizer, not a general speech-to-text model.
63
+ - The model is intended for English short-command recognition.
64
+ - Validation metrics may not fully predict performance with every microphone, speaker, accent, room, or noise condition.