Commit edf3014 (verified) by PaulPlayStudio · 1 parent: 7869a26

Update README.md

  1. README.md +57 -3
README.md CHANGED
@@ -1,3 +1,57 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ tags:
+ - audio
+ - classification
+ - alarm
+ - siren
+ - ambulance
+ - police
+ - security
+ ---
+
+ # audio-alert-detector
+
+ Tiny CNN that classifies a 10-second audio clip as alert (siren, smoke/fire alarm, car alarm, house/burglar alarm) vs. not-alert.
+ Designed to run on a Raspberry Pi Zero 2 under ONNX Runtime with no accelerator.
+
+ ## Model
+
+ - Depthwise-separable CNN (~35k params, ~17M MACs)
+ - Input: raw mono 16 kHz PCM, 10 s = 160,000 samples
+ - Embedded log-mel frontend (no host-side preprocessing needed beyond resampling to 16 kHz)
+ - Two heads: binary (alert/not-alert) + auxiliary subclass (siren/alarm)
+ - Δ + ΔΔ time-derivative input channels for onset/sweep dynamics
+
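The Δ and ΔΔ channels listed above can be illustrated with a small NumPy sketch. This is an assumption-laden illustration, not the model's embedded frontend: it uses `np.gradient` (central differences) as the derivative estimate, and the function name `add_delta_channels` and the 64 × 1000 shape are hypothetical.

```python
import numpy as np

def add_delta_channels(log_mel: np.ndarray) -> np.ndarray:
    """Stack a log-mel spectrogram with its first and second time differences.

    log_mel: array of shape (n_mels, n_frames).
    Returns an array of shape (3, n_mels, n_frames): [static, delta, delta-delta].
    """
    # np.gradient uses central differences in the interior and one-sided
    # differences at the edges, so the output keeps the input shape.
    delta = np.gradient(log_mel, axis=1)
    delta2 = np.gradient(delta, axis=1)
    return np.stack([log_mel, delta, delta2], axis=0)

# Hypothetical shapes: 64 mel bins and 1000 frames are illustrative only.
features = add_delta_channels(np.random.randn(64, 1000).astype(np.float32))
print(features.shape)  # (3, 64, 1000)
```

A steady tone gives Δ ≈ 0, while a sweeping siren or a sharp alarm onset produces large Δ and ΔΔ values, which is the dynamic the extra channels are meant to expose.
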
+ ## Training data
+
+ - ~11k positives + ~60k negatives, 10 s each, 16 kHz mono
+ - AudioSet via the [confit/audioset-full](https://huggingface.co/datasets/confit/audioset-full) HF mirror
+ - Targeted hard-negative MID (AudioSet label ID) lists (bells, whistles, woodwinds, mechanical, instruments, animals, music) for known false-positive categories
+ - Curated supplemental positives: ~199 country-specific EAS alarms + ~107 nuclear/civil-defense sirens
+
+ ## Training
+
+ - 40 epochs, AdamW (lr 3e-4, weight decay 1e-4), cosine LR schedule
+ - Batch size 64, fixed 40% positive fraction per batch (sampled uniformly within each pool)
+ - Loss: BCE on the binary head + 0.3 × subclass CE (masked to positives)
+
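The batch composition and loss described above could be sketched roughly as follows, in plain Python. The function names, the sampling with replacement, and the per-example formulation are assumptions for illustration; the README does not say how the masked term is averaged over the batch or which framework was used.

```python
import math
import random

def sample_batch(pos_ids, neg_ids, batch_size=64, pos_fraction=0.4, rng=random):
    """Draw a batch with a fixed positive count, uniform within each pool."""
    n_pos = round(batch_size * pos_fraction)  # 26 of 64 with the defaults
    n_neg = batch_size - n_pos
    # Sampling with replacement is an assumption made for this sketch.
    batch = rng.choices(pos_ids, k=n_pos) + rng.choices(neg_ids, k=n_neg)
    rng.shuffle(batch)
    return batch

def combined_loss(binary_logit, subclass_logits, is_alert, subclass, aux_weight=0.3):
    """Per-example loss: BCE on the binary head plus aux_weight * CE on the
    subclass head, with the CE term applied only to positive examples."""
    # Numerically stable BCE-with-logits: max(z, 0) - z*y + log(1 + exp(-|z|))
    z, y = binary_logit, float(is_alert)
    bce = max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    if not is_alert:
        return bce  # subclass term is masked out for negatives
    # Softmax cross-entropy computed via log-sum-exp.
    m = max(subclass_logits)
    lse = m + math.log(sum(math.exp(v - m) for v in subclass_logits))
    return bce + aux_weight * (lse - subclass_logits[subclass])
```

Masking the subclass term means negatives never push the siren/alarm head in either direction; only clips that are actually alerts contribute to it.
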
+ ## Augmentation (mel-space)
+
+ - Random time-stretch (0.9–1.1×)
+ - Random gain (−45 to +15 dB)
+ - Frequency shift (±4 mel bins ≈ ±2 semitones)
+ - Companding (γ 0.75–1.25, p = 0.3)
+ - SpecAugment time + frequency masks
+ - Curated ambience overlay (rain, cafe, road traffic, mic noise floor, etc.) at 25% RMS, applied to both classes
+
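Two of the augmentations above, random gain and frequency shift, are simple enough to sketch in NumPy. The sketch assumes the log-mel spectrogram is stored in dB with shape `(n_mels, n_frames)`, so a gain becomes an additive offset; filling vacated bins with the spectrogram minimum is also an assumption, and both function names are hypothetical.

```python
import numpy as np

def random_gain_db(log_mel_db, rng, low=-45.0, high=15.0):
    """Apply one random gain to the whole clip (additive in the dB domain)."""
    return log_mel_db + rng.uniform(low, high)

def shift_mel_bins(log_mel_db, shift):
    """Shift along the mel axis (axis 0); positive shift moves energy up in frequency."""
    # Vacated bins are filled with the clip minimum (an assumed "silence" floor).
    out = np.full_like(log_mel_db, log_mel_db.min())
    n = log_mel_db.shape[0]
    if shift > 0:
        out[shift:] = log_mel_db[: n - shift]
    elif shift < 0:
        out[: n + shift] = log_mel_db[-shift:]
    else:
        out[:] = log_mel_db
    return out

rng = np.random.default_rng(0)
spec = rng.normal(size=(64, 1000))            # illustrative shape only
shift = int(rng.integers(-4, 5))              # uniform over -4..4 mel bins
augmented = shift_mel_bins(random_gain_db(spec, rng), shift)
```
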
+ ## Deployment
+
+ Single-file fp32 ONNX (348 KB). Input: `float32[batch, 160000]` raw 16 kHz mono PCM. Outputs: `binary_logit` + `subclass_logits[2]`. Apply a sigmoid to `binary_logit` to get the alert probability.
+
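A minimal inference sketch with the `onnxruntime` Python package might look like the following. The helper names and the model path argument are hypothetical, the input tensor name is looked up at runtime because the README does not state it, and the expected sample scaling (for example int16 vs. [-1, 1] floats) is not specified and should match whatever was used in training.

```python
import math

def sigmoid(logit: float) -> float:
    """Map a logit to a probability in a numerically stable way."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)

def load_session(model_path: str):
    """Create a CPU-only session once and reuse it for every window."""
    import onnxruntime as ort  # imported lazily; requires the onnxruntime package
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

def alert_probability(session, pcm) -> float:
    """Run one 10 s window (160000 samples, 16 kHz mono) and return P(alert)."""
    import numpy as np
    input_name = session.get_inputs()[0].name  # name not given in the README
    batch = np.asarray(pcm, dtype=np.float32).reshape(1, 160000)
    (binary_logit,) = session.run(["binary_logit"], {input_name: batch})
    return sigmoid(float(np.ravel(binary_logit)[0]))
```

Typical use would be `session = load_session("model.onnx")` followed by `alert_probability(session, window)` for each window, where `"model.onnx"` stands in for the actual file name.
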
+ Recommended: 10 s ring buffer, infer every 2 s, threshold 0.5, and require 2 consecutive over-threshold windows to fire (this eliminates almost all single-window false positives).
+
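The recommended ring-buffer and debounce logic can be sketched in plain Python as below. The class name `AlertDetector` and the `score_fn` callable (any function mapping a 10 s window of samples to an alert probability) are hypothetical; only the window length, hop, threshold, and consecutive-window count come from the recommendation above.

```python
from collections import deque

SAMPLE_RATE = 16000
WINDOW = 10 * SAMPLE_RATE   # 10 s ring buffer
HOP = 2 * SAMPLE_RATE       # run inference every 2 s

class AlertDetector:
    def __init__(self, score_fn, threshold=0.5, required=2):
        self.score_fn = score_fn            # hypothetical: window -> probability
        self.threshold = threshold
        self.required = required            # consecutive windows needed to fire
        self.buffer = deque(maxlen=WINDOW)  # oldest samples drop off automatically
        self.since_last = 0                 # samples received since last inference
        self.streak = 0                     # current run of over-threshold windows

    def push(self, samples):
        """Feed new samples; return True if the alert condition was met."""
        fired = False
        for s in samples:
            self.buffer.append(s)
            self.since_last += 1
            if len(self.buffer) == WINDOW and self.since_last >= HOP:
                self.since_last = 0
                over = self.score_fn(list(self.buffer)) > self.threshold
                self.streak = self.streak + 1 if over else 0
                fired = fired or self.streak >= self.required
        return fired
```

With a 2 s hop, requiring two consecutive windows adds about 2 s of latency in exchange for ignoring any isolated over-threshold window.
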
+ ## Performance
+
+ - Test set (225 curated clips): 88% accuracy / 89% precision / 83% recall / F1 0.86
+ - Pi Zero 2 (Cortex-A53, fp32, single thread): ~150 ms per 10 s window
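
As a quick sanity check on these figures, the reported F1 follows from the stated precision and recall, and the latency implies a low CPU duty cycle when combined with the 2 s inference interval recommended under Deployment. The variable names are illustrative, and the duty cycle is only as accurate as the approximate 150 ms figure.

```python
precision, recall = 0.89, 0.83
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.86

inference_s, hop_s = 0.150, 2.0   # ~150 ms per window, one window every 2 s
duty_cycle = inference_s / hop_s
print(f"{duty_cycle:.1%}")  # 7.5%
```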