On-device Acoustic Alert System - 5-Class Baseline

This repository contains the current lightweight CNN baseline for a Raspberry Pi on-device acoustic alert system.

Classes

background
dog_bark
alarm_siren
door_event
scream_shout

baby_cry and glass_break were removed from this baseline because the available data was insufficient or unstable for a reliable first deployable model.

Model

Framework: TensorFlow/Keras
Architecture: Lightweight CNN
Input: normalized log-mel spectrogram, shape (64, 64, 1)
Parameters: about 28k
Intended deployment path: TensorFlow Lite / INT8 quantization in the next project stage

Training Data

The current 5-class baseline was built from three public datasets:

dataset	role
ESC-50	Curated environmental sound examples for background, dog sounds, alarm-like sounds, and door knocks.
UrbanSound8K	Additional urban audio for `dog_bark`, `alarm_siren`, and diverse background examples.
FSD50K	Larger Freesound-based expansion for `dog_bark`, `alarm_siren`, `door_event`, `scream_shout`, and background.

Only dataset labels that map directly to one of the five project classes were used. FSD50K clips tagged with more than one project target label were treated as ambiguous and excluded.

Processed sample counts:

split	background	dog_bark	alarm_siren	door_event	scream_shout	total
train	18409	1175	1898	250	373	22105
val	2178	170	184	52	68	2652
test	3831	226	635	111	276	5079
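
The counts above are heavily skewed toward background. As a rough sketch, the snippet below recomputes each class's share per split from the table. The names `counts` and `class_shares` are illustrative and not part of the repository.

```python
# Processed sample counts copied from the table above.
counts = {
    "train": {"background": 18409, "dog_bark": 1175, "alarm_siren": 1898,
              "door_event": 250, "scream_shout": 373},
    "val": {"background": 2178, "dog_bark": 170, "alarm_siren": 184,
            "door_event": 52, "scream_shout": 68},
    "test": {"background": 3831, "dog_bark": 226, "alarm_siren": 635,
             "door_event": 111, "scream_shout": 276},
}


def class_shares(split_counts):
    """Return each class's fraction of the split total."""
    total = sum(split_counts.values())
    return {label: n / total for label, n in split_counts.items()}


shares = {split: class_shares(c) for split, c in counts.items()}
for split, split_shares in shares.items():
    print(split, {label: round(v, 3) for label, v in split_shares.items()})
```

Background makes up roughly 83% of the training clips and about 75% of the test clips, which is one reason to read macro F1 alongside accuracy.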

Feature Extraction

Training and inference must use the same parameters:

sample_rate = 16000
clip_seconds = 2
n_mels = 64
n_fft = 1024
hop_length = 512

Normalization parameters are stored in normalization.json.
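
To make the parameters above concrete, here is a small sketch that works out how many STFT frames a 2-second clip produces. It assumes the common librosa-style framing rule (with and without centered padding). This section does not say which library or framing convention the repository uses, so treat the function and constant names as illustrative.

```python
SAMPLE_RATE = 16000
CLIP_SECONDS = 2
N_FFT = 1024
HOP_LENGTH = 512
TARGET_FRAMES = 64  # time dimension of the (64, 64, 1) model input


def stft_frame_count(num_samples, n_fft, hop_length, center=True):
    """Frame count under an assumed librosa-style framing convention."""
    if center:
        # The signal is padded by n_fft // 2 on each side before framing.
        return 1 + num_samples // hop_length
    # No padding: only windows that fit entirely inside the signal count.
    return 1 + (num_samples - n_fft) // hop_length


num_samples = SAMPLE_RATE * CLIP_SECONDS
frames_centered = stft_frame_count(num_samples, N_FFT, HOP_LENGTH, center=True)
frames_uncentered = stft_frame_count(num_samples, N_FFT, HOP_LENGTH, center=False)
print(num_samples, frames_centered, frames_uncentered)  # 32000 63 61
```

Under either assumption the frame count is slightly below 64, so a pipeline following this convention would need to pad, crop, or resize the time axis to match the model input. The exact step used by this repository is not documented in this section; whatever it is, inference must reproduce it exactly.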

Evaluation

Test results for best_model.keras:

metric	value
accuracy	0.6954
macro F1	0.5402
mean inference latency	43.88 ms

Per-class F1:

class	F1
background	0.7962
dog_bark	0.4806
alarm_siren	0.5023
door_event	0.4579
scream_shout	0.4638
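
As a quick consistency check, macro F1 is the unweighted mean of the per-class F1 scores. The short sketch below recomputes it from the values in the table above; the variable names are illustrative.

```python
# Per-class F1 values copied from the table above.
per_class_f1 = {
    "background": 0.7962,
    "dog_bark": 0.4806,
    "alarm_siren": 0.5023,
    "door_event": 0.4579,
    "scream_shout": 0.4638,
}

# Macro F1 weights every class equally, regardless of how many clips it has.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(round(macro_f1, 4))  # 0.5402
```

Because every class counts equally, the four target classes pull the macro score well below the background F1.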

Confidence-threshold behavior:

threshold	accuracy	macro F1	background false-alert rate	target detection rate
0.65	0.7803	0.6105	0.1378	0.5801
0.75	0.7970	0.6191	0.0976	0.5096
0.80	0.8035	0.6184	0.0788	0.4720

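For long-running alerting, combine confidence thresholding with alert smoothing and a cooldown period, so that a single noisy window does not trigger an alert and one ongoing sound does not trigger repeated alerts.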
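
The following is a minimal sketch of how thresholding, smoothing, and cooldown could fit together. It is not the repository's implementation. The label order, the fallback to background below the threshold, the `AlertGate` class, and its `window`, `min_hits`, and `cooldown_s` parameters are all assumptions made for illustration. The real label order is defined in label_map.json.

```python
from collections import deque

# Assumed label order for illustration; the real order is in label_map.json.
LABELS = ["background", "dog_bark", "alarm_siren", "door_event", "scream_shout"]


def decide(probs, threshold=0.75, labels=LABELS):
    """Return the top label, or 'background' if its probability is below threshold."""
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "background"
    return labels[best]


class AlertGate:
    """Hypothetical smoothing + cooldown gate over per-window decisions."""

    def __init__(self, window=3, min_hits=2, cooldown_s=10.0):
        self.recent = deque(maxlen=window)  # last few window decisions
        self.min_hits = min_hits            # agreeing windows needed to alert
        self.cooldown_s = cooldown_s        # minimum gap between same-class alerts
        self.last_alert = {}                # label -> time of last alert

    def update(self, label, now):
        """Feed one decision; return a label if an alert should fire, else None."""
        self.recent.append(label)
        if label == "background":
            return None
        # Smoothing: require several recent windows to agree on this class.
        if list(self.recent).count(label) < self.min_hits:
            return None
        # Cooldown: suppress repeats of the same class for cooldown_s seconds.
        last = self.last_alert.get(label)
        if last is not None and now - last < self.cooldown_s:
            return None
        self.last_alert[label] = now
        return label


gate = AlertGate()
stream = [  # (timestamp in seconds, model probabilities per class)
    (0.0, [0.10, 0.80, 0.05, 0.03, 0.02]),
    (2.0, [0.05, 0.90, 0.02, 0.02, 0.01]),
    (4.0, [0.05, 0.90, 0.02, 0.02, 0.01]),
    (6.0, [0.90, 0.04, 0.02, 0.02, 0.02]),
]
alerts = [gate.update(decide(p), t) for t, p in stream]
print(alerts)  # [None, 'dog_bark', None, None]
```

In this example the first confident dog_bark window is held back by smoothing, the second fires the alert, and the third is suppressed by the cooldown. Suitable values for the window, hit count, and cooldown depend on the deployment and would need tuning on device recordings.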

Files

best_model.keras: best validation checkpoint
final_model.keras: final training checkpoint
label_map.json: label order used by model outputs
normalization.json: feature normalization and extraction parameters
metrics.json: test metrics
classification_report.txt: per-class report
confidence_threshold_sweep.json: threshold trade-off table
confusion_matrix.png: confusion matrix plot
training_summary.json: training run summary

Limitations

This is a course-project baseline, not a production safety system.

It does not save or upload raw audio.
It does not perform speech recognition.
It detects sound event categories only.
Performance on Raspberry Pi microphone audio still needs device-side validation.
