On-device Acoustic Alert System - 5-Class Baseline

This repository contains the current lightweight CNN baseline for a Raspberry Pi on-device acoustic alert system.

Classes

  • background
  • dog_bark
  • alarm_siren
  • door_event
  • scream_shout

The baby_cry and glass_break classes were removed from this baseline because the available data was insufficient or too unstable to support a reliable first deployable model.

Model

  • Framework: TensorFlow/Keras
  • Architecture: Lightweight CNN
  • Input: normalized Log-Mel Spectrogram, shape (64, 64, 1)
  • Parameters: about 28k
  • Intended deployment path: TensorFlow Lite / INT8 quantization in the next project stage
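
INT8 quantization stores each real value as an 8-bit integer q and recovers it as scale * (q - zero_point). The sketch below works through that affine mapping in plain Python to show the rounding error involved. The value range [-1.0, 3.0] and the helper names are illustrative assumptions, not taken from this model; an actual conversion would go through the TensorFlow Lite converter rather than code like this.

```python
def int8_params(x_min, x_max):
    """Derive (scale, zero_point) so that [x_min, x_max] spans the int8 range."""
    # The range is widened to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / 255.0
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = round(-128 - x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a real value to int8, clipping to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value from its int8 code."""
    return scale * (q - zero_point)


# Hypothetical value range, chosen only for illustration.
scale, zero_point = int8_params(-1.0, 3.0)
for x in (0.0, 1.234, 100.0):
    q = quantize(x, scale, zero_point)
    print(x, "->", q, "->", dequantize(q, scale, zero_point))
```

Values inside the range come back with an error of at most half a quantization step (scale / 2); values outside it are clipped, which is one reason the accuracy of a quantized model needs to be re-measured after conversion.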

Training Data

The current 5-class baseline was built from three public datasets:

| Dataset | Role |
|---|---|
| ESC-50 | Curated environmental sound examples for background, dog sounds, alarm-like sounds, and door knocks. |
| UrbanSound8K | Additional urban audio for dog_bark, alarm_siren, and diverse background examples. |
| FSD50K | Larger Freesound-based expansion for dog_bark, alarm_siren, door_event, scream_shout, and background. |

Only directly mapped labels were used. Ambiguous FSD50K clips with multiple project target labels were excluded.
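
The exclusion rule can be sketched as follows. The mapping table here is hypothetical (the actual source-label mapping is not listed in this README), and the function name `map_clip` is illustrative. How background clips were selected is also not covered; in this sketch a clip with no mapped label simply returns None.

```python
# Hypothetical source-label -> project-class mapping, for illustration only.
LABEL_MAP = {
    "Bark": "dog_bark",
    "Siren": "alarm_siren",
    "Knock": "door_event",
    "Screaming": "scream_shout",
    "Shout": "scream_shout",
}


def map_clip(source_labels, label_map=LABEL_MAP):
    """Return the single project class for a clip, or None.

    None means the clip has no directly mapped label, or its labels map to
    more than one project target (ambiguous, so the clip is excluded).
    """
    targets = {label_map[name] for name in source_labels if name in label_map}
    if len(targets) == 1:
        return next(iter(targets))
    return None


print(map_clip(["Bark", "Animal"]))      # one target: kept as dog_bark
print(map_clip(["Bark", "Siren"]))       # two targets: excluded (None)
print(map_clip(["Screaming", "Shout"]))  # both map to the same target: kept
```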

Processed sample counts:

| Split | background | dog_bark | alarm_siren | door_event | scream_shout | Total |
|---|---|---|---|---|---|---|
| train | 18409 | 1175 | 1898 | 250 | 373 | 22105 |
| val | 2178 | 170 | 184 | 52 | 68 | 2652 |
| test | 3831 | 226 | 635 | 111 | 276 | 5079 |
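
The counts are heavily imbalanced: background accounts for roughly three quarters or more of every split, while door_event has only 250 training clips. A small sketch that checks the totals and prints the per-class share of each split, using only the numbers above:

```python
CLASSES = ["background", "dog_bark", "alarm_siren", "door_event", "scream_shout"]
COUNTS = {
    "train": [18409, 1175, 1898, 250, 373],
    "val": [2178, 170, 184, 52, 68],
    "test": [3831, 226, 635, 111, 276],
}


def class_shares(counts):
    """Fraction of a split that each class makes up."""
    total = sum(counts)
    return [c / total for c in counts]


for split, counts in COUNTS.items():
    shares = class_shares(counts)
    summary = ", ".join(
        f"{name}={share:.1%}" for name, share in zip(CLASSES, shares)
    )
    print(f"{split} (n={sum(counts)}): {summary}")
```

Because background makes up about 75% of the test split, accuracy alone is easy to misread, and the macro F1 reported under Evaluation is the more informative figure.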

Feature Extraction

Training and inference must use the same parameters:

sample_rate = 16000
clip_seconds = 2
n_mels = 64
n_fft = 1024
hop_length = 512

Normalization parameters are stored in normalization.json.
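
With these values a 2-second clip is 32,000 samples. Under the common centered-STFT framing convention (the default in librosa, for example) that gives 63 frames, one short of the 64 in the model input, so the pipeline presumably pads, crops, or resizes the time axis; this README does not say which. The sketch below only works through the frame arithmetic, and the `TARGET_FRAMES` name and the padding interpretation are assumptions.

```python
SAMPLE_RATE = 16000
CLIP_SECONDS = 2
N_FFT = 1024
HOP_LENGTH = 512
TARGET_FRAMES = 64  # time axis of the (64, 64, 1) model input (assumed axis)


def stft_frames(n_samples, n_fft, hop_length, center=True):
    """Number of STFT frames for a signal of n_samples.

    center=True: the signal is padded by n_fft // 2 on both sides, so frames
    are centered on multiples of hop_length.
    center=False: only frames that fit entirely inside the signal are counted.
    """
    if center:
        return 1 + n_samples // hop_length
    if n_samples < n_fft:
        return 0
    return 1 + (n_samples - n_fft) // hop_length


n_samples = SAMPLE_RATE * CLIP_SECONDS
centered = stft_frames(n_samples, N_FFT, HOP_LENGTH, center=True)
uncentered = stft_frames(n_samples, N_FFT, HOP_LENGTH, center=False)

print("samples per clip:", n_samples)         # 32000
print("frames (centered):", centered)         # 63
print("frames (uncentered):", uncentered)     # 61
print("frames to pad:", TARGET_FRAMES - centered)
```

Whatever step closes this gap has to be identical at training and inference time, just like the five parameters listed above.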

Evaluation

Test results for best_model.keras:

| Metric | Value |
|---|---|
| Accuracy | 0.6954 |
| Macro F1 | 0.5402 |
| Mean inference latency | 43.88 ms |

Per-class F1:

| Class | F1 |
|---|---|
| background | 0.7962 |
| dog_bark | 0.4806 |
| alarm_siren | 0.5023 |
| door_event | 0.4579 |
| scream_shout | 0.4638 |
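
Macro F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many test clips it has. The short check below reproduces the reported macro F1 from the per-class values and also averages the four target classes on their own:

```python
PER_CLASS_F1 = {
    "background": 0.7962,
    "dog_bark": 0.4806,
    "alarm_siren": 0.5023,
    "door_event": 0.4579,
    "scream_shout": 0.4638,
}

# Macro F1: plain average over all five classes.
macro_f1 = sum(PER_CLASS_F1.values()) / len(PER_CLASS_F1)

# Average over the four alert-worthy classes only (background excluded).
target_f1 = [f1 for name, f1 in PER_CLASS_F1.items() if name != "background"]
target_mean_f1 = sum(target_f1) / len(target_f1)

print(round(macro_f1, 4))        # 0.5402
print(f"{target_mean_f1:.3f}")   # 0.476
```

The gap between the two numbers shows how much of the macro F1 is carried by the background class.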

Confidence-threshold behavior:

| Threshold | Accuracy | Macro F1 | Background false-alert rate | Target detection rate |
|---|---|---|---|---|
| 0.65 | 0.7803 | 0.6105 | 0.1378 | 0.5801 |
| 0.75 | 0.7970 | 0.6191 | 0.0976 | 0.5096 |
| 0.80 | 0.8035 | 0.6184 | 0.0788 | 0.4720 |

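For long-running alerting, combine confidence thresholding with alert smoothing (requiring agreement across several consecutive windows) and a cooldown period between repeated alerts of the same class.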
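
One way such a gate could look is sketched below. The class name `AlertGate` and all parameter values (threshold 0.75, a 3-window history, 2 required hits, a 10-second cooldown) are illustrative assumptions, not settings shipped with this repository; the sketch also assumes that low-confidence predictions are treated as "no event".

```python
from collections import deque


class AlertGate:
    """Threshold + smoothing + cooldown over a stream of window predictions."""

    def __init__(self, threshold=0.75, window=3, min_hits=2, cooldown_s=10.0,
                 background="background"):
        self.threshold = threshold
        self.min_hits = min_hits
        self.cooldown_s = cooldown_s
        self.background = background
        self.recent = deque(maxlen=window)  # last few accepted labels (or None)
        self.last_alert = {}                # label -> time of last emitted alert

    def update(self, label, confidence, now):
        """Feed one window's top prediction; return a label to alert on, or None."""
        # Thresholding: background or low-confidence predictions count as no event.
        if label == self.background or confidence < self.threshold:
            self.recent.append(None)
            return None
        self.recent.append(label)
        # Smoothing: the label must appear in at least min_hits recent windows.
        if sum(1 for r in self.recent if r == label) < self.min_hits:
            return None
        # Cooldown: suppress repeats of the same label for cooldown_s seconds.
        last = self.last_alert.get(label)
        if last is not None and now - last < self.cooldown_s:
            return None
        self.last_alert[label] = now
        return label


gate = AlertGate()
stream = [("dog_bark", 0.90, 0.0), ("dog_bark", 0.80, 1.0),
          ("dog_bark", 0.90, 2.0), ("dog_bark", 0.90, 12.0)]
for label, conf, t in stream:
    print(t, gate.update(label, conf, now=t))
```

In this run the first window is held back by smoothing, the second raises an alert, the third is suppressed by the cooldown, and the fourth alerts again once the cooldown has elapsed. Tighter settings lower the false-alert rate at the cost of slower or missed detections, mirroring the trade-off in the threshold table.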

Files

  • best_model.keras: best validation checkpoint
  • final_model.keras: final training checkpoint
  • label_map.json: label order used by model outputs
  • normalization.json: feature normalization and extraction parameters
  • metrics.json: test metrics
  • classification_report.txt: per-class report
  • confidence_threshold_sweep.json: threshold trade-off table
  • confusion_matrix.png: confusion matrix plot
  • training_summary.json: training run summary

Limitations

This is a course-project baseline, not a production safety system.

  • It does not save or upload raw audio.
  • It does not perform speech recognition.
  • It detects sound event categories only.
  • Performance on Raspberry Pi microphone audio still needs device-side validation.
