How to use CaptainRapid/acoustic-alert-5class with Keras:

```python
# Available backend options are: "jax", "torch", "tensorflow".
import os

os.environ["KERAS_BACKEND"] = "jax"

import keras

model = keras.saving.load_model("hf://CaptainRapid/acoustic-alert-5class")
```
On-device Acoustic Alert System - 5-Class Baseline
This repository contains the current lightweight CNN baseline for a Raspberry Pi on-device acoustic alert system.
Classes
- background
- dog_bark
- alarm_siren
- door_event
- scream_shout
baby_cry and glass_break were removed from this baseline because the available data was insufficient or unstable for a reliable first deployable model.
Model
- Framework: TensorFlow/Keras
- Architecture: Lightweight CNN
- Input: normalized log-mel spectrogram, shape (64, 64, 1)
- Parameters: about 28k
- Intended deployment path: TensorFlow Lite / INT8 quantization in the next project stage
Training Data
The current 5-class baseline was built from three public datasets:
| dataset | role |
|---|---|
| ESC-50 | Curated environmental sound examples for background, dog sounds, alarm-like sounds, and door knocks. |
| UrbanSound8K | Additional urban audio for dog_bark, alarm_siren, and diverse background examples. |
| FSD50K | Larger Freesound-based expansion for dog_bark, alarm_siren, door_event, scream_shout, and background. |
Only directly mapped labels were used. Ambiguous FSD50K clips with multiple project target labels were excluded.
Processed sample counts:
| split | background | dog_bark | alarm_siren | door_event | scream_shout | total |
|---|---|---|---|---|---|---|
| train | 18409 | 1175 | 1898 | 250 | 373 | 22105 |
| val | 2178 | 170 | 184 | 52 | 68 | 2652 |
| test | 3831 | 226 | 635 | 111 | 276 | 5079 |
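The training split is heavily skewed toward background, which is worth keeping in mind when reading the per-class scores. The sketch below only does arithmetic on the train row of the table: it computes each class's share and an inverse-frequency weight using the common "balanced" heuristic, `total / (n_classes * count)`. This card does not say whether class weighting was used in training, so the weights are an illustration, not a description of the actual training setup.

```python
# Train-split counts copied from the table above.
train_counts = {
    "background": 18409,
    "dog_bark": 1175,
    "alarm_siren": 1898,
    "door_event": 250,
    "scream_shout": 373,
}

total = sum(train_counts.values())
n_classes = len(train_counts)

# Fraction of the training set taken up by each class.
shares = {name: count / total for name, count in train_counts.items()}

# "Balanced" inverse-frequency weights (illustrative assumption, not the
# documented training configuration).
class_weights = {
    name: total / (n_classes * count) for name, count in train_counts.items()
}

for name in train_counts:
    print(f"{name:13s} share={shares[name]:.3f} weight={class_weights[name]:.3f}")
```

With these counts, background makes up roughly 83% of the training clips, while door_event is the rarest class.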
Feature Extraction
Training and inference must use the same parameters:
```python
sample_rate = 16000
clip_seconds = 2
n_mels = 64
n_fft = 1024
hop_length = 512
```
Normalization parameters are stored in normalization.json.
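As a rough illustration of how these parameters fit together, here is a minimal NumPy sketch of a log-mel front end. Several details are assumptions, not taken from this repository: the Hann window, centered framing with reflect padding, the HTK-style mel formula, the dB scaling, and the edge padding from 63 to 64 frames (a 2 s clip at 16 kHz with a hop of 512 gives 63 centered frames, so some padding or cropping is needed to reach the 64-frame input). To reproduce training features exactly, use the same library and settings as the training pipeline.

```python
import numpy as np

SAMPLE_RATE = 16000
CLIP_SECONDS = 2
N_MELS = 64
N_FFT = 1024
HOP_LENGTH = 512
TARGET_FRAMES = 64  # assumption: time axis padded to match the (64, 64, 1) input


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank():
    """Triangular mel filters, shape (N_MELS, N_FFT // 2 + 1)."""
    fft_freqs = np.linspace(0.0, SAMPLE_RATE / 2.0, N_FFT // 2 + 1)
    edges = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(SAMPLE_RATE / 2.0), N_MELS + 2)
    )
    bank = np.zeros((N_MELS, fft_freqs.size))
    for i in range(N_MELS):
        low, center, high = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - low) / (center - low)
        falling = (high - fft_freqs) / (high - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel(waveform):
    """Return an un-normalized log-mel array of shape (64, 64, 1)."""
    # Zero-pad or truncate to exactly CLIP_SECONDS of audio.
    n_samples = SAMPLE_RATE * CLIP_SECONDS
    clip = np.zeros(n_samples, dtype=np.float64)
    n_copy = min(n_samples, len(waveform))
    clip[:n_copy] = waveform[:n_copy]

    # Centered framing (assumed), Hann window (assumed).
    padded = np.pad(clip, N_FFT // 2, mode="reflect")
    n_frames = 1 + (len(padded) - N_FFT) // HOP_LENGTH
    window = np.hanning(N_FFT)
    frames = np.stack(
        [padded[i * HOP_LENGTH : i * HOP_LENGTH + N_FFT] * window
         for i in range(n_frames)]
    )

    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (frames, 513)
    mel_power = power @ mel_filterbank().T                  # (frames, 64)
    log_mel_db = 10.0 * np.log10(np.maximum(mel_power, 1e-10)).T  # (64, frames)

    # Pad (by repeating the last frame) or crop the time axis to TARGET_FRAMES.
    if log_mel_db.shape[1] < TARGET_FRAMES:
        pad = TARGET_FRAMES - log_mel_db.shape[1]
        log_mel_db = np.pad(log_mel_db, ((0, 0), (0, pad)), mode="edge")
    log_mel_db = log_mel_db[:, :TARGET_FRAMES]
    return log_mel_db[..., np.newaxis]
```

The output still has to be normalized with the values stored in normalization.json before it is passed to the model. The layout of that file is not described here, so that step is left out of the sketch.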
Evaluation
Test results for best_model.keras:
| metric | value |
|---|---|
| accuracy | 0.6954 |
| macro F1 | 0.5402 |
| mean inference latency | 43.88 ms |
Per-class F1:
| class | F1 |
|---|---|
| background | 0.7962 |
| dog_bark | 0.4806 |
| alarm_siren | 0.5023 |
| door_event | 0.4579 |
| scream_shout | 0.4638 |
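Macro F1 is the unweighted mean of the per-class F1 scores, so the rare classes count as much as background. The short check below recomputes it from the rounded values in the table; it agrees with the reported 0.5402 up to rounding.

```python
# Per-class F1 values copied from the table above.
per_class_f1 = {
    "background": 0.7962,
    "dog_bark": 0.4806,
    "alarm_siren": 0.5023,
    "door_event": 0.4579,
    "scream_shout": 0.4638,
}

# Macro F1: plain average over classes, ignoring class frequency.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"macro F1 = {macro_f1:.4f}")
```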
Confidence-threshold behavior:
| threshold | accuracy | macro F1 | background false-alert rate | target detection rate |
|---|---|---|---|---|
| 0.65 | 0.7803 | 0.6105 | 0.1378 | 0.5801 |
| 0.75 | 0.7970 | 0.6191 | 0.0976 | 0.5096 |
| 0.80 | 0.8035 | 0.6184 | 0.0788 | 0.4720 |
For long-running alerting, use thresholding plus alert smoothing and cooldown.
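One way to combine thresholding, smoothing, and cooldown is sketched below. Everything in it is illustrative: the `AlertFilter` class, the window of 3 predictions, the requirement of 2 confident hits, and the 10-second cooldown are hypothetical choices, not part of this repository. The label order is assumed to match the class list above; label_map.json is the authoritative source.

```python
from collections import deque

# Assumed label order; check label_map.json for the real one.
LABELS = ["background", "dog_bark", "alarm_siren", "door_event", "scream_shout"]


class AlertFilter:
    """Hypothetical post-processing: threshold, majority smoothing, cooldown."""

    def __init__(self, threshold=0.75, window=3, min_hits=2, cooldown_s=10.0):
        self.threshold = threshold
        self.min_hits = min_hits
        self.cooldown_s = cooldown_s
        self.recent = deque(maxlen=window)  # last confident labels (or None)
        self.last_alert = {}                # label -> time of last alert

    def update(self, probs, now):
        """Feed one softmax vector; return a label to alert on, or None."""
        best = max(range(len(probs)), key=probs.__getitem__)
        label = LABELS[best]
        confident = label != "background" and probs[best] >= self.threshold
        self.recent.append(label if confident else None)
        if not confident:
            return None

        # Smoothing: require several confident hits inside the window.
        hits = sum(1 for item in self.recent if item == label)
        if hits < self.min_hits:
            return None

        # Cooldown: suppress repeats of the same alert for a while.
        last = self.last_alert.get(label)
        if last is not None and now - last < self.cooldown_s:
            return None

        self.last_alert[label] = now
        return label
```

In use, `update` would be called once per 2-second clip with the model's output probabilities and a timestamp in seconds. A single confident siren prediction does nothing; a second one inside the window raises the alert, and further siren predictions stay silent until the cooldown has passed.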
Files
- best_model.keras: best validation checkpoint
- final_model.keras: final training checkpoint
- label_map.json: label order used by model outputs
- normalization.json: feature normalization and extraction parameters
- metrics.json: test metrics
- classification_report.txt: per-class report
- confidence_threshold_sweep.json: threshold trade-off table
- confusion_matrix.png: confusion matrix plot
- training_summary.json: training run summary
Limitations
This is a course-project baseline, not a production safety system.
- It does not save or upload raw audio.
- It does not perform speech recognition.
- It detects sound event categories only.
- Performance on Raspberry Pi microphone audio still needs device-side validation.