Real-Time Audio Event Detection for Threat Assessment on Mobile Devices

White Paper: Edge Deployment, Model Architectures, and Multi-Modal Fusion with Wearable Physiological Signals

Author: Aditya Raikar

Overview

This repository contains a comprehensive research white paper on building a real-time audio event detection (AED) system for threat assessment on Android mobile devices. The paper covers:

Edge vs. Cloud Deployment Analysis — Why edge-first deployment is optimal for threat detection (latency, privacy, offline operation)
Model Architecture Taxonomy — From ultra-lightweight (6K params) to SOTA (90M params), with published AudioSet benchmarks
Multi-Modal Fusion — Proposes combining audio threat detection with smartwatch physiological signals (HR, HRV, EDA, accelerometer) to reduce false positives
Android System Architecture — Five-layer design with component-level specifications and latency budget
Open-Source Prototypes & Datasets — Curated list of available resources for development
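The latency budget above comes down to simple bookkeeping. Each pipeline stage gets a worst-case allowance, and the allowances must sum to less than the end-to-end target (< 100 ms in the Key Findings). The sketch below shows that check. The `Stage` type, the `check_budget` helper, and the stage names and millisecond values are hypothetical placeholders. They are not the paper's five layers or its measured figures.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    """A hypothetical pipeline stage with an estimated worst-case latency."""
    name: str
    latency_ms: float


def check_budget(stages: List[Stage], target_ms: float = 100.0) -> Tuple[float, float, bool]:
    """Sum per-stage latencies against an end-to-end target.

    Returns (total_ms, headroom_ms, within_budget). within_budget is True only
    when the total is strictly below the target, matching "< 100 ms".
    """
    total = sum(s.latency_ms for s in stages)
    return total, target_ms - total, total < target_ms


# Placeholder estimates for illustration only, not measurements from the paper.
pipeline = [
    Stage("audio buffering", 20.0),
    Stage("model inference", 15.0),
    Stage("fusion and decision", 2.0),
    Stage("alert dispatch", 10.0),
]
total, headroom, ok = check_budget(pipeline)
print(f"{total:.1f} ms used, {headroom:.1f} ms headroom, within budget: {ok}")
```

With these placeholder numbers the script prints `47.0 ms used, 53.0 ms headroom, within budget: True`. Substituting measured per-stage latencies from a target device turns the same check into a regression test.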

Key Findings

Aspect	Recommendation
Deployment	Edge-first with optional cloud fallback
Primary Model	EfficientAT-MN40 (~4M params, 0.47 mAP AudioSet)
Alternative	YAMNet via MediaPipe (production-ready, TFLite native)
Fusion	Late fusion + attention gating with smartwatch vitals
Latency Target	< 100 ms end-to-end (achievable: ~50-75 ms)
Inference Engine	TensorFlow Lite with GPU/NNAPI delegate
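The "Late fusion + attention gating" recommendation can be illustrated as follows. Each modality (the audio classifier and a smartwatch model) emits its own threat probability. A softmax over per-modality reliability scores then decides how much each one counts before thresholding. This is a sketch of one common form of attention-weighted late fusion, not the paper's exact formulation. It assumes each modality outputs a probability in [0, 1] plus a scalar reliability logit, for example derived from signal quality. In a trained system those logits would come from a learned gating network. Here they are supplied directly. The function name, modality keys and the 0.5 threshold are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_late_fusion(
    probs: Dict[str, float],
    reliability: Dict[str, float],
    threshold: float = 0.5,
) -> Tuple[float, Dict[str, float], bool]:
    """Fuse per-modality threat probabilities with attention-style weights.

    probs: modality name -> threat probability in [0, 1].
    reliability: modality name -> unnormalized reliability logit (hypothetical).
    A modality missing from `probs`, such as a disconnected watch, is skipped,
    so the fusion degrades to the remaining modalities.
    Returns (fused probability, per-modality weights, alert decision).
    """
    names = [m for m in probs if m in reliability]
    if not names:
        raise ValueError("no modality has both a probability and a reliability score")
    weights = dict(zip(names, softmax([reliability[m] for m in names])))
    fused = sum(weights[m] * probs[m] for m in names)
    return fused, weights, fused >= threshold


# Equal reliability: weights are 0.5 / 0.5 and the fused score is about 0.8.
fused, weights, alert = gated_late_fusion(
    {"audio": 0.9, "physio": 0.7}, {"audio": 1.0, "physio": 1.0}
)
# Watch disconnected: only the audio modality contributes.
fused_audio_only, _, _ = gated_late_fusion({"audio": 0.9}, {"audio": 1.0, "physio": 1.0})
```

Because the weights are normalized over whichever modalities are present, the same code path covers both the fused case and the audio-only fallback.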

Novel Contribution

To the author's knowledge, no prior work has combined real-time audio event detection with smartwatch physiological signals specifically for threat assessment. This paper proposes using involuntary physiological responses (startle reflex, fight-or-flight) as confirmation signals for audio-detected threats.
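The sketch below shows one way an involuntary heart-rate response could serve as a confirmation signal. It assumes the watch streams timestamped heart-rate samples and compares the peak heart rate shortly after the audio event against a pre-event baseline. The function name, the 30 s baseline window, the 10 s response window and the 10 bpm threshold are hypothetical placeholders. They are neither values from the paper nor validated physiological thresholds. EDA or HRV could be scored the same way. A real system would also need the accelerometer to rule out exercise as the cause of a heart-rate rise.

```python
from statistics import mean
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, heart rate in bpm)


def hr_confirmation(
    samples: List[Sample],
    event_time: float,
    baseline_s: float = 30.0,    # hypothetical pre-event baseline window
    response_s: float = 10.0,    # hypothetical post-event response window
    min_rise_bpm: float = 10.0,  # hypothetical confirmation threshold
) -> Optional[Tuple[float, bool]]:
    """Check for a heart-rate rise after an audio-detected event.

    Baseline = mean HR in [event_time - baseline_s, event_time).
    Response = max HR in [event_time, event_time + response_s].
    Returns (rise_bpm, confirmed), or None if either window has no samples.
    """
    before = [hr for t, hr in samples if event_time - baseline_s <= t < event_time]
    after = [hr for t, hr in samples if event_time <= t <= event_time + response_s]
    if not before or not after:
        return None  # watch disconnected or too few samples: no confirmation signal
    rise = max(after) - mean(before)
    return rise, rise >= min_rise_bpm


# Flat 70 bpm baseline, then a jump to 90 bpm within 5 s of the event.
hr = [(float(t), 70.0) for t in range(-30, 0)] + [(2.0, 88.0), (5.0, 90.0)]
print(hr_confirmation(hr, event_time=0.0))  # → (20.0, True)
```

Returning `None` instead of `(0.0, False)` when data is missing lets a fusion stage treat "no physiological evidence" differently from "physiology says calm".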

Key References

Paper	Contribution	arXiv ID / Venue
EfficientAT (Schmid et al., 2023)	SOTA efficient audio tagging via distillation	ICASSP 2023
BEATs (Chen et al., 2023)	0.507 mAP AudioSet with acoustic tokenizers	ICML 2023
AST (Gong et al., 2021)	Pure-attention audio classifier	2104.01778
PANNs (Kong et al., 2020)	Large-scale pretrained audio CNNs	1912.10211
Gunshot Detection (2025)	CNN-based firearm classification	2506.20609
WESAD Cross-Modality (2025)	99.95% stress detection from wearables	2502.18733
CognitiveEMS (2024)	Multi-modal emergency assistant on edge	2403.06734
Cross-Modal Violence (2024)	Audio-visual anomaly detection fusion	2412.20455

Open-Source Prototypes Referenced

vivsvaan/Gunshot-Detection — CNN gunshot classifier
fschmid56/EfficientAT — Efficient audio tagging models
TensorFlow Audio Classification Android — TFLite demo app
MediaPipe Audio Classifier — Production API
microsoft/unilm/beats — BEATs SOTA model
WJMatthew/WESAD — Wearable stress detection
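Streaming classifiers like the audio prototypes listed above generally score the microphone signal in fixed-length, overlapping windows rather than sample by sample. The sketch below shows that framing step in isolation. `SlidingWindow` is a hypothetical helper, not part of the TFLite or MediaPipe APIs. As an example of parameters, a 0.96 s window with a 0.48 s hop at 16 kHz (the framing described for YAMNet) gives `window_size=15360` and `hop_size=7680`. Check the exact input length your exported model expects.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List


class SlidingWindow:
    """Accumulate streamed audio samples and emit overlapping fixed-size windows."""

    def __init__(self, window_size: int, hop_size: int):
        if not 0 < hop_size <= window_size:
            raise ValueError("require 0 < hop_size <= window_size")
        self.window_size = window_size
        self.hop_size = hop_size
        self._buf: Deque[float] = deque(maxlen=window_size)  # keeps only the newest samples
        self._since_last = 0   # samples received since the last emitted window
        self._primed = False   # True once the first full window has been emitted

    def push(self, chunk: Iterable[float]) -> Iterator[List[float]]:
        """Feed a chunk of samples and yield every window that became ready."""
        for x in chunk:
            self._buf.append(x)
            self._since_last += 1
            if len(self._buf) < self.window_size:
                continue  # still filling the first window
            if not self._primed or self._since_last >= self.hop_size:
                self._primed = True
                self._since_last = 0
                yield list(self._buf)


# Toy example: window of 4 samples, hop of 2, fed in two chunks.
framer = SlidingWindow(window_size=4, hop_size=2)
windows = list(framer.push(range(5))) + list(framer.push(range(5, 8)))
print(windows)  # → [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

Because `push` is a generator, windows are produced only when the caller iterates over it (for example `for window in framer.push(chunk): ...`). The buffer state carries over between microphone callbacks, so chunk boundaries do not affect which windows are emitted.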

How to Compile the Paper

pdflatex whitepaper.tex
pdflatex whitepaper.tex  # Run twice for TOC and references

Requires: texlive-latex-extra, texlive-pictures (for TikZ diagrams)

License

This research white paper is provided for educational and research purposes.

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern

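Usage

This repository contains the white paper source rather than model weights, so it cannot be loaded with the transformers AutoModel classes. Download the repository files with huggingface_hub instead:

from huggingface_hub import snapshot_download

# Download every file in the repository to the local Hugging Face cache
local_dir = snapshot_download(repo_id="AdityaRaikar/threat-detection-audio-whitepaper")
print(local_dir)  # directory containing the repository files

Then compile the paper from that directory as described in "How to Compile the Paper".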

Papers for AdityaRaikar/threat-detection-audio-whitepaper

Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings

Paper • 2506.20609 • Published Jun 25, 2025

Cross-Modality Investigation on WESAD Stress Classification

Paper • 2502.18733 • Published Feb 26, 2025

Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection

Paper • 2412.20455 • Published Dec 29, 2024

Real-Time Multimodal Cognitive Assistant for Emergency Medical Services

Paper • 2403.06734 • Published Mar 11, 2024

AST: Audio Spectrogram Transformer

Paper • 2104.01778 • Published Apr 5, 2021