Real-Time Audio Event Detection for Threat Assessment on Mobile Devices

White Paper: Edge Deployment, Model Architectures, and Multi-Modal Fusion with Wearable Physiological Signals

Author: Aditya Raikar


Overview

This repository contains a comprehensive research white paper on building a real-time audio event detection (AED) system for threat assessment on Android mobile devices. The paper covers:

  1. Edge vs. Cloud Deployment Analysis: Why edge-first deployment is optimal for threat detection (latency, privacy, offline operation)
  2. Model Architecture Taxonomy: From ultra-lightweight (6K params) to SOTA (90M params), with published AudioSet benchmarks
  3. Multi-Modal Fusion: Novel proposition to combine audio threat detection with smartwatch physiological signals (HR, HRV, EDA, accelerometer) for false-positive reduction
  4. Android System Architecture: Five-layer design with component-level specifications and latency budget
  5. Open-Source Prototypes & Datasets: Curated list of available resources for development

Key Findings

| Aspect | Recommendation |
| --- | --- |
| Deployment | Edge-first with optional cloud fallback |
| Primary Model | EfficientAT-MN40 (~4M params, 0.47 mAP AudioSet) |
| Alternative | YAMNet via MediaPipe (production-ready, TFLite native) |
| Fusion | Late fusion + attention gating with smartwatch vitals |
| Latency Target | < 100 ms end-to-end (achievable: ~50-75 ms) |
| Inference Engine | TensorFlow Lite with GPU/NNAPI delegate |
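The "late fusion + attention gating" recommendation can be illustrated with a minimal sketch. It assumes each modality already produces its own threat probability and that a per-modality quality score is available to act as the gate logit. The function names, the quality inputs, and the temperature parameter are hypothetical and are not taken from the paper; in a trained system the gate logits would more likely come from a small learned network.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_late_fusion(p_audio, p_physio, audio_quality, physio_quality,
                      temperature=1.0):
    """Fuse two per-modality threat probabilities with attention-style gating.

    p_audio, p_physio: threat probabilities in [0, 1] from each modality.
    audio_quality, physio_quality: hypothetical gate logits (higher means
        the modality is trusted more), e.g. derived from signal quality.
    Returns the fused probability and the two gate weights.
    """
    weights = softmax([audio_quality / temperature,
                       physio_quality / temperature])
    fused = weights[0] * p_audio + weights[1] * p_physio
    return fused, weights


# Equal trust: the fused score is the plain average of the two inputs.
fused_equal, weights_equal = gated_late_fusion(0.9, 0.1, 0.0, 0.0)

# Smartwatch signal judged unreliable: the gate falls back to audio.
fused_audio_only, _ = gated_late_fusion(0.9, 0.1, 0.0, -100.0)
```

Because the gate weights sum to one, the fused score stays in [0, 1], and a missing or noisy smartwatch stream can be down-weighted rather than treated as evidence against a threat.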

Novel Contribution

To the author's knowledge, no prior work has combined real-time audio event detection with smartwatch physiological signals specifically for threat assessment. This paper proposes using involuntary physiological responses (such as the startle reflex and the fight-or-flight response) as confirmation signals for audio-detected threats.
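As a rough illustration of the confirmation idea, the sketch below checks whether heart rate rises shortly after an audio-detected event, relative to a pre-event baseline. The function name, the window lengths, and the 10 bpm threshold are illustrative assumptions, not values from the paper, and a real system would likely also use HRV, EDA, and accelerometer data.

```python
from statistics import mean


def confirms_threat(event_time_s, hr_samples, baseline_window_s=30.0,
                    response_window_s=10.0, min_rise_bpm=10.0):
    """Return True if heart rate rises enough after an audio event.

    hr_samples: iterable of (timestamp_s, bpm) tuples from the smartwatch.
    The baseline is the mean bpm in the window before the event; the
    response is the peak bpm in the window starting at the event.
    All thresholds are hypothetical defaults chosen for illustration.
    """
    samples = list(hr_samples)
    baseline = [bpm for t, bpm in samples
                if event_time_s - baseline_window_s <= t < event_time_s]
    response = [bpm for t, bpm in samples
                if event_time_s <= t <= event_time_s + response_window_s]
    if not baseline or not response:
        return False  # not enough data to confirm either way
    return max(response) - mean(baseline) >= min_rise_bpm


# Heart rate jumps from about 71 bpm to 95 bpm after an event at t = 30 s.
startled = [(0, 70), (10, 72), (20, 71), (31, 90), (35, 95)]
# Heart rate stays flat after the same event.
calm = [(0, 70), (10, 72), (20, 71), (31, 72), (35, 73)]

startled_confirmed = confirms_threat(30.0, startled)
calm_confirmed = confirms_threat(30.0, calm)
```

Returning False when data is missing is only one possible policy; a deployment might prefer to fall back to the audio-only decision in that case.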

Key References

| Paper | Contribution | arXiv ID / Venue |
| --- | --- | --- |
| EfficientAT (Schmid et al., 2023) | SOTA efficient audio tagging via distillation | ICASSP 2023 |
| BEATs (Chen et al., 2023) | 0.507 mAP AudioSet with acoustic tokenizers | ICML 2023 |
| AST (Gong et al., 2021) | Pure-attention audio classifier | 2104.01778 |
| PANNs (Kong et al., 2020) | Large-scale pretrained audio CNNs | 1912.10211 |
| Gunshot Detection (2025) | CNN-based firearm classification | 2506.20609 |
| WESAD Cross-Modality (2025) | 99.95% stress detection from wearables | 2502.18733 |
| CognitiveEMS (2024) | Multi-modal emergency assistant on edge | 2403.06734 |
| Cross-Modal Violence (2024) | Audio-visual anomaly detection fusion | 2412.20455 |

Open-Source Prototypes Referenced

How to Compile the Paper

pdflatex whitepaper.tex
pdflatex whitepaper.tex  # Run twice for TOC and references

Requires: texlive-latex-extra, texlive-pictures (for TikZ diagrams)

License

This research white paper is provided for educational and research purposes.

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AdityaRaikar/threat-detection-audio-whitepaper"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
