DroneClassifier: Real-Time Drone Audio Detection

A lightweight CRNN (Convolutional Recurrent Neural Network) for binary drone audio detection. Runs in real time on a Raspberry Pi CM4 and produces a detection decision every 500 ms.

Trained on geronimobasso/drone-audio-detection-samples (180 320 clips, 16 kHz mono).

Model Variants

Three checkpoint variants are included:

| File | Augmentation | Recommended use |
|---|---|---|
| drone_classifier_aug_mixed.pt | Brown noise + real ESC-50 wind (50 / 50) | General purpose (recommended) |
| drone_classifier_aug_pw10.pt | Brown noise only | Best at extreme real-wind SNR (−5 dB) |
| drone_classifier_baseline.pt | None | Clean-audio reference baseline |

Performance

Clean test set (27 050 held-out samples)

| Variant | Accuracy | Precision | Recall | F1 | ROC-AUC |
|---|---|---|---|---|---|
| aug_mixed (recommended) | 0.9993 | 0.9998 | 0.9995 | 0.9996 | 1.000 |
| aug_pw10 | 0.9989 | 0.9996 | 0.9992 | 0.9994 | 0.9999 |
| baseline | 0.9980 | 1.0000 | 0.9978 | 0.9989 | 0.9997 |
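
The F1 column can be cross-checked against the precision and recall columns, since F1 is their harmonic mean. The snippet below is only a sanity-check sketch: the numbers are copied from the table, and the helper name `f1_score` is illustrative rather than part of this repository.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the clean-test-set table.
reported = {
    "aug_mixed": (0.9998, 0.9995),
    "aug_pw10": (0.9996, 0.9992),
    "baseline": (1.0000, 0.9978),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.4f}")
```

Because the table values are already rounded to four decimals, the recomputed F1 can differ from the reported one in the last digit.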

Noise robustness of aug_mixed on real ESC-50 wind clips never seen during training

| SNR | Recall | F1 | Notes |
|---|---|---|---|
| +20 dB | 1.000 | 1.000 | |
| +10 dB | 0.996 | 0.998 | |
| +5 dB | 0.996 | 0.998 | |
| 0 dB | 0.996 | 0.998 | |
| −5 dB | 0.860 | 0.925 | wind 1.8× louder than drone |

The baseline model collapses at 0 dB SNR (F1 = 0.353), whereas aug_mixed stays at F1 ≥ 0.998 down to 0 dB.
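
To make the SNR figures concrete, here is a minimal sketch of mixing a noise clip into a signal at a target SNR, assuming SNR is defined on RMS amplitude. Under that assumption, −5 dB corresponds to noise about 1.78× the signal amplitude, which matches the "1.8× louder" note. The function names are hypothetical, and the evaluation scripts in the repository may compute SNR differently.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_gain_for_snr(signal_rms: float, noise_rms: float, snr_db: float) -> float:
    """Gain to apply to the noise so that signal_rms / noise_rms hits snr_db."""
    target_noise_rms = signal_rms / (10 ** (snr_db / 20))
    return target_noise_rms / noise_rms


def mix_at_snr(signal, noise, snr_db: float):
    """Return signal + scaled noise, sample by sample, at the requested SNR."""
    gain = noise_gain_for_snr(rms(signal), rms(noise), snr_db)
    return [s + gain * n for s, n in zip(signal, noise)]


# Amplitude ratio of noise to signal at -5 dB (equal-RMS inputs).
print(f"{noise_gain_for_snr(1.0, 1.0, -5.0):.2f}")
```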

Evaluation images are in the eval/ folder of this repo.

Usage

import torch
import torchaudio
import torchaudio.transforms as T
from huggingface_hub import hf_hub_download

# ── 1. Download weights ────────────────────────────────────────────────────
ckpt_path = hf_hub_download(
    repo_id="AntoineNaccache/drone-audio-detector",
    filename="drone_classifier_aug_mixed.pt",
)

# ── 2. Define or import the model ─────────────────────────────────────────
# Option A: download model.py from the repo and place it next to your script,
#           then:  from model import DroneClassifier, load_classifier
#
# Option B: inline definition (copy from model.py in this repo)
import torch.nn as nn

def _conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False), nn.BatchNorm2d(out_ch), nn.ReLU(True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False), nn.BatchNorm2d(out_ch), nn.ReLU(True),
    )

class DroneClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1  = _conv_block(1, 32);   self.pool1 = nn.MaxPool2d(2, 2)
        self.enc2  = _conv_block(32, 64);  self.pool2 = nn.MaxPool2d(2, 2)
        self.enc3  = _conv_block(64, 128); self.pool3 = nn.MaxPool2d((2, 1), (2, 1))
        self.gru   = nn.GRU(1024, 128, num_layers=2, batch_first=True,
                            bidirectional=True, dropout=0.2)
        self.head  = nn.Sequential(nn.Linear(256, 64), nn.ReLU(True),
                                   nn.Dropout(0.3), nn.Linear(64, 1))
    def forward(self, x):
        x = self.pool1(self.enc1(x))
        x = self.pool2(self.enc2(x))
        x = self.pool3(self.enc3(x))
        B, C, F, T = x.shape
        x, _ = self.gru(x.permute(0, 3, 1, 2).reshape(B, T, C * F))
        return self.head(x.mean(1))

# ── 3. Load checkpoint ─────────────────────────────────────────────────────
model = DroneClassifier()
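# weights_only=False unpickles arbitrary Python objects: only load checkpoints you trust.
state = torch.load(ckpt_path, map_location="cpu", weights_only=False)
if "model_state_dict" in state:
    state = state["model_state_dict"]
# strict=False silently skips missing or unexpected keys, so a mismatched
# checkpoint loads without error; inspect the returned key lists if results look wrong.
model.load_state_dict(state, strict=False)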
model.eval()

# ── 4. Prepare a 1-second audio chunk (16 kHz mono) ───────────────────────
waveform, sr = torchaudio.load("drone.wav")
if sr != 16_000:
    waveform = torchaudio.functional.resample(waveform, sr, 16_000)
waveform = waveform.mean(0, keepdim=True)[:, :16_000]   # mono, 1 s

mel_transform = T.MelSpectrogram(
    sample_rate=16_000, n_fft=512, hop_length=160,
    n_mels=64, f_min=50, f_max=5_500,
)
log_mel = (T.AmplitudeToDB()(mel_transform(waveform)) + 40) / 40  # ≈ [-1, 1]
x = log_mel.unsqueeze(0)   # (1, 1, 64, T)

# ── 5. Infer ───────────────────────────────────────────────────────────────
with torch.no_grad():
    prob = torch.sigmoid(model(x)).item()

print(f"Drone probability: {prob:.3f}")
print("DRONE DETECTED" if prob >= 0.5 else "No drone")

For streaming (live microphone → GPIO trigger) and full inference pipelines, see the source repository.
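
As a rough illustration of the streaming cadence, the sketch below frames a sample buffer into 1-second windows with a 500 ms hop. It is not the source repository's streaming code: `sliding_windows` and the constants are hypothetical names, and a real pipeline would feed each window through the mel front-end and the model.

```python
from typing import Iterator, Tuple

SAMPLE_RATE = 16_000
WINDOW = SAMPLE_RATE      # 1 second of audio per inference
HOP = SAMPLE_RATE // 2    # new decision every 500 ms (50% overlap)


def sliding_windows(n_samples: int, window: int = WINDOW, hop: int = HOP) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices of each complete window in a buffer."""
    start = 0
    while start + window <= n_samples:
        yield start, start + window
        start += hop


# Two seconds of audio give three overlapping 1-second windows.
print(list(sliding_windows(2 * SAMPLE_RATE)))
# → [(0, 16000), (8000, 24000), (16000, 32000)]
```

Trailing samples that do not fill a complete window are dropped here; a live pipeline would instead keep them in the buffer until the next hop arrives.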

Architecture

Input: log-mel spectrogram (B, 1, 64, T)   T ≈ 101 frames for 1 second
  │
SharedEncoder
  ├─ ConvBlock(1→32)   MaxPool(2×2)  →  (B, 32,  32, T/2)
  ├─ ConvBlock(32→64)  MaxPool(2×2)  →  (B, 64,  16, T/4)
  ├─ ConvBlock(64→128) MaxPool(2×1)  →  (B, 128,  8, T/4)
  └─ reshape → (B, T/4, 1024)
       BiGRU(1024 → 256, 2 layers, bidirectional, dropout=0.2)
       → (B, T/4, 256)
  │
ClassifierHead
  GlobalAvgPool(time) → (B, 256)
  FC(256→64) → ReLU → Dropout(0.3) → FC(64→1)  →  logit

| Component | Parameters | Share |
|---|---|---|
| SharedEncoder (CNN) | 286 880 | 19.3% |
| SharedEncoder (BiGRU) | 1 182 720 | 79.6% |
| ClassifierHead | 16 513 | 1.1% |
| Total | 1 486 113 | |
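
The parameter counts can be reproduced by hand from the layer sizes in the inline model definition. The sketch below does that arithmetic in plain Python, assuming PyTorch's usual layouts: bias-free 3×3 convolutions, BatchNorm2d with a weight and a bias per channel, and a GRU with three gates and two bias vectors per direction. The helper names are illustrative only.

```python
def conv_block_params(in_ch: int, out_ch: int) -> int:
    """Two bias-free 3x3 convs, each followed by BatchNorm2d (weight + bias)."""
    conv1 = in_ch * out_ch * 3 * 3
    conv2 = out_ch * out_ch * 3 * 3
    batch_norm = 2 * out_ch
    return conv1 + batch_norm + conv2 + batch_norm


def gru_params(input_size: int, hidden: int, num_layers: int, bidirectional: bool = True) -> int:
    """Trainable parameters of a stacked GRU in PyTorch's layout."""
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        layer_in = input_size if layer == 0 else hidden * directions
        # 3 gates: input weights + hidden weights, plus b_ih and b_hh.
        per_direction = 3 * hidden * (layer_in + hidden) + 2 * 3 * hidden
        total += directions * per_direction
    return total


def linear_params(in_features: int, out_features: int) -> int:
    return in_features * out_features + out_features


cnn = conv_block_params(1, 32) + conv_block_params(32, 64) + conv_block_params(64, 128)
gru = gru_params(1024, 128, num_layers=2)
head = linear_params(256, 64) + linear_params(64, 1)
total = cnn + gru + head

for name, count in [("CNN", cnn), ("BiGRU", gru), ("Head", head)]:
    print(f"{name}: {count} ({100 * count / total:.1f}%)")
print(f"Total: {total}")
```

BatchNorm running statistics are buffers rather than trainable parameters, so they are not included in these counts.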

Checkpoint size: ~5.94 MB (FP32). Quantised INT8 ONNX: ~1.49 MB.
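
Both sizes are close to what the raw parameter count predicts at 4 bytes (FP32) and 1 byte (INT8) per parameter. The following back-of-envelope sketch assumes decimal megabytes and ignores file metadata, BatchNorm buffers, and the ONNX graph structure, so real files will differ slightly.

```python
TOTAL_PARAMS = 1_486_113  # from the parameter table


def size_mb(n_params: int, bytes_per_param: int) -> float:
    """Approximate storage in decimal megabytes (1 MB = 1e6 bytes)."""
    return n_params * bytes_per_param / 1e6


print(f"FP32: {size_mb(TOTAL_PARAMS, 4):.2f} MB")
print(f"INT8: {size_mb(TOTAL_PARAMS, 1):.2f} MB")
```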

Audio Front-End

| Parameter | Value |
|---|---|
| Sample rate | 16 kHz |
| FFT window | Hann, 512 samples (32 ms) |
| Hop length | 160 samples (10 ms) |
| Mel bins | 64 |
| Frequency range | 50 – 5 500 Hz |
| Normalisation | (AmplitudeToDB + 40) / 40 |
| Chunk duration | 1 second |
| Detection cadence | 500 ms (50% overlap) |
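
These settings determine the tensor shapes the network sees. The sketch below works out the number of spectrogram frames for a 1-second chunk and the feature-map size after the three pooling layers in the model definition. It assumes the STFT is centred (the torchaudio default, which pads the signal so frames are centred on multiples of the hop) and that pooling floors odd sizes. The function names are hypothetical.

```python
def n_frames(n_samples: int, hop_length: int = 160, n_fft: int = 512, center: bool = True) -> int:
    """Number of STFT frames for a clip of n_samples."""
    if center:
        return 1 + n_samples // hop_length
    return 1 + (n_samples - n_fft) // hop_length


def encoder_output_shape(n_mels: int, frames: int):
    """(freq, time) after pools of (2, 2), (2, 2) and (2, 1); odd sizes are floored."""
    freq, time = n_mels, frames
    for pool_f, pool_t in [(2, 2), (2, 2), (2, 1)]:
        freq //= pool_f
        time //= pool_t
    return freq, time


frames = n_frames(16_000)                      # 1 second at 16 kHz
freq, time = encoder_output_shape(64, frames)
print(frames, freq, time, 128 * freq)
# → 101 8 25 1024
```

The last value, 128 channels × 8 frequency bins, is the GRU input size of 1024, and the GRU runs over 25 time steps per 1-second chunk.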

Training Details

  • Dataset: 180 320 clips, 16 kHz mono, stratified 70 / 15 / 15 split
  • Loss: BCEWithLogitsLoss(pos_weight=0.102), which compensates for the roughly 10:1 drone-heavy class imbalance
  • Optimiser: Adam, lr = 1e-3, weight decay = 1e-4, 30 epochs
  • Schedule: CosineAnnealingLR
  • Augmentation (aug_mixed): Brown noise (1/f²) or real ESC-50 wind clips, random SNR 0 – 20 dB, p = 0.5
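
Two of the hyperparameters above follow simple formulas. A common convention for `pos_weight` is the ratio of negative to positive examples, and cosine annealing has a closed form. The sketch below shows both under stated assumptions: the class counts are hypothetical (chosen only to reproduce 0.102), and `t_max=30` and `eta_min=0.0` are assumed, since the card does not state them.

```python
import math


def pos_weight(n_pos: int, n_neg: int) -> float:
    """Weight on the positive class: negatives / positives."""
    return n_neg / n_pos


def cosine_lr(epoch: int, base_lr: float = 1e-3, t_max: int = 30, eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at epoch t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


# Hypothetical counts with roughly ten drone clips per non-drone clip.
print(f"pos_weight = {pos_weight(n_pos=1000, n_neg=102):.3f}")

for epoch in (0, 15, 30):
    print(f"epoch {epoch}: lr = {cosine_lr(epoch):.6f}")
```

A `pos_weight` below 1 down-weights the majority drone class, so each non-drone clip contributes about ten times more to the loss than each drone clip.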

Limitations

  • Evaluated only on the geronimobasso/drone-audio-detection-samples distribution
  • Not validated against helicopters, fixed-wing aircraft, or other rotary-wing craft
  • Recall drops to ~86 – 89% when wind is 1.8× louder than the drone (SNR ≈ −5 dB)
  • Designed for 16 kHz microphone input; other sample rates require resampling

Citation

@misc{naccache2025droneclassifier,
  author    = {Antoine Naccache},
  title     = {DroneClassifier: Real-Time Drone Audio Detection with CRNN},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/AntoineNaccache/drone-audio-detector}
}
