---
license: mit
language:
- en
base_model:
- qualcomm/RF-DETR
pipeline_tag: object-detection
tags:
- surveillance
- Threat_detection
---

# RF-DETR based Threat Detection Model

<a href="https://opensource.org/licenses/MIT">
    <img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License">
</a>
<a href="https://github.com/roboflow/rf-detr">
    <img src="https://img.shields.io/badge/RF--DETR-Nano-purple?logo=roboflow&logoColor=white" alt="Model">
</a>
<a href="#performance-metrics">
    <img src="https://img.shields.io/badge/mAP%4050-84.8%25-darkgreen?style=flat" alt="mAP">
</a>
<a href="https://github.com/subh-775/Threat_Detection_YOLO-vs-RF-DETR">
    <img src="https://img.shields.io/badge/-code-black?logo=github" alt="Code">
</a>


## Transformers for Object Detection

CNN-based detectors have long dominated real-time object detection because of their fast inference. **RF-DETR** (Roboflow's Detection Transformer) is a transformer-based architecture that Roboflow reports to **outperform comparable CNN detectors** in accuracy while keeping **inference fast enough** for real-time applications.

This repository contains a **fine-tuned RF-DETR Nano model** specifically trained for **threat detection**, capable of identifying four critical threat categories with high precision and speed.

## Predicted Results
![predictions](https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/MDRT7LUt1RQE60CGW8to4.jpeg)

### Video Inference
<video muted autoplay loop controls src="https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/5Kt3KghZaanzOVaVB6JS9.mp4" width=800></video>


## Model Overview

**RF-DETR Threat Detection** is a specialized computer vision model designed for security and surveillance applications. Built on Roboflow's cutting-edge RF-DETR architecture, this model can accurately detect and classify potential threats in real-time scenarios.

The model detects the following four threat categories:

| Class ID | Threat Type | Description |
|----------|-------------|-------------|
| 1 | **Gun** | Any type of firearm weapon including pistols, rifles, and other firearms |
| 2 | **Explosive** | Fire, explosion scenarios, and explosive devices |
| 3 | **Grenade** | Hand grenades and similar explosive devices |
| 4 | **Knife** | Bladed weapons including knives, daggers, and sharp objects |

## Training Dataset 

Our custom threat detection dataset was meticulously curated and annotated to ensure robust model performance across diverse scenarios.

### Class Distribution
![class_distribution](https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/5t7k-SJfuZWXJTek_RPWh.png)

### Sample Annotations (Actual)
![sample_images_annotated](https://cdn-uploads.huggingface.co/production/uploads/66c6048d0bf40704e4159a23/Mf65kxTEwfq9HPMlzwO5y.png)

The model is trained to detect threats across various scales, from small concealed weapons to larger explosive devices.

## Performance Metrics

### Training Performance
![Training Metrics](metrics_plot.png)

The training process demonstrates excellent convergence with:
- **Consistent loss reduction** over 50 epochs
- **Stable validation performance** indicating good generalization
- **Balanced precision and recall** across all threat categories

### Validation Results

| Metric | Gun | Explosive | Grenade | Knife | **Overall** |
|--------|-----|-----------|---------|-------|-------------|
| **mAP@50:95** | 62.3% | 47.2% | 80.5% | 54.4% | **61.1%** |
| **mAP@50** | 90.1% | 69.6% | 93.7% | 85.8% | **84.8%** |
| **Precision** | 92.4% | 54.6% | 97.2% | 91.1% | **83.8%** |
| **Recall** | 85.0% | 85.0% | 85.0% | 85.0% | **85.0%** |

### Test Results

| Metric | Gun | Explosive | Grenade | Knife | **Overall** |
|--------|-----|-----------|---------|-------|-------------|
| **mAP@50:95** | 65.3% | 35.7% | 83.2% | 49.8% | **58.5%** |
| **mAP@50** | 93.1% | 60.5% | 91.1% | 79.7% | **81.1%** |
| **Precision** | 96.7% | 49.7% | 93.1% | 86.5% | **81.5%** |
| **Recall** | 83.0% | 83.0% | 83.0% | 83.0% | **83.0%** |
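
The **Overall** column in both tables appears to be the unweighted (macro) mean of the four per-class values. The sketch below checks that reading using only numbers copied from the tables; the helper names (`macro_average`, `check_split`) are illustrative and not part of any library. The Recall rows are left out because every per-class entry already equals the overall value.

```python
# Per-class values copied from the validation and test tables (percent).
# Order of each list: Gun, Explosive, Grenade, Knife. Second item: reported Overall.
VALIDATION = {
    "mAP@50:95": ([62.3, 47.2, 80.5, 54.4], 61.1),
    "mAP@50": ([90.1, 69.6, 93.7, 85.8], 84.8),
    "Precision": ([92.4, 54.6, 97.2, 91.1], 83.8),
}
TEST = {
    "mAP@50:95": ([65.3, 35.7, 83.2, 49.8], 58.5),
    "mAP@50": ([93.1, 60.5, 91.1, 79.7], 81.1),
    "Precision": ([96.7, 49.7, 93.1, 86.5], 81.5),
}


def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def check_split(name, table, tolerance=0.05):
    """Print each metric's macro mean next to the reported Overall value.

    Returns True when every mean is within `tolerance` (the rounding error of a
    value reported to one decimal place) of the reported number.
    """
    all_match = True
    for metric, (per_class, reported) in table.items():
        mean = macro_average(per_class)
        matches = abs(mean - reported) <= tolerance
        all_match = all_match and matches
        status = "ok" if matches else "MISMATCH"
        print(f"{name:<10} {metric:<10} mean={mean:.3f} reported={reported:.1f} {status}")
    return all_match


check_split("validation", VALIDATION)
check_split("test", TEST)
```

Because the averages are unweighted, the weaker Explosive class pulls the overall figures down as much as any other class, regardless of how many Explosive instances the splits contain.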

### Key Performance Highlights

- **84.8% mAP@50** on the validation set and **81.1% mAP@50** on the test set
- **Fast inference** with the RF-DETR Nano architecture
- **High test-set precision** for Gun (96.7%) and Grenade (93.1%) detection
- **Consistent recall** of 83% (test) to 85% (validation) across all threat categories
- **Generalization**: overall mAP@50 drops by 3.7 points from validation to test, with the largest gap on Explosive (69.6% to 60.5%)

## Model Architecture

- **Base Architecture**: RF-DETR Nano
- **Input Resolution**: 640×640 pixels
- **Backbone**: Optimized transformer encoder
- **Detection Head**: Custom 4-class threat detection
- **Inference Speed**: ~50ms per image (GPU)
- **Model Size**: Lightweight for edge deployment
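
The ~50 ms figure above depends on hardware, so it is worth re-measuring on the target device. The following is a minimal timing sketch, not the procedure used for this card: `measure_latency_ms` and `latency_to_fps` are hypothetical helpers, and `fake_predict` is a placeholder for a real call such as `model.predict(image, threshold=0.5)`.

```python
import statistics
import time


def measure_latency_ms(predict_fn, sample, warmup=5, runs=30):
    """Return the median wall-clock latency of predict_fn(sample) in milliseconds."""
    for _ in range(warmup):
        predict_fn(sample)  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def latency_to_fps(latency_ms):
    """Convert a per-image latency in milliseconds to images per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def fake_predict(image):
    # Placeholder workload; swap in the real model call when benchmarking.
    return sum(range(10_000))


median_ms = measure_latency_ms(fake_predict, sample=None)
print(f"median latency of placeholder: {median_ms:.3f} ms")
print(f"50 ms per image corresponds to {latency_to_fps(50.0):.0f} images/s")
```

The median is used rather than the mean so that a few slow outlier runs do not skew the result. If the timed callable returns GPU tensors without copying them to the CPU, the device should be synchronized before the timer stops; otherwise the measurement may only cover kernel launch time.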

## Training Details

### Training Configuration
- **Epochs**: 50
- **Batch Size**: Optimized for available GPU memory
- **Optimizer**: AdamW with learning rate scheduling
- **Data Augmentation**: Advanced augmentation pipeline for robust training
- **Loss Function**: Multi-scale detection loss with class balancing
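
The card does not say how the class balancing in the loss was implemented. One common scheme is inverse-frequency weighting, sketched below as an assumption rather than as the method used here. The annotation counts are hypothetical and are not the real dataset statistics.

```python
def inverse_frequency_weights(class_counts):
    """Compute weight_c = total / (num_classes * count_c) for each class.

    A perfectly balanced dataset yields a weight of 1.0 for every class;
    rarer classes receive weights above 1.0.
    """
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {name: total / (num_classes * count) for name, count in class_counts.items()}


# Hypothetical annotation counts, chosen only to illustrate the arithmetic.
example_counts = {"Gun": 4000, "Explosive": 1000, "Grenade": 2000, "Knife": 3000}
weights = inverse_frequency_weights(example_counts)

for name, weight in weights.items():
    print(f"{name:<10} count={example_counts[name]:>5} weight={weight:.3f}")
```

With these made-up counts, the rarest class (Explosive) gets a weight of 2.5 and the most frequent class (Gun) gets 0.625, so errors on rare classes contribute more to the loss.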

### Training Strategy
1. **Progressive Training**: Started with lower resolution, gradually increased
2. **Class Balancing**: Weighted loss to handle class imbalance
3. **Data Augmentation**: Extensive augmentation to improve generalization
4. **Early Stopping**: Monitored validation mAP to prevent overfitting
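
The early-stopping step in the list above can be sketched as a small patience counter on validation mAP. The `EarlyStopping` class, its `patience` and `min_delta` defaults, and the mAP curve below are all illustrative assumptions; the values actually used for this model are not stated in the card.

```python
class EarlyStopping:
    """Stop training when the monitored score has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.001):
        self.patience = patience
        self.min_delta = min_delta
        self.best_score = float("-inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch, score):
        """Record this epoch's validation score; return True when training should stop."""
        if score > self.best_score + self.min_delta:
            # Counts as an improvement: remember it and reset the counter.
            self.best_score = score
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Made-up validation mAP values, one per epoch.
validation_map = [0.50, 0.60, 0.65, 0.64, 0.65, 0.649]
stopper = EarlyStopping(patience=3, min_delta=0.001)
stopped_at = None
for epoch, score in enumerate(validation_map):
    if stopper.step(epoch, score):
        stopped_at = epoch
        break

print(f"best epoch: {stopper.best_epoch}, stopped at epoch: {stopped_at}")
```

In this toy run the best score is reached at epoch 2, and training stops at epoch 5 after three epochs without an improvement larger than `min_delta`. The checkpoint from the best epoch is the one that would be kept.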

## Model Files

- `checkpoint_best_total.pth` - Main model weights 

### Inference Instructions

```bash
pip install -q rfdetr==1.2.1 supervision==0.26.1
```
- To process long videos, use [video_processing.py](https://huggingface.co/Subh775/Threat-Detection-RF-DETR/blob/main/video_processing.py).

- The script below runs inference on a single image:

```python
import os

import requests
import supervision as sv
from PIL import Image
from rfdetr import RFDETRNano

THREAT_CLASSES = {
    1: "Gun",
    2: "Explosive", 
    3: "Grenade",
    4: "Knife"
}

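# Replace "Path_to_image" with the path to your input image
image = Image.open("Path_to_image").convert("RGB")

# Fine-tuned threat detection weights hosted on the Hugging Face Hub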
weights_url = "https://huggingface.co/Subh775/Threat-Detection-RF-DETR/resolve/main/checkpoint_best_total.pth"
weights_filename = "checkpoint_best_total.pth"

# Download weights if not already present
if not os.path.exists(weights_filename):
    print(f"Downloading weights from {weights_url}")
    response = requests.get(weights_url, stream=True)
    response.raise_for_status()
    with open(weights_filename, 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print("Download complete.")

model = RFDETRNano(resolution=640, pretrain_weights=weights_filename)
model.optimize_for_inference()

detections = model.predict(image, threshold=0.5)

color = sv.ColorPalette.from_hex([
    "#1E90FF", "#32CD32", "#FF0000", "#FF8C00"
])

text_scale = sv.calculate_optimal_text_scale(resolution_wh=image.size)
thickness = sv.calculate_optimal_line_thickness(resolution_wh=image.size)

bbox_annotator = sv.BoxAnnotator(color=color, thickness=thickness)
label_annotator = sv.LabelAnnotator(
    color=color,
    text_color=sv.Color.BLACK,
    text_scale=text_scale,
    smart_position=True
)

labels = []
for class_id, confidence in zip(detections.class_id, detections.confidence):
    class_name = THREAT_CLASSES.get(class_id, f"unknown_class_{class_id}")
    labels.append(f"{class_name} {confidence:.2f}")

annotated_image = image.copy()
annotated_image = bbox_annotator.annotate(annotated_image, detections)
annotated_image = label_annotator.annotate(annotated_image, detections, labels)
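# Downscale in place so the longest side is at most 800 px
annotated_image.thumbnail((800, 800))
# In a notebook, this last expression renders the image inline
annotated_image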
```
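
The script above applies a single confidence threshold of 0.5 to every class. Since Explosive has the lowest reported precision, one option is a stricter threshold for that class only. The sketch below is an illustration in plain Python: `keep_indices`, the 0.7 value, and the example detections are hypothetical and were not tuned on this dataset.

```python
def keep_indices(class_ids, confidences, per_class_threshold, default_threshold=0.5):
    """Return the indices of detections that pass their class-specific threshold."""
    kept = []
    for index, (class_id, confidence) in enumerate(zip(class_ids, confidences)):
        threshold = per_class_threshold.get(int(class_id), default_threshold)
        if confidence >= threshold:
            kept.append(index)
    return kept


# Hypothetical setting: require 0.7 for Explosive (class 2), 0.5 for the rest.
PER_CLASS_THRESHOLD = {2: 0.7}

# Made-up detections: class IDs and confidences in matching order.
example_class_ids = [1, 2, 2, 4]
example_confidences = [0.55, 0.65, 0.80, 0.45]

kept = keep_indices(example_class_ids, example_confidences, PER_CLASS_THRESHOLD)
print(kept)  # [0, 2]
```

To use this with the script above, the model would need to be run with a threshold no higher than the lowest per-class value, passing `detections.class_id` and `detections.confidence` to the helper. The returned indices should then be usable to subset the detections (for example `detections[kept]`) before annotation.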

## Acknowledgments

- **Roboflow** for the RF-DETR architecture
- **Hugging Face** for model hosting and distribution
- **PyTorch** for the deep learning framework
- **Supervision** library for computer vision utilities

**Disclaimer**: This model is intended for research purposes only. Its predictions should not be relied on in production deployments at this stage.