# DEF-sariad -- SAR Anomaly Detection

Wave 8 Defense Module | ANIMA Framework | Apache 2.0

Two-phase SAR anomaly-detection system with 8 custom CUDA kernels for real-time defense applications. Designed for the NEMESIS (terrain navigation) and ATLAS (fleet autonomy) stacks.
## Architecture

### Phase 1: Pixel-Space Reconstruction

```
Input (256x256 RGB) -> CUDA Median Filter (101x speedup)
  -> Convolutional AE (22.4M params) -> Reconstruction Error -> Anomaly Map
```
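The reconstruction-error stage above reduces to a per-pixel squared error between the input and the autoencoder's output. A minimal sketch (the function name is illustrative, not the package API):

```python
import torch

def pixel_anomaly_map(x: torch.Tensor, recon: torch.Tensor) -> torch.Tensor:
    # Squared reconstruction error, averaged over channels: [B, C, H, W] -> [B, H, W]
    return ((x - recon) ** 2).mean(dim=1)
```

Pixels the AE cannot reconstruct well (i.e., patterns unlike the training data) receive high scores.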
### Phase 2: Feature-Space Detection (recommended for deployment)

```
Input (256x256 RGB) -> DINOv2 ViT-B/14 (frozen, 86.6M params)
  -> 768-dim features -> KNN/Gaussian/MLP detector -> Anomaly Score
```
## Models

### Phase 1: Pixel-Space Autoencoders
| Model | Params | Latent | Val Loss | Throughput | Files |
|---|---|---|---|---|---|
| Combined AE (primary) | 22.4M | 256 | 2.600 | 877 img/s | model.*, combined/ |
| Deep AE | 39.2M | 512 | 2.600 | 902 img/s | combined/ variant |
| VAE | 55.9M | 512 | 2.603 | 878 img/s | combined/ variant |
| SAR-only AE | 22.4M | 256 | 2.808 | 330 img/s | best.pth |
Trained on 62.8K combined SAR ship + VIVID++ thermal images. All architectures plateau at val_loss ~ 2.600 (pixel reconstruction ceiling).
### Phase 2: DINOv2 Feature-Space Detectors
| Detector | Method | CUDA Speedup | Test Anomaly Rate | Files |
|---|---|---|---|---|
| KNN (recommended) | k=5 NN in 10K bank | 81.4x | 4.9% | feature_detector/knn_detector.pth |
| Gaussian | Mahalanobis distance | 2.2x | 38.0% | feature_detector/gaussian_detector.pth |
| MLP head | 768->256->128->1 | -- | learned | feature_detector/mlp_head.* |
Features extracted with DINOv2 ViT-B/14 (frozen). 71,917 features across 24 VIVID++ scenes. KNN detector at 81x CUDA speedup is recommended for real-time deployment.
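The Gaussian detector in the table scores a feature vector by its Mahalanobis distance to the training distribution. A minimal sketch of that computation, assuming a fitted mean and inverse covariance (variable names are illustrative):

```python
import torch

def mahalanobis_scores(query: torch.Tensor, mean: torch.Tensor,
                       cov_inv: torch.Tensor) -> torch.Tensor:
    # query: [N, D] features; mean: [D]; cov_inv: [D, D] inverse covariance
    diff = query - mean
    # Row-wise sqrt(diff @ cov_inv @ diff^T)
    return torch.einsum("nd,de,ne->n", diff, cov_inv, diff).sqrt()
```

With an identity covariance this reduces to Euclidean distance; the fitted covariance whitens correlated feature dimensions before measuring distance.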
## CUDA Kernels (8 ops, sm_89)

Custom CUDA kernels compiled for NVIDIA L4 (compute capability 8.9), CUDA 12, torch cu128:
| Kernel | Speedup | Use Case |
|---|---|---|
| fused_median_filter_3x3 | 101x | SAR speckle denoising |
| fused_median_filter_5x5 | 14x | Heavy speckle denoising |
| sar_log_normalize | 4.5x | SAR amplitude preprocessing |
| fused_reconstruct_error_map | 1.9x | Pixel anomaly scoring (4D) |
| fused_sar_nlm_denoise | -- | Non-local means denoising |
| anomaly_score | -- | Per-pixel reconstruction error (3D) |
| fused_mahalanobis_distance | 2.2x | Gaussian feature-space scoring |
| fused_knn_distance | 81.4x | KNN feature-space scoring |
All kernels have automatic PyTorch fallbacks when custom CUDA extensions aren't available.
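As a sketch of what such a fallback can look like, here is a pure-PyTorch 3x3 median filter built from `unfold` (an illustrative stand-in, not the package's actual fallback code):

```python
import torch
import torch.nn.functional as F

def median_filter_3x3_torch(x: torch.Tensor) -> torch.Tensor:
    # x: [B, C, H, W]; reflect-pad, gather 3x3 windows, take their median
    xp = F.pad(x, (1, 1, 1, 1), mode="reflect")
    patches = xp.unfold(2, 3, 1).unfold(3, 3, 1)   # [B, C, H, W, 3, 3]
    return patches.reshape(*x.shape, 9).median(dim=-1).values
```

The custom kernel fuses padding, window gathering, and median selection into one pass, which is where the 101x speedup over an eager-mode composition like this comes from.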
## Export Formats

### Phase 1 (Combined AE)

| Format | File | Size |
|---|---|---|
| PyTorch | model.pth | 86MB |
| SafeTensors | model.safetensors | 86MB |
| ONNX | model.onnx | 86MB |
| TensorRT FP16 | model_fp16.engine | 44MB |
| TensorRT FP32 | model_fp32.engine | 86MB |
### Phase 2 (MLP Head)

| Format | File | Size |
|---|---|---|
| PyTorch | feature_detector/mlp_head.pth | 0.9MB |
| SafeTensors | feature_detector/mlp_head.safetensors | 0.9MB |
| ONNX | feature_detector/mlp_head.onnx | 921KB |
| TensorRT FP16 | feature_detector/mlp_head_fp16.engine | 965KB |
| TensorRT FP32 | feature_detector/mlp_head_fp32.engine | 965KB |
## Usage

### Phase 1: Pixel-Space Anomaly Detection

```python
import torch

from def_sariad.models import SARAutoencoder
from def_sariad.backends.cuda_ops import cuda_median_filter

# Load model
model = SARAutoencoder(in_channels=3, latent_dim=256).cuda()
state = torch.load("model.pth", map_location="cuda")
model.load_state_dict(state["model"] if "model" in state else state)
model.eval()

# Preprocess with CUDA median filter (101x speedup)
sar_image = torch.randn(1, 3, 256, 256).cuda()
denoised = cuda_median_filter(sar_image, kernel_size=3)

# Detect anomalies
with torch.no_grad():
    anomaly_map = model.compute_anomaly_score(denoised)
# anomaly_map shape: [1, 256, 256] -- higher values = more anomalous
```
### Phase 2: Feature-Space Anomaly Detection (Recommended)

```python
import torch

# Load pre-extracted DINOv2 features (768-dim)
features = torch.load("vivid_dinov2_features/scene_features.pt")

# KNN detector (81x CUDA speedup)
knn_state = torch.load("feature_detector/knn_detector.pth")
feature_bank = knn_state["feature_bank"].cuda()  # [10000, 768]
threshold = knn_state["threshold"]  # 23.94

# Score new features: mean distance to the 5 nearest bank features
query = features[:100].cuda()  # [100, 768]
dists = torch.cdist(query, feature_bank)  # [100, 10000]
knn_dists, _ = dists.topk(5, largest=False, dim=1)  # k=5
scores = knn_dists.mean(dim=1)  # [100]
anomalies = scores > threshold
```
### ONNX Inference

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
input_data = np.random.randn(1, 3, 256, 256).astype(np.float32)
outputs = session.run(None, {"input": input_data})
```
### TensorRT Inference

Use model_fp16.engine for the fastest inference. Requires the tensorrt Python package.

```python
import tensorrt as trt

# Deserialize the engine and create an execution context
logger = trt.Logger(trt.Logger.WARNING)
with open("model_fp16.engine", "rb") as f:
    engine = trt.Runtime(logger).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()
# Bind device buffers, then run inference via the TRT runtime
```
## Training Details
| Parameter | Phase 1 | Phase 2 |
|---|---|---|
| Dataset | 62.8K SAR+thermal | 71.9K DINOv2 features |
| Input | 256x256 RGB | 768-dim vectors |
| Optimizer | AdamW | -- (KNN/Gaussian are non-parametric) |
| Learning rate | 3e-4 | -- |
| Scheduler | Warmup (5%) + Cosine | -- |
| Epochs | 50-100 | 1 (feature extraction) |
| GPU | NVIDIA L4 (23GB) | NVIDIA L4 (23GB) |
| Precision | FP32 | FP32 |
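The warmup + cosine schedule in the table can be sketched as a per-step learning-rate function (a hand-rolled illustration; the actual training code may use a library scheduler):

```python
import math

def lr_at(step: int, total_steps: int, base_lr: float = 3e-4,
          warmup_frac: float = 0.05) -> float:
    # Linear warmup over the first 5% of steps, then cosine decay to zero
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * t))
```

The warmup avoids large early updates while batch statistics settle; the cosine tail anneals smoothly to zero by the final epoch.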
## File Structure

```
DEF-sariad/
+-- model.pth                      # Phase 1: Combined AE weights
+-- model.safetensors              # Phase 1: SafeTensors format
+-- model.onnx                     # Phase 1: ONNX export
+-- model_fp16.engine              # Phase 1: TensorRT FP16
+-- model_fp32.engine              # Phase 1: TensorRT FP32
+-- best.pth                       # Phase 1: SAR-only AE weights
+-- combined/                      # Phase 1: Combined AE full export set
+-- feature_detector/
|   +-- gaussian_detector.pth      # Phase 2: Mahalanobis detector
|   +-- knn_detector.pth           # Phase 2: KNN detector (recommended)
|   +-- mlp_head.pth               # Phase 2: Learned MLP head
|   +-- mlp_head.safetensors       # Phase 2: MLP SafeTensors
|   +-- mlp_head.onnx              # Phase 2: MLP ONNX
|   +-- mlp_head_fp16.engine       # Phase 2: MLP TRT FP16
|   +-- mlp_head_fp32.engine       # Phase 2: MLP TRT FP32
|   +-- training_report.json       # Phase 2: Detector metrics
+-- config/
|   +-- anima_module.yaml          # ANIMA module manifest
|   +-- autoencoder.toml           # AE training config
|   +-- paper.toml                 # Paper reproduction config
+-- export_manifest.json           # Export metadata
+-- training_report.json           # Phase 1 training metrics
+-- TRAINING_REPORT.md             # Full training report
+-- README.md                      # This model card
```
## Citation

```bibtex
@article{sariad2025,
  title={SARIAD: A Comprehensive Benchmark for SAR Image Anomaly Detection},
  year={2025},
  note={arXiv:2504.08115}
}
```
## License

Apache 2.0. Part of the ANIMA Framework by Robot Flow Labs.
## Evaluation Results

All metrics self-reported.

| Metric | Value |
|---|---|
| Val Loss (MSE) | 2.600 |
| Throughput | 877 img/s |
| CUDA Speedup | 81.4x |
| Test Anomaly Rate | 4.9% |