language: en
tags:
  - dga
  - cybersecurity
  - domain-generation-algorithm
  - text-classification
  - pytorch
license: mit
metrics:
  - accuracy
  - f1

DGA-CNN: Character-level CNN for DGA Detection

A character-level convolutional neural network (CNN) trained to detect domains produced by domain generation algorithms (DGAs). It is part of the DGA Multi-Family Benchmark (Reynier et al., 2026).

Model Description

  • Architecture: Single Conv1d layer (64 filters, kernel=3) + MaxPool + FC
  • Input: Character-level encoding of domain name (max 75 chars)
  • Output: Binary classification — legit (0) or dga (1)
  • Framework: PyTorch
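
The pipeline described above can be sketched as follows. This is a minimal illustration, not the shipped model.py: the character vocabulary, the embedding size, the use of global max pooling, and the names `encode` and `CharCNNSketch` are all assumptions made for the example. Only the 75-character limit, the 64 filters, the kernel size of 3, and the two output classes come from the description.

```python
import string

import torch
import torch.nn as nn

# Assumed vocabulary (not taken from model.py): index 0 is reserved for
# padding and for any character outside this set.
CHARS = string.ascii_lowercase + string.digits + "-._"
CHAR_TO_IDX = {ch: i + 1 for i, ch in enumerate(CHARS)}
MAX_LEN = 75  # maximum input length stated in the model description


def encode(domain: str, max_len: int = MAX_LEN) -> torch.Tensor:
    """Map a domain to a fixed-length tensor of character indices."""
    ids = [CHAR_TO_IDX.get(ch, 0) for ch in domain.lower()[:max_len]]
    ids += [0] * (max_len - len(ids))  # right-pad with zeros
    return torch.tensor(ids, dtype=torch.long)


class CharCNNSketch(nn.Module):
    """Embedding -> Conv1d(64 filters, kernel=3) -> global max pool -> FC."""

    def __init__(self, vocab_size: int = len(CHARS) + 1, embed_dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, 64, kernel_size=3)
        self.pool = nn.AdaptiveMaxPool1d(1)  # assumed: max over all positions
        self.fc = nn.Linear(64, 2)           # two logits: legit (0), dga (1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.embed(x)              # (batch, 75, embed_dim)
        x = x.transpose(1, 2)          # (batch, embed_dim, 75)
        x = torch.relu(self.conv(x))   # (batch, 64, 73)
        x = self.pool(x).squeeze(-1)   # (batch, 64)
        return self.fc(x)              # (batch, 2)


batch = torch.stack([encode("google.com"), encode("xkr3f9mq.ru")])
logits = CharCNNSketch()(batch)
print(batch.shape, logits.shape)  # torch.Size([2, 75]) torch.Size([2, 2])
```

The sketch is untrained, so its logits are meaningless. It only shows how tensor shapes flow through a single-convolution character CNN.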

Performance (54 DGA families, 30 runs each)

| Metric | Value |
|---|---|
| Accuracy | 0.9200 |
| F1 | 0.9000 |
| Precision | 0.9400 |
| Recall | 0.8900 |
| FPR | 0.0400 |
| Query time | 0.490 ms/domain (CPU) |
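
For reference, the classification metrics reported above have the standard definitions sketched below in terms of confusion-matrix counts, with dga as the positive class. The helper name `binary_metrics` and the counts are hypothetical and are not benchmark data. If the reported values are averages over families and runs (an assumption), the averaged F1 need not equal the harmonic mean of the averaged precision and recall.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, which holds whenever both
    classes are present and at least one positive is predicted.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),  # share of legit domains flagged as dga
    }


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```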

Usage

from huggingface_hub import hf_hub_download
import importlib.util, torch

# Download model files
weights = hf_hub_download("Reynier/dga-cnn", "dga_cnn_model_1M.pth")
model_py = hf_hub_download("Reynier/dga-cnn", "model.py")

# Load module
spec = importlib.util.spec_from_file_location("cnn_model", model_py)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# Load model
model = mod.load_model(weights)

# Predict
results = mod.predict(model, ["google.com", "xkr3f9mq.ru"])
print(results)
# [{"domain": "google.com", "label": "legit", "score": 0.02},
#  {"domain": "xkr3f9mq.ru", "label": "dga", "score": 0.98}]
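
Because the output is a list of dictionaries, it can be post-processed with plain Python. The sketch below filters domains by score. It assumes that `score` is the predicted probability of the dga class, which the example output suggests but does not confirm. The helper `flag_dga` and the threshold of 0.9 are hypothetical.

```python
def flag_dga(results, threshold=0.5):
    """Return the domains whose score is at or above the threshold.

    Assumes each item has "domain" and "score" keys, and that "score"
    is the probability of the dga class.
    """
    return [r["domain"] for r in results if r["score"] >= threshold]


# Same structure as the example output shown above.
results = [
    {"domain": "google.com", "label": "legit", "score": 0.02},
    {"domain": "xkr3f9mq.ru", "label": "dga", "score": 0.98},
]
print(flag_dga(results, threshold=0.9))  # ['xkr3f9mq.ru']
```

A higher threshold trades recall for a lower false positive rate, which matters when flagged domains are blocked automatically.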

Training Data

Trained on train_1M.csv, which contains about 845K samples covering 54 DGA families plus legitimate domains.

Citation

@article{reynier2026dga,
  title={DGA Multi-Family Benchmark: Comparing Classical and Transformer-based Detectors},
  author={Reynier and others},
  year={2026}
}