Reynier committed a24a7ff (verified; parent 69c888f): "Upload README.md with huggingface_hub". Adds `README.md` (+75 lines).
---
language: en
tags:
- dga
- cybersecurity
- domain-generation-algorithm
- text-classification
- pytorch
license: mit
metrics:
- accuracy
- f1
---

# DGA-CNN: Character-level CNN for DGA Detection

A character-level convolutional neural network (CNN) trained to detect domains produced by domain generation algorithms (DGAs).
This model is part of the **DGA Multi-Family Benchmark** (Reynier et al., 2026).

## Model Description

- **Architecture:** a single `Conv1d` layer (64 filters, kernel size 3), followed by max pooling and a fully connected (FC) output layer
- **Input:** character-level encoding of the domain name (at most 75 characters)
- **Output:** binary classification: `legit` (0) or `dga` (1)
- **Framework:** PyTorch

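The architecture described above can be sketched in PyTorch roughly as follows. This is a minimal illustration under stated assumptions, not the contents of the repository's `model.py`: the character vocabulary, the embedding layer and its size, zero-padding, the ReLU activation, global max pooling over the sequence, and the single-logit sigmoid output are all hypothetical choices made to match the bullet points.

```python
import string

import torch
import torch.nn as nn

MAX_LEN = 75  # maximum domain length stated in the model card

# Hypothetical vocabulary: index 0 is reserved for padding/unknown characters.
VOCAB = string.ascii_lowercase + string.digits + "-._"
CHAR_TO_IDX = {c: i + 1 for i, c in enumerate(VOCAB)}


def encode_domain(domain: str, max_len: int = MAX_LEN) -> torch.Tensor:
    """Map a domain to a fixed-length tensor of character indices (hypothetical scheme)."""
    idx = [CHAR_TO_IDX.get(c, 0) for c in domain.lower()[:max_len]]
    idx += [0] * (max_len - len(idx))  # right-pad with zeros
    return torch.tensor(idx, dtype=torch.long)


class CharCNNSketch(nn.Module):
    """Embedding -> Conv1d(64 filters, kernel 3) -> max pooling -> FC (illustrative only)."""

    def __init__(self, vocab_size: int = len(VOCAB) + 1, embed_dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, 64, kernel_size=3)
        self.pool = nn.AdaptiveMaxPool1d(1)  # global max over sequence positions
        self.fc = nn.Linear(64, 1)           # one logit; sigmoid gives P(dga)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.embed(x).transpose(1, 2)    # (batch, embed_dim, seq_len)
        h = torch.relu(self.conv(h))         # (batch, 64, seq_len - 2)
        h = self.pool(h).squeeze(-1)         # (batch, 64)
        return self.fc(h).squeeze(-1)        # (batch,) raw logits


model = CharCNNSketch()
batch = torch.stack([encode_domain("google.com"), encode_domain("xkr3f9mq.ru")])
probs = torch.sigmoid(model(batch))  # untrained weights, so values are arbitrary
print(batch.shape, probs.shape)
```

Because the sketch is untrained, its probabilities are meaningless; it only shows how a padded 75-character input flows through one convolution, a pooling step, and a linear layer.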
## Performance (54 DGA families, 30 runs each)

| Metric                    | Value                 |
|---------------------------|-----------------------|
| Accuracy                  | 0.9200                |
| F1                        | 0.9000                |
| Precision                 | 0.9400                |
| Recall                    | 0.8900                |
| FPR (false positive rate) | 0.0400                |
| Query time                | 0.490 ms/domain (CPU) |

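The metrics in the table follow the standard binary-classification definitions, with `dga` as the positive class. The helper below computes them from confusion-matrix counts; `binary_metrics` is a hypothetical name and the counts in the usage line are made up for illustration. It also shows that F1 computed directly from the tabulated precision and recall is about 0.914 rather than 0.90. If the table reports averages over families and runs (the card does not say), a gap of this kind is expected, since the mean of per-run F1 scores generally differs from the F1 of mean precision and mean recall.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary metrics with `dga` as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # true positive rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),                   # legit domains wrongly flagged as DGA
    }


# Illustrative counts only (not from the benchmark)
print(binary_metrics(tp=89, fp=4, tn=96, fn=11))

# F1 implied by the tabulated precision and recall
p, r = 0.94, 0.89
print(round(2 * p * r / (p + r), 3))  # 0.914
```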
## Usage

```python
import importlib.util

import torch
from huggingface_hub import hf_hub_download

# Download the model weights and the model definition from the Hub
weights = hf_hub_download("Reynier/dga-cnn", "dga_cnn_model_1M.pth")
model_py = hf_hub_download("Reynier/dga-cnn", "model.py")

# Import model.py as a module (this executes the downloaded file)
spec = importlib.util.spec_from_file_location("cnn_model", model_py)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# Load the trained model
model = mod.load_model(weights)

# Classify a batch of domains
results = mod.predict(model, ["google.com", "xkr3f9mq.ru"])
print(results)
# [{"domain": "google.com", "label": "legit", "score": 0.02},
#  {"domain": "xkr3f9mq.ru", "label": "dga", "score": 0.98}]
```

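If you need a different operating point than the default `label`, the `score` field can be thresholded directly. The sketch below assumes `predict` returns the list-of-dicts format shown in the example output of the usage snippet and that `score` is the model's estimated probability of the `dga` class; `flag_dga` and the 0.9 threshold are hypothetical choices, not part of `model.py`.

```python
def flag_dga(results: list[dict], threshold: float = 0.9) -> list[str]:
    """Return domains whose DGA score meets the threshold (hypothetical helper)."""
    return [r["domain"] for r in results if r["score"] >= threshold]


# Same format as the example output of the usage snippet
results = [
    {"domain": "google.com", "label": "legit", "score": 0.02},
    {"domain": "xkr3f9mq.ru", "label": "dga", "score": 0.98},
]
print(flag_dga(results))  # ['xkr3f9mq.ru']
```

Raising the threshold generally reduces false positives at the cost of recall; lowering it does the opposite.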
## Training Data

Trained on `train_1M.csv`, which contains approximately 845K samples covering 54 DGA families plus legitimate domains.

## Citation

```bibtex
@article{reynier2026dga,
  title={DGA Multi-Family Benchmark: Comparing Classical and Transformer-based Detectors},
  author={Reynier and others},
  year={2026}
}
```