DRIFT: Fine-tuned DGA Detector
Drift-Resilient Invariant-Feature Transformer for DGA Detection
- Authors: Chaeyoung Lee*, Chaeri Jung*, Seonghoon Jeong (* Equal contribution)
- Affiliation: Division of Artificial Intelligence Engineering, Sookmyung Women's University
- Venue: IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2026); accepted, IEEE Xplore link coming soon
- arXiv: arxiv.org/abs/2605.10436
- GitHub: snsec-net/2026-DSN-DRIFT
- Hugging Face: Paper · Dataset · Model
Model
finetuning.pt is the fine-tuned DRIFT checkpoint for binary DGA detection: given a domain name, it predicts benign (0) vs. DGA-generated (1). DRIFT is designed for temporal robustness: it learns invariant structural features of domain names so that detection accuracy degrades far more slowly as new DGA variants emerge over time (concept drift).
Architecture: two backbones
DRIFT uses a dual-branch Transformer with a hybrid tokenization strategy. The two backbones process the same domain (effective second-level domain) in parallel:
| Backbone | Captures | Tokenizer | Seq. len | Vocab |
|---|---|---|---|---|
| Character | stochastic morphological / lexical patterns | char-level (a–z, 0–9, `-`, `.`) | L = 77 | 43 |
| Subword | word-based DGAs / morpheme semantics | WordPiece | L = 30 | 30,522 |
Both backbones share the same encoder configuration: embedding dim D = 256, 12 encoder layers, 8 attention heads, feed-forward dim 768, learnable positional embeddings.
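To make the character-level row of the table concrete, here is a minimal sketch of a character tokenizer. It assumes that the 43-entry vocabulary consists of the 38 allowed symbols plus five special tokens; that split, the token names, their ids, and the `encode_chars` helper are all hypothetical, and the reference implementation may differ.

```python
import string

# Hypothetical special tokens; the actual names and ids live in the reference code.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
ALPHABET = string.ascii_lowercase + string.digits + "-."  # 26 + 10 + 2 = 38 symbols
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + list(ALPHABET))}
MAX_LEN = 77  # character-branch sequence length from the table


def encode_chars(domain: str, max_len: int = MAX_LEN) -> list[int]:
    """Map a preprocessed domain to a fixed-length list of character ids."""
    body = [VOCAB.get(ch, VOCAB["[UNK]"]) for ch in domain]
    # Reserve two positions for [CLS] and [SEP]; truncate longer inputs.
    body = body[: max_len - 2]
    ids = [VOCAB["[CLS]"]] + body + [VOCAB["[SEP]"]]
    # Right-pad with [PAD] up to the fixed sequence length.
    return ids + [VOCAB["[PAD]"]] * (max_len - len(ids))


ids = encode_chars("example")
```

Under these assumptions the vocabulary has exactly 43 entries and every encoded domain has length 77, matching the table.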
Fine-tuning head: each branch produces [MaxPool ; MeanPool] over its last-layer hidden states; the two branch vectors are concatenated into a fused vector (i.e. a vector of dimension 4D = 1024, since each pooled vector has dimension D = 256) and fed to a two-layer MLP (hidden layer with ReLU and dropout → 2 logits). Training uses binary cross-entropy loss with the Adam optimizer and a two-stage transfer-learning schedule: first freeze the encoders and train only the head, then unfreeze and fine-tune end-to-end with a smaller learning rate for the backbones (1e-6) than for the head (1e-4).
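The head and the two-stage schedule can be sketched in PyTorch as follows. This is an illustration under stated assumptions, not the released implementation: the class and function names, the MLP hidden size (256), the dropout rate (0.1), and the stage-1 head learning rate are hypothetical, and the pooling here ignores padding masks for brevity. With D = 256, each branch contributes 2D pooled features, so the fused vector has 4D = 1024 dimensions.

```python
import torch
import torch.nn as nn

D = 256  # embedding dim from the encoder configuration


def max_mean_pool(hidden: torch.Tensor) -> torch.Tensor:
    """[MaxPool ; MeanPool] over the sequence axis: (B, L, D) -> (B, 2D)."""
    return torch.cat([hidden.max(dim=1).values, hidden.mean(dim=1)], dim=-1)


class FusionHead(nn.Module):
    """Two-layer MLP over the fused 4D vector (hidden size and dropout assumed)."""

    def __init__(self, d_model: int = D, hidden: int = 256, dropout: float = 0.1):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4 * d_model, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 2),  # 2 logits: benign vs. DGA
        )

    def forward(self, char_hidden: torch.Tensor, sub_hidden: torch.Tensor) -> torch.Tensor:
        fused = torch.cat(
            [max_mean_pool(char_hidden), max_mean_pool(sub_hidden)], dim=-1
        )
        return self.mlp(fused)


def build_optimizer(encoders: nn.Module, head: nn.Module, stage: int) -> torch.optim.Adam:
    """Stage 1: encoders frozen, head only. Stage 2: end-to-end, smaller encoder LR."""
    frozen = stage == 1
    for p in encoders.parameters():
        p.requires_grad = not frozen
    groups = [{"params": list(head.parameters()), "lr": 1e-4}]
    if not frozen:
        groups.append({"params": list(encoders.parameters()), "lr": 1e-6})
    return torch.optim.Adam(groups)
```

A forward pass takes the last-layer hidden states of both branches, for example tensors of shape (B, 77, 256) and (B, 30, 256), and returns logits of shape (B, 2).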
Semi-supervised pre-training (three subtasks)
Before supervised fine-tuning, each backbone is pre-trained with a multi-task self-supervised objective (total loss = sum of the three):
- MTP (Masked Token Prediction): recover the 15% of characters/subwords that were masked, using the surrounding context (local patterns).
- TPP (Token Position Prediction): recover the canonical ordering of shuffled tokens (global structure).
- TOV (Token Order Verification): determine whether the given sequence is scrambled or not.
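The three subtasks above can be illustrated by how their training examples might be constructed from one token sequence. This is a sketch under assumptions: the helper names, the `[MASK]` token string, and the 50% scramble probability for TOV are hypothetical, and the paper's exact sampling procedure may differ.

```python
import random

MASK = "[MASK]"  # hypothetical mask token name


def make_mtp(tokens, rng, mask_rate=0.15):
    """Masked Token Prediction: hide ~15% of tokens; targets hold the originals."""
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(positions)}
    return masked, targets


def make_tpp(tokens, rng):
    """Token Position Prediction: shuffle tokens; label each with its original index."""
    order = list(range(len(tokens)))
    rng.shuffle(order)
    shuffled = [tokens[i] for i in order]
    return shuffled, order  # order[k] is the original position of shuffled[k]


def make_tov(tokens, rng, p_scramble=0.5):
    """Token Order Verification: label 1 if the sequence was scrambled, else 0."""
    # Scrambling is only meaningful when at least two distinct tokens exist.
    if len(set(tokens)) > 1 and rng.random() < p_scramble:
        scrambled = list(tokens)
        while scrambled == list(tokens):
            rng.shuffle(scrambled)
        return scrambled, 1
    return list(tokens), 0


rng = random.Random(0)
tokens = list("example")
masked, mtp_targets = make_mtp(tokens, rng)
shuffled, positions = make_tpp(tokens, rng)
sequence, is_scrambled = make_tov(tokens, rng)
```

Each helper returns an (input, target) pair; the per-task losses computed on these targets would then be summed, as described above.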
Dataset
Trained and evaluated on the DRIFT Longitudinal Benign and DGA Domain Name Dataset (2017–2025), available as snsec-net/dga-detection-drift26dsn. It provides a nine-year, temporally aligned collection of ~49.4M benign domains (Alexa + Tranco) and ~149.4M DGA domains (DGArchive, 148 families) for evaluating detectors under real-world concept drift.
This checkpoint follows the paper's forward-chaining protocol: pre-trained and fine-tuned on 2017–2019 (300k held out per year for validation), then evaluated on strictly newer years 2020–2025, each tested independently to measure temporal degradation. The training split contains 65 DGA families; 83 families appear only in the test years.
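A minimal sketch of the forward-chaining split described above, assuming records are available as `(domain, label, year)` tuples. That layout and the function name are hypothetical, and the per-year validation hold-out is omitted for brevity.

```python
from collections import defaultdict

TRAIN_YEARS = range(2017, 2020)  # 2017 through 2019
TEST_YEARS = range(2020, 2026)   # 2020 through 2025


def forward_chaining_split(records):
    """Split (domain, label, year) records into a train set and per-year test sets."""
    train, test_by_year = [], defaultdict(list)
    for domain, label, year in records:
        if year in TRAIN_YEARS:
            train.append((domain, label))
        elif year in TEST_YEARS:
            # Each test year is kept separate so degradation can be measured per year.
            test_by_year[year].append((domain, label))
    return train, dict(test_by_year)


# Toy records for illustration only; these are not taken from the dataset.
toy = [
    ("example", 0, 2017),
    ("qxzvkprt", 1, 2019),
    ("example", 0, 2021),
    ("wkjhqzxy", 1, 2025),
]
train, test_by_year = forward_chaining_split(toy)
```

Keeping one test set per year, rather than pooling 2020 through 2025, is what allows accuracy to be reported as a function of temporal distance from the training period.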
Usage
import torch
# Loads the fine-tuned DRIFT checkpoint (state dict / packaged checkpoint).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
ckpt = torch.load("finetuning.pt", map_location=device)
# Rebuild the dual-branch DRIFT model from the reference implementation
# (see the GitHub repo), then load these weights and run inference.
# We recommend preprocessing domain names exactly as specified in our paper and GitHub repository.
Inputs must be preprocessed exactly as in training: lowercased, TLD/ccTLD stripped to the effective second-level domain, characters restricted to alphanumerics, `-`, and `.` (per RFC 1035). See the dataset card and code for the full preprocessing pipeline and model definition.
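The preprocessing steps can be sketched roughly as below. This is not the authors' pipeline: the `preprocess` helper and the tiny suffix set are hypothetical stand-ins (a real pipeline would consult the full Public Suffix List), and the sketch assumes that "stripped to the effective second-level domain" means keeping only the label immediately left of the public suffix. Check the repository for the authoritative behavior.

```python
import re

# Characters permitted after lowercasing: letters, digits, hyphen, dot.
ALLOWED = re.compile(r"^[a-z0-9.-]+$")
# Tiny illustrative suffix set; not a substitute for the Public Suffix List.
EXAMPLE_SUFFIXES = {"com", "net", "org", "co.uk"}


def preprocess(domain: str, suffixes=EXAMPLE_SUFFIXES) -> str:
    """Lowercase, validate characters, and strip the public suffix (assumed behavior)."""
    name = domain.strip().lower().rstrip(".")
    if not ALLOWED.match(name):
        raise ValueError(f"unsupported characters in {domain!r}")
    labels = name.split(".")
    # Try the longest candidate suffix first, then progressively shorter ones.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                raise ValueError(f"{domain!r} is a bare public suffix")
            return labels[i - 1]  # label immediately left of the suffix
    raise ValueError(f"no known suffix in {domain!r}")
```

For instance, under these assumptions both `preprocess("Example.COM")` and `preprocess("www.example.co.uk")` yield `"example"`, while a name containing an underscore raises `ValueError`.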
Citation
Paper:
@inproceedings{lee2026drift,
title = {{DRIFT}: Drift-Resilient Invariant-Feature Transformer for {DGA} Detection},
author = {Lee, Chaeyoung and Jung, Chaeri and Jeong, Seonghoon},
booktitle = {Proc. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)},
year = {2026}
}
Dataset:
@misc{lee2026driftdata,
author = {Lee, Chaeyoung and Jung, Chaeri and Jeong, Seonghoon},
title = {Longitudinal Benign and {DGA} Domain Name Dataset},
howpublished = {IEEE Dataport},
year = {2026},
doi = {10.21227/za2s-9e09},
url = {https://dx.doi.org/10.21227/za2s-9e09}
}
Intended Use & License
For cybersecurity research only: building and benchmarking DGA detectors and studying concept drift. It must not be used to operate, register, or distribute malicious domains. Released under CC BY-NC-SA 3.0 (non-commercial research / personal use): https://creativecommons.org/licenses/by-nc-sa/3.0/
