DRIFT — Fine-tuned DGA Detector

Drift-Resilient Invariant-Feature Transformer for DGA Detection

Authors: Chaeyoung Lee*, Chaeri Jung*, Seonghoon Jeong (* Equal contribution)
Affiliation: Division of Artificial Intelligence Engineering, Sookmyung Women's University
Venue: IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2026) — accepted, IEEE Xplore link coming soon
arXiv: arxiv.org/abs/2605.10436
GitHub: snsec-net/2026-DSN-DRIFT
Hugging Face: Paper · Dataset · Model

Model

finetuning.pt is the fine-tuned DRIFT checkpoint for binary DGA detection — given a domain name, it predicts benign (0) vs DGA-generated (1). DRIFT is designed for temporal robustness: it learns invariant structural features of domain names so that detection accuracy degrades far more slowly as new DGA variants emerge over time (concept drift).

Architecture — two backbones

DRIFT uses a dual-branch Transformer with a hybrid tokenization strategy. The two backbones process the same domain (effective second-level domain) in parallel:

| Backbone | Captures | Tokenizer | Sequence length | Vocabulary size |
|---|---|---|---|---|
| Character | stochastic morphological / lexical patterns | char-level (a–z, 0–9, `-`, `.`) | `L = 77` | `43` |
| Subword | word-based DGAs / morpheme semantics | WordPiece | `L = 30` | `30,522` |

Both backbones share the same encoder configuration: embedding dim D = 256, 12 encoder layers, 8 attention heads, feed-forward dim 768, learnable positional embeddings.
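As an illustration of the character branch's input format, here is a minimal sketch of a character-level encoder that pads to `L = 77`. The special tokens, their ids, and the `encode_chars` helper are assumptions made for this example. The card only states a vocabulary of 43, which happens to be consistent with 38 characters plus 5 special tokens. The reference tokenizer is in the GitHub repository.

```python
import string

# Assumed special tokens: names and ids are illustrative, not taken from the card.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
# Character set listed in the table: a-z, 0-9, '-', '.'
CHARS = list(string.ascii_lowercase + string.digits + "-.")
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + CHARS)}
MAX_LEN = 77  # character-branch sequence length from the table


def encode_chars(domain: str, max_len: int = MAX_LEN) -> list[int]:
    """Map a preprocessed domain to a fixed-length list of token ids."""
    body = [VOCAB.get(ch, VOCAB["[UNK]"]) for ch in domain.lower()]
    body = body[: max_len - 2]  # leave room for [CLS] and [SEP]
    ids = [VOCAB["[CLS]"]] + body + [VOCAB["[SEP]"]]
    return ids + [VOCAB["[PAD]"]] * (max_len - len(ids))


ids = encode_chars("example")
print(len(VOCAB), len(ids))
```

The subword branch would be encoded analogously with a WordPiece tokenizer and `L = 30`.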

Fine-tuning head: each branch concatenates the max-pooled and mean-pooled last-layer hidden states, [MaxPool ; MeanPool], giving a $2D$-dimensional vector per branch. The two branch vectors are concatenated into a fused vector $v_{\text{fusion}} \in \mathbb{R}^{1024}$ (i.e. $4D$ with $D = 256$), which is fed to a two-layer MLP (hidden size $2D$, ReLU, dropout) that outputs 2 logits. The model is trained with binary cross-entropy using Adam under a two-stage transfer-learning schedule: first the encoders are frozen and only the head is trained; then the encoders are unfrozen and the whole model is fine-tuned end-to-end, with a smaller learning rate for the backbones (1e-6) than for the head (1e-4).
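The pooling, fusion, and two-stage optimizer setup can be sketched in PyTorch as follows. This is a rough illustration under stated assumptions, not the reference implementation: `FusionHead`, `pool`, the dropout rate, the stage-1 learning rate, and the `backbone` placeholder are all hypothetical, and the pooling here ignores padding masks for brevity.

```python
import torch
import torch.nn as nn

D = 256  # embedding dim stated in the card


def pool(hidden: torch.Tensor) -> torch.Tensor:
    """[MaxPool ; MeanPool] over the sequence axis: (B, L, D) -> (B, 2D)."""
    return torch.cat([hidden.max(dim=1).values, hidden.mean(dim=1)], dim=-1)


class FusionHead(nn.Module):
    """Hypothetical head: 4D fused vector -> 2D hidden -> 2 logits."""

    def __init__(self, d: int = D, dropout: float = 0.1):  # dropout rate is assumed
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4 * d, 2 * d),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(2 * d, 2),
        )

    def forward(self, char_hidden: torch.Tensor, sub_hidden: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([pool(char_hidden), pool(sub_hidden)], dim=-1)  # (B, 4D)
        return self.mlp(fused)


# Random stand-ins for the two encoders' last-layer hidden states.
char_hidden = torch.randn(4, 77, D)  # character branch, L = 77
sub_hidden = torch.randn(4, 30, D)   # subword branch, L = 30
head = FusionHead()
logits = head(char_hidden, sub_hidden)
print(logits.shape)

# Placeholder module standing in for both pre-trained encoders.
backbone = nn.Linear(D, D)

# Stage 1: freeze the encoders and train only the head (stage-1 lr is assumed).
for p in backbone.parameters():
    p.requires_grad = False
opt_stage1 = torch.optim.Adam(head.parameters(), lr=1e-4)

# Stage 2: unfreeze and fine-tune end-to-end with per-group learning rates.
for p in backbone.parameters():
    p.requires_grad = True
opt_stage2 = torch.optim.Adam(
    [
        {"params": backbone.parameters(), "lr": 1e-6},
        {"params": head.parameters(), "lr": 1e-4},
    ]
)
```

Optimizer parameter groups are one common way to give the backbones and the head different learning rates within a single Adam instance.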

Semi-supervised pre-training (three subtasks)

Before supervised fine-tuning, each backbone is pre-trained with a multi-task self-supervised objective (total loss = sum of the three):

MTP — Masked Token Prediction: recover 15% masked characters/subwords from context (local patterns).
TPP — Token Position Prediction: recover the canonical ordering of shuffled tokens (global structure).
TOV — Token Order Verification: determine whether the given sequence is scrambled or not.
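To make the three subtasks concrete, the sketch below builds one training example for each from a token list. The function names, the `[MASK]` token string, the 50% scramble probability, and the label encodings are assumptions for illustration. The card specifies only the 15% masking rate and the general goal of each task.

```python
import random

MASK = "[MASK]"  # assumed mask token string


def make_mtp(tokens, rng, mask_rate=0.15):
    """MTP: mask about 15% of positions (at least one); labels keep the originals."""
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [MASK if i in positions else t for i, t in enumerate(tokens)]
    labels = {i: tokens[i] for i in sorted(positions)}
    return inputs, labels


def make_tpp(tokens, rng):
    """TPP: shuffle tokens; the target for each slot is its original position."""
    order = list(range(len(tokens)))
    rng.shuffle(order)
    return [tokens[i] for i in order], order


def make_tov(tokens, rng, p_scramble=0.5):
    """TOV: return (sequence, label), label 1 if the sequence differs from the original."""
    if rng.random() < p_scramble:
        shuffled, _ = make_tpp(tokens, rng)
        # A shuffle can reproduce the original sequence, so label by comparison.
        return shuffled, int(shuffled != list(tokens))
    return list(tokens), 0


rng = random.Random(0)
tokens = list("example")
print(make_mtp(tokens, rng))
print(make_tpp(tokens, rng))
print(make_tov(tokens, rng))
```

Per the card, the pre-training loss would then be the sum of the three per-task losses computed on examples like these.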

Dataset

Trained and evaluated on the DRIFT Longitudinal Benign and DGA Domain Name Dataset (2017–2025) — snsec-net/dga-detection-drift26dsn. It provides a nine-year, temporally aligned collection of ~49.4M benign domains (Alexa + Tranco) and ~149.4M DGA domains (DGArchive, 148 families) for evaluating detectors under real-world concept drift.

This checkpoint follows the paper's forward-chaining protocol: pre-trained and fine-tuned on 2017–2019 (300k held out per year for validation), then evaluated on strictly newer years 2020–2025, each tested independently to measure temporal degradation. The training split contains 65 DGA families; 83 families appear only in the test years.
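The forward-chaining protocol can be sketched as a split over year-tagged records. This is a minimal sketch assuming records are `(domain, label, year)` tuples. The `forward_chaining_split` helper and the toy data are hypothetical, and whether the 300k validation holdout is drawn per class or overall is not specified here, so the example simply holds out a fixed count per training year.

```python
import random
from collections import defaultdict

TRAIN_YEARS = range(2017, 2020)  # 2017 to 2019 inclusive
TEST_YEARS = range(2020, 2026)   # 2020 to 2025 inclusive


def forward_chaining_split(records, val_per_year, seed=0):
    """Split (domain, label, year) records into train, val, and per-year test sets."""
    rng = random.Random(seed)
    by_year = defaultdict(list)
    for rec in records:
        by_year[rec[2]].append(rec)

    train, val = [], []
    for year in TRAIN_YEARS:
        rows = by_year[year][:]
        rng.shuffle(rows)
        val.extend(rows[:val_per_year])    # held out for validation
        train.extend(rows[val_per_year:])  # used for pre-training and fine-tuning

    # Each later year is kept separate so degradation can be measured per year.
    tests = {year: list(by_year[year]) for year in TEST_YEARS}
    return train, val, tests


# Toy data: 10 records per year, alternating labels.
records = [(f"d{y}-{i}", i % 2, y) for y in range(2017, 2026) for i in range(10)]
train, val, tests = forward_chaining_split(records, val_per_year=3)
print(len(train), len(val), sorted(tests))
```

Keeping every test year strictly later than every training year is what prevents future DGA variants from leaking into training.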

Usage

import torch

# Loads the fine-tuned DRIFT checkpoint (state dict / packaged checkpoint).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
ckpt = torch.load("finetuning.pt", map_location=device)
# Rebuild the dual-branch DRIFT model from the reference implementation
# (see the GitHub repo), then load these weights and run inference.
# We recommend preprocessing domain names exactly as specified in our paper and GitHub repository.

Inputs must be preprocessed exactly as in training: lowercased, TLD/ccTLD stripped to the effective second-level domain, characters restricted to alphanumerics, -, and . (per RFC 1035). See the dataset card and code for the full preprocessing pipeline and model definition.
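The preprocessing steps above can be approximated as follows. This is only a sketch: `TOY_SUFFIXES` is a tiny hypothetical stand-in for a full public-suffix list, the `preprocess` helper is not from the repository, and both the choice to reject invalid names (rather than clean them) and the choice to return only the label left of the suffix are assumptions. The authoritative pipeline is the one in the dataset card and code.

```python
import re

# Toy stand-in for a public-suffix list; a real pipeline needs the full list.
TOY_SUFFIXES = {"com", "net", "org", "co.uk", "ac.kr"}
# Allowed characters after lowercasing: alphanumerics, '-', and '.'
ALLOWED = re.compile(r"[a-z0-9.-]+")


def preprocess(domain: str):
    """Return the assumed effective second-level label, or None if invalid."""
    name = domain.strip().lower().rstrip(".")
    if not name or not ALLOWED.fullmatch(name):
        return None
    labels = name.split(".")
    # Scanning from the left tries the longest candidate suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in TOY_SUFFIXES:
            return labels[i - 1] if i > 0 else None
    return None  # suffix not in the toy list


for raw in ["www.Example.CO.UK", "xkqzv7.com.", "bad_domain.com", "com"]:
    print(raw, "->", preprocess(raw))
```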

Citation

Paper:

@inproceedings{lee2026drift,
  title     = {{DRIFT}: Drift-Resilient Invariant-Feature Transformer for {DGA} Detection},
  author    = {Lee, Chaeyoung and Jung, Chaeri and Jeong, Seonghoon},
  booktitle = {Proc. IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)},
  year      = {2026}
}

Dataset:

@misc{lee2026driftdata,
  author       = {Lee, Chaeyoung and Jung, Chaeri and Jeong, Seonghoon},
  title        = {Longitudinal Benign and {DGA} Domain Name Dataset},
  howpublished = {IEEE Dataport},
  year         = {2026},
  doi          = {10.21227/za2s-9e09},
  url          = {https://dx.doi.org/10.21227/za2s-9e09}
}

Intended Use & License

For cybersecurity research only — building and benchmarking DGA detectors and studying concept drift. It must not be used to operate, register, or distribute malicious domains. Released under CC BY-NC-SA 3.0 (non-commercial research / personal use): https://creativecommons.org/licenses/by-nc-sa/3.0/
