DGA-Logit: TF-IDF + Logistic Regression for DGA Detection

A Logistic Regression classifier over TF-IDF character n-grams and 15 lexical features, trained on 54 DGA families. Part of the DGA Multi-Family Benchmark (Reynier et al., 2026).

Model Description

Architecture: TF-IDF (char n-grams) + 15 lexical features → StandardScaler → Logistic Regression
Lexical features: length, entropy, digit/vowel ratios, consecutive character runs, SLD (second-level domain) length, etc.
Framework: scikit-learn
File size: ~8 MB
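The architecture line above can be sketched with standard scikit-learn components. This is a hypothetical reconstruction, not the code shipped in model.py. The n-gram range (2 to 4), the six lexical features (a subset standing in for the card's 15), the naive SLD extraction, and the toy training domains are all assumptions made for illustration. Because the TF-IDF block is sparse, the scaler uses `with_mean=False` so it only rescales and does not densify the matrix.

```python
import math
from collections import Counter

import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer, StandardScaler

VOWELS = set("aeiou")


def shannon_entropy(s: str) -> float:
    """Shannon entropy (bits per character) of the character distribution."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def longest_consonant_run(s: str) -> int:
    """Length of the longest run of consecutive consonant letters."""
    best = cur = 0
    for ch in s:
        cur = cur + 1 if ch.isalpha() and ch not in VOWELS else 0
        best = max(best, cur)
    return best


def lexical_features(domain: str) -> list[float]:
    """Hypothetical subset of the card's lexical features (not the exact 15)."""
    domain = domain.lower().rstrip(".")
    labels = domain.split(".")
    # Naive SLD: the label just before the TLD (ignores suffixes like co.uk).
    sld = labels[-2] if len(labels) > 1 else labels[0]
    n = max(len(domain), 1)  # avoid division by zero on empty input
    return [
        float(len(domain)),
        shannon_entropy(domain),
        sum(ch.isdigit() for ch in domain) / n,
        sum(ch in VOWELS for ch in domain) / n,
        float(longest_consonant_run(domain)),
        float(len(sld)),
    ]


def lexical_matrix(domains):
    """Stack per-domain features; sparse so it concatenates with the TF-IDF block."""
    return sparse.csr_matrix(np.array([lexical_features(d) for d in domains], dtype=float))


# TF-IDF char n-grams + lexical features -> StandardScaler -> Logistic Regression
model = make_pipeline(
    make_union(
        TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
        FunctionTransformer(lexical_matrix),
    ),
    StandardScaler(with_mean=False),  # rescale only; centering would densify
    LogisticRegression(max_iter=1000),
)

# Toy data for illustration only (0 = benign, 1 = DGA); not the benchmark set.
train_domains = [
    "google.com", "wikipedia.org", "github.com", "amazon.com",
    "xkr3f9mq.ru", "qzj8w2vbn.net", "p0lxkq7t.biz", "mzqxvtrw.info",
]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
model.fit(train_domains, train_labels)

# Probability that each domain is DGA-generated
print(model.predict_proba(["google.com", "xkr3f9mq.ru"])[:, 1])
```

With real data, the toy lists would be replaced by labelled benign and DGA domains; the pipeline object itself is what a file such as artifacts.joblib could plausibly serialize, though the card does not say so.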

Performance (54 DGA families, 30 runs each)

Metric	Value
Accuracy	0.9277
F1	0.9028
Precision	0.9407
Recall	0.8921
FPR	0.0367
Query Time	0.291 ms/domain (CPU)
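The metrics in the table follow the usual binary-detection definitions with DGA as the positive class. The helper below is a minimal sketch (the function name and the confusion-matrix counts are hypothetical). It also illustrates one point worth keeping in mind when reading the table: if each reported value is a mean over families and runs, the reported F1 need not equal the harmonic mean of the reported precision and recall. The harmonic mean is concave, so the average of per-run F1 scores is at most the harmonic mean of the averaged precision and recall, and here 0.9028 is indeed below that value (about 0.916).

```python
def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Binary detection metrics with DGA as the positive class."""

    def safe_div(a: float, b: float) -> float:
        return a / b if b else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # true positive rate
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "fpr": safe_div(fp, fp + tn),  # share of benign domains flagged as DGA
    }


# Hypothetical counts for one run: 100 DGA and 100 benign test domains.
m = detection_metrics(tp=90, fp=4, tn=96, fn=10)
print(m)

# Harmonic mean of the table's averaged precision and recall (about 0.916),
# compared with the reported mean F1 of 0.9028.
p, r = 0.9407, 0.8921
print(round(2 * p * r / (p + r), 4))
```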

Usage

from huggingface_hub import hf_hub_download
import importlib.util

# Download the serialized artifacts and the accompanying model code from the Hub
artifacts_path = hf_hub_download("Reynier/dga-logit", "artifacts.joblib")
model_py = hf_hub_download("Reynier/dga-logit", "model.py")

# Import model.py dynamically as a module named "logit_model"
spec = importlib.util.spec_from_file_location("logit_model", model_py)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# Load the trained artifacts and classify a benign and a DGA-looking domain
artifacts = mod.load_model(artifacts_path)
results = mod.predict(artifacts, ["google.com", "xkr3f9mq.ru"])
print(results)
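The card reports 0.291 ms/domain on CPU but does not state the batch size or how latency was measured. The hypothetical helper below (`ms_per_domain` is not part of model.py) shows one reasonable convention: best-of-N wall-clock time for a batch, divided by the batch size, after a warm-up call. It takes any callable, so after running the usage snippet it could wrap the Hub model as `lambda ds: mod.predict(artifacts, ds)`, assuming `predict` accepts a list of domains as in that example. A stand-in predictor keeps the sketch runnable offline; absolute numbers depend on hardware and batch size.

```python
import time
from typing import Callable, Sequence


def ms_per_domain(
    predict_fn: Callable[[Sequence[str]], object],
    domains: Sequence[str],
    repeats: int = 5,
) -> float:
    """Best-of-`repeats` wall-clock latency per domain, in milliseconds."""
    if not domains:
        raise ValueError("domains must be non-empty")
    predict_fn(domains)  # warm-up call (lazy imports, caches)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict_fn(domains)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0 / len(domains)


def dummy_predict(domains: Sequence[str]) -> list[bool]:
    """Stand-in predictor so the sketch runs without downloading the model."""
    return [len(d) > 12 for d in domains]


batch = ["google.com", "xkr3f9mq.ru"] * 500
print(f"{ms_per_domain(dummy_predict, batch):.4f} ms/domain")
```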

Citation

@article{reynier2026dga,
  title={DGA Multi-Family Benchmark: Comparing Classical and Transformer-based Detectors},
  author={Reynier and others},
  year={2026}
}


DGA Multi-Family Benchmark Collection

This model is part of the DGA Multi-Family Benchmark collection of 8 DGA detection models (CNN, BiLSTM, Bilbo, LABin, Logit, FANCI, DomURLsBERT, ModernBERT) trained on 54 malware families.