DGA-Logit: TF-IDF + Logistic Regression for DGA Detection

A DGA (domain generation algorithm) detector that combines TF-IDF character n-grams with 15 lexical features and classifies them with Logistic Regression. It was trained on 54 DGA families and is part of the DGA Multi-Family Benchmark (Reynier et al., 2026).

Model Description

  • Architecture: TF-IDF (char n-grams) + 15 lexical features → StandardScaler → Logistic Regression
  • Features: Length, entropy, digit/vowel ratios, consecutive runs, SLD length, etc.
  • Framework: scikit-learn
  • File size: ~8 MB
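
The card names the lexical features but does not define them. As an illustration only, a few of them could be computed along the lines below. The exact definitions, the naive SLD rule, and the function names (`shannon_entropy`, `longest_run`, `lexical_features`) are assumptions for this sketch, not the code shipped in model.py.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the character distribution of text."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def longest_run(text: str, predicate) -> int:
    """Length of the longest consecutive run of characters matching predicate."""
    best = current = 0
    for ch in text:
        current = current + 1 if predicate(ch) else 0
        best = max(best, current)
    return best


def lexical_features(domain: str) -> dict:
    """Hypothetical subset of the lexical features listed in the card."""
    name = domain.lower().rstrip(".")
    labels = name.split(".")
    # Assumption: the SLD is the label left of the last dot. This naive rule
    # ignores multi-part public suffixes such as "co.uk".
    sld = labels[-2] if len(labels) >= 2 else labels[0]
    n = len(sld)
    return {
        "length": len(name),
        "sld_length": n,
        "entropy": shannon_entropy(sld),
        "digit_ratio": sum(ch.isdigit() for ch in sld) / n if n else 0.0,
        "vowel_ratio": sum(ch in VOWELS for ch in sld) / n if n else 0.0,
        "max_consonant_run": longest_run(
            sld, lambda ch: ch.isalpha() and ch not in VOWELS
        ),
        "max_digit_run": longest_run(sld, str.isdigit),
    }


if __name__ == "__main__":
    for d in ["google.com", "xkr3f9mq.ru"]:
        print(d, lexical_features(d))
```

In a pipeline like the one described above, a vector of such values would be concatenated with the TF-IDF n-gram vector before scaling and classification.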

Performance (54 DGA families, 30 runs each)

Metric        Value
Accuracy      0.9277
F1            0.9028
Precision     0.9407
Recall        0.8921
FPR           0.0367
Query time    0.291 ms/domain (CPU)
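
For readers less familiar with these metrics, the sketch below shows how each one follows from the counts of a binary confusion matrix, treating DGA as the positive class. The function name `binary_metrics` and the counts in the example are made up for illustration; they are not the benchmark's data.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # FPR: share of benign domains wrongly flagged as DGA.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "f1": f1,
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 DGA domains and 100 benign domains.
    print(binary_metrics(tp=90, fp=5, tn=95, fn=10))
```

If the values in the table are averages over families and runs (an assumption, since the card does not say how they were aggregated), the reported F1 will generally not equal the harmonic mean of the reported precision and recall.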

Usage

import importlib.util

from huggingface_hub import hf_hub_download

# Download the serialized artifacts and the helper module from the Hub.
artifacts_path = hf_hub_download("Reynier/dga-logit", "artifacts.joblib")
model_py = hf_hub_download("Reynier/dga-logit", "model.py")

# Import model.py from its downloaded path (this executes code from the repository).
spec = importlib.util.spec_from_file_location("logit_model", model_py)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# Load the artifacts and classify a batch of domain names.
artifacts = mod.load_model(artifacts_path)
results = mod.predict(artifacts, ["google.com", "xkr3f9mq.ru"])
print(results)

Citation

@article{reynier2026dga,
  title={DGA Multi-Family Benchmark: Comparing Classical and Transformer-based Detectors},
  author={Reynier et al.},
  year={2026}
}