---
language: en
tags:
- dga
- cybersecurity
- domain-generation-algorithm
- text-classification
- sklearn
license: mit
---

# DGA-Logit: TF-IDF + Logistic Regression for DGA Detection

TF-IDF character n-grams combined with 15 lexical features and Logistic Regression, trained on 54 DGA families. Part of the **DGA Multi-Family Benchmark** (Reynier et al., 2026).

## Model Description

- **Architecture:** TF-IDF (char n-grams) + 15 lexical features → StandardScaler → Logistic Regression
- **Features:** Length, entropy, digit/vowel ratios, consecutive runs, SLD length, etc.
- **Framework:** scikit-learn
- **File size:** ~8 MB

## Performance (54 DGA families, 30 runs each)

| Metric     | Value                 |
|------------|-----------------------|
| Accuracy   | 0.9277                |
| F1         | 0.9028                |
| Precision  | 0.9407                |
| Recall     | 0.8921                |
| FPR        | 0.0367                |
| Query Time | 0.291 ms/domain (CPU) |

## Usage

```python
from huggingface_hub import hf_hub_download
import importlib.util

artifacts_path = hf_hub_download("Reynier/dga-logit", "artifacts.joblib")
model_py = hf_hub_download("Reynier/dga-logit", "model.py")

spec = importlib.util.spec_from_file_location("logit_model", model_py)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

artifacts = mod.load_model(artifacts_path)
results = mod.predict(artifacts, ["google.com", "xkr3f9mq.ru"])
print(results)
```

## Citation

```bibtex
@article{reynier2026dga,
  title={DGA Multi-Family Benchmark: Comparing Classical and Transformer-based Detectors},
  author={Reynier et al.},
  year={2026}
}
```
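
## Appendix: Lexical Feature Sketch

The lexical features named under Model Description (length, entropy, digit/vowel ratios, consecutive runs, SLD length) can be illustrated with a minimal, standard-library sketch. This is not the code shipped in `model.py`: the function names, the exact feature definitions, and the naive SLD extraction (the label left of the last dot, with no public-suffix handling) are all assumptions made for illustration, and only 7 of the 15 features are approximated here.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def second_level_label(domain: str) -> str:
    """Naive SLD: the label left of the last dot (assumption: ignores suffixes like co.uk)."""
    labels = domain.lower().strip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def longest_run(text: str, predicate) -> int:
    """Length of the longest run of consecutive characters satisfying predicate."""
    best = current = 0
    for ch in text:
        current = current + 1 if predicate(ch) else 0
        best = max(best, current)
    return best


def lexical_features(domain: str) -> dict:
    """Hypothetical subset of the lexical features; definitions are illustrative."""
    name = domain.lower().strip(".")
    sld = second_level_label(name)
    letters = [ch for ch in sld if ch.isalpha()]
    return {
        "length": len(name),
        "sld_length": len(sld),
        "entropy": shannon_entropy(sld),
        "digit_ratio": sum(ch.isdigit() for ch in sld) / len(sld) if sld else 0.0,
        "vowel_ratio": sum(ch in VOWELS for ch in letters) / len(letters) if letters else 0.0,
        "max_consonant_run": longest_run(sld, lambda ch: ch.isalpha() and ch not in VOWELS),
        "max_digit_run": longest_run(sld, str.isdigit),
    }


if __name__ == "__main__":
    for d in ["google.com", "xkr3f9mq.ru"]:
        print(d, lexical_features(d))
```

In a pipeline like the one described above, a dense vector of such features would typically be standardized and concatenated with the sparse TF-IDF character n-gram matrix before fitting the classifier; how the released model combines them is defined by its own `model.py`.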