---
language: en
tags:
- dga
- cybersecurity
- domain-generation-algorithm
- text-classification
- sklearn
license: mit
---

# DGA-Logit: TF-IDF + Logistic Regression for DGA Detection

Detects domains produced by domain generation algorithms (DGAs) using a Logistic Regression classifier over TF-IDF character n-grams combined with 15 lexical features, trained on 54 DGA families.
Part of the **DGA Multi-Family Benchmark** (Reynier et al., 2026).

## Model Description

- **Architecture:** TF-IDF (character n-grams) + 15 lexical features → StandardScaler → Logistic Regression
- **Lexical features (15):** domain length, character entropy, digit and vowel ratios, consecutive-character runs, second-level domain (SLD) length, among others
- **Framework:** scikit-learn
- **File size:** ~8 MB
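
The lexical features listed above can be illustrated with a minimal sketch. It is not the model's feature extractor: the exact 15 features are defined in the repository's `model.py`, and the helper names here (`shannon_entropy`, `longest_run`, `lexical_features`) are hypothetical. It assumes Shannon entropy over characters, interprets "consecutive runs" as the longest run of consonants or digits, and takes the SLD naively as the label before the last dot (so multi-part suffixes such as `co.uk` are not handled).

```python
import math
import re
from collections import Counter

VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant


def shannon_entropy(s: str) -> float:
    """Shannon entropy (bits per character) of the character distribution."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def longest_run(s: str, pattern: str) -> int:
    """Length of the longest run of consecutive characters matching `pattern`."""
    runs = re.findall(f"(?:{pattern})+", s)
    return max((len(r) for r in runs), default=0)


def lexical_features(domain: str) -> dict:
    """Hypothetical subset of lexical features (not the model's exact 15)."""
    domain = domain.lower().rstrip(".")
    labels = domain.split(".")
    # Naive SLD: the label just before the last one (ignores suffixes like co.uk)
    sld = labels[-2] if len(labels) >= 2 else labels[0]
    chars = domain.replace(".", "")
    n = max(len(chars), 1)
    return {
        "length": len(domain),
        "entropy": shannon_entropy(chars),
        "digit_ratio": sum(ch.isdigit() for ch in chars) / n,
        "vowel_ratio": sum(ch in VOWELS for ch in chars) / n,
        "max_consonant_run": longest_run(chars, "[bcdfghjklmnpqrstvwxyz]"),
        "max_digit_run": longest_run(chars, "[0-9]"),
        "sld_length": len(sld),
    }


print(lexical_features("google.com"))
print(lexical_features("xkr3f9mq.ru"))
```

Random-looking DGA names tend to show higher entropy, more digits, and longer consonant runs than typical legitimate names, which is what such features are meant to capture.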
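
A comparable end-to-end scikit-learn pipeline (TF-IDF character n-grams stacked with lexical features, then `StandardScaler`, then `LogisticRegression`) could be sketched as follows. Everything specific here is an assumption for illustration: the `(2, 4)` n-gram range, the three-feature `simple_lexical` stand-in, the hyperparameters, and the toy training data are not the settings used to build `artifacts.joblib`.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def simple_lexical(domains):
    """Hypothetical 3-feature stand-in for the model's 15 lexical features."""
    rows = []
    for d in domains:
        s = d.lower().replace(".", "")
        n = max(len(s), 1)
        rows.append([
            len(d),                               # domain length
            sum(c.isdigit() for c in s) / n,      # digit ratio
            sum(c in "aeiou" for c in s) / n,     # vowel ratio
        ])
    # Return sparse so it stacks cleanly with the sparse TF-IDF matrix
    return sparse.csr_matrix(np.asarray(rows, dtype=float))


pipeline = Pipeline([
    ("features", FeatureUnion([
        # Character n-grams; the (2, 4) range is an assumption
        ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(2, 4))),
        ("lexical", FunctionTransformer(simple_lexical)),
    ])),
    # with_mean=False: centering would densify the sparse feature matrix
    ("scale", StandardScaler(with_mean=False)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Toy data for illustration only (1 = DGA, 0 = legitimate)
domains = ["google.com", "wikipedia.org", "github.com", "amazon.com",
           "xkr3f9mq.ru", "qz8v1plw0t.net", "bh4kx0zq.info", "p9w2mzv7.biz"]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

pipeline.fit(domains, labels)
print(pipeline.predict(["openai.com", "zq7x0vkw3.ru"]))
print(pipeline.predict_proba(["zq7x0vkw3.ru"])[:, 1])
```

Scaling without centering keeps the matrix sparse while still putting the lexical columns on a scale comparable to the n-gram weights before the linear classifier sees them.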
## Performance (54 DGA families, 30 runs each)

| Metric     | Value                 |
|------------|-----------------------|
| Accuracy   | 0.9277                |
| F1         | 0.9028                |
| Precision  | 0.9407                |
| Recall     | 0.8921                |
| FPR        | 0.0367                |
| Query time | 0.291 ms/domain (CPU) |
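
The metrics in the table follow the standard binary-classification definitions. The sketch below assumes DGA domains are the positive class (label 1) and legitimate domains label 0. The card does not state how scores are aggregated across the 54 families and 30 runs, so this shows only a single evaluation, not the benchmark's protocol; `detection_metrics` is a hypothetical helper.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)


def detection_metrics(y_true, y_pred):
    """Binary DGA-detection metrics with label 1 = DGA, 0 = legitimate."""
    # With labels=[0, 1] the matrix is [[tn, fp], [fn, tp]]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": float(accuracy_score(y_true, y_pred)),
        "f1": float(f1_score(y_true, y_pred)),
        "precision": float(precision_score(y_true, y_pred)),
        "recall": float(recall_score(y_true, y_pred)),
        # False-positive rate: share of legitimate domains flagged as DGA
        "fpr": float(fp / (fp + tn)) if (fp + tn) else 0.0,
    }


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
print(detection_metrics(y_true, y_pred))
# accuracy 0.875, precision 1.0, recall 0.75, f1 ≈ 0.857, fpr 0.0
```

FPR is reported separately because, for a detector screening large volumes of mostly benign traffic, the rate of false alarms on legitimate domains matters as much as recall on DGA domains.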
## Usage

```python
import importlib.util

from huggingface_hub import hf_hub_download

# Download the serialized model artifacts and the helper module from the Hub
artifacts_path = hf_hub_download("Reynier/dga-logit", "artifacts.joblib")
model_py = hf_hub_download("Reynier/dga-logit", "model.py")

# Import model.py dynamically, since it is a standalone file, not an installed package
spec = importlib.util.spec_from_file_location("logit_model", model_py)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# Load the artifacts and classify a legitimate and a random-looking domain
artifacts = mod.load_model(artifacts_path)
results = mod.predict(artifacts, ["google.com", "xkr3f9mq.ru"])
print(results)
```
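
The card does not say how the query-time figure was measured (batch size, warm-up, hardware). One plausible approach, sketched below, times batch calls to a prediction function and reports the median milliseconds per domain. The helper name `ms_per_domain`, the single warm-up call, and the batch protocol are assumptions, so numbers from it are not directly comparable to the 0.291 ms in the table.

```python
import time
from statistics import median
from typing import Callable, Sequence


def ms_per_domain(predict_fn: Callable[[Sequence[str]], object],
                  domains: Sequence[str], repeats: int = 5) -> float:
    """Median wall-clock milliseconds per domain over `repeats` batch calls."""
    if not domains:
        raise ValueError("domains must be non-empty")
    predict_fn(domains)  # warm-up call, excluded from the timings
    per_domain_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict_fn(domains)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        per_domain_ms.append(elapsed_ms / len(domains))
    return median(per_domain_ms)


# Stand-in predictor for demonstration; swap in a wrapper around the real model
print(ms_per_domain(lambda ds: [len(d) > 12 for d in ds], ["google.com"] * 1000))
```

With the objects from the usage example, this could be called as `ms_per_domain(lambda ds: mod.predict(artifacts, list(ds)), ["google.com", "xkr3f9mq.ru"] * 500)`, assuming `mod.predict` accepts a list of domain strings as shown there.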

## Citation

```bibtex
@article{reynier2026dga,
  title={DGA Multi-Family Benchmark: Comparing Classical and Transformer-based Detectors},
  author={Reynier and others},
  year={2026}
}
```