---
language: en
tags:
  - dga
  - cybersecurity
  - domain-generation-algorithm
  - text-classification
  - bert
  - lora
  - peft
license: mit
---

# DGA-DomURLsBERT: BERT + LoRA for DGA Detection

BERT (`bert-base-uncased`) fine-tuned with LoRA (r=8) to detect domains produced by domain generation algorithms (DGAs), trained on 54 DGA families. Part of the DGA Multi-Family Benchmark (Reynier et al., 2026).

## Model Description

- **Base model:** `bert-base-uncased`
- **Fine-tuning:** LoRA (r=8, alpha=16, target modules: query + value, dropout=0.1)
- **Task:** sequence classification (binary: `legit` / `dga`)
- **Framework:** Hugging Face Transformers + PEFT
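
The adapter settings above can be expressed as a PEFT `LoraConfig`. This is a sketch for readers who want to reproduce the setup, not the exact training script; `task_type` and the exact module names are assumptions based on the standard BERT + PEFT recipe.

```python
from peft import LoraConfig, TaskType

# LoRA hyperparameters as listed above; remaining fields are plausible defaults.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,         # sequence classification head
    r=8,                                # low-rank dimension
    lora_alpha=16,                      # scaling factor
    lora_dropout=0.1,
    target_modules=["query", "value"],  # BERT self-attention projections
)
```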

## Usage

```python
# !pip install peft transformers

from transformers import BertTokenizer, BertForSequenceClassification
from peft import PeftModel
import torch

# Load the tokenizer and base model, then attach the LoRA adapter
tokenizer = BertTokenizer.from_pretrained("Reynier/dga-domurlsbert")
base_model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model = PeftModel.from_pretrained(base_model, "Reynier/dga-domurlsbert").eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

id2label = {0: "legit", 1: "dga"}

def predict(domains):
    results = []
    for domain in domains:
        inputs = tokenizer(domain, return_tensors="pt", truncation=True).to(device)
        with torch.no_grad():
            logits = model(**inputs).logits
            pred = torch.argmax(logits, dim=1).item()
            # score is the probability of the "dga" class
            score = torch.softmax(logits, dim=1)[0, 1].item()
        results.append({"domain": domain, "label": id2label[pred], "score": round(score, 4)})
    return results

print(predict(["google.com", "xkr3f9mq.ru"]))
```

## Citation

```bibtex
@article{reynier2026dga,
  title={DGA Multi-Family Benchmark: Comparing Classical and Transformer-based Detectors},
  author={Reynier et al.},
  year={2026}
}
```