spot-distilbert-phishing

A DistilBERT-based binary email classifier that labels emails as phishing or legitimate. This is the model shipped inside SPOT's analyzer-nlp plugin; it is mirrored here as a public artefact so researchers, integrators, and auditors can inspect what SPOT actually runs in production.

Model description

  • Architecture: DistilBertForSequenceClassification (6 layers, 768-dim hidden state, 12 attention heads). Sequence-classification head with two output labels.
  • Base checkpoint: cybersectony/phishing-email-detection-distilbert_v2.1, re-headed from 4 labels to 2 (ignore_mismatched_sizes=True) and fine-tuned end-to-end.
  • Tokenizer: WordPiece, 30 522 tokens, max sequence length 512.
  • Parameters: ~67 million.
  • Output: a 2-logit softmax [P(legitimate), P(phishing)]. The predicted class is the argmax.
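The two output logits map to probabilities via a softmax, and the verdict is the larger of the two. A minimal sketch of that final step, using illustrative logit values (not real model output):

```python
import math

def softmax(logits):
    # Standard softmax: exponentiate, then normalise so the values sum to 1.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits only, not output from the actual model.
p_legitimate, p_phishing = softmax([-2.1, 3.4])
verdict = "phishing" if p_phishing > p_legitimate else "legitimate"
print(verdict, round(p_phishing, 4))
```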

Intended use

  • Primary use case: phishing detection on inbound email bodies in the SPOT platform. The analyzer-orchestrator calls POST /internal/analyze on analyzer-nlp; the wrapper in analyzer-nlp loads this model and returns an AnalysisResult contributing to the workflow's aggregated phishing verdict.
  • Suitable inputs: English email bodies, ideally up to 512 tokens. Subjects and bodies are concatenated and tokenised together.
  • Out of scope: SPOT does not use this model in isolation. The workflow combines it with rule-based, contextual, and (optionally) LLM-based analyzers; the final phishing verdict is the orchestrator's aggregate, not this model's argmax. Using the raw classifier on its own will produce more false positives and false negatives than the end-to-end SPOT pipeline.
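The subject/body concatenation mentioned above can be sketched as follows. The function name and the separator are assumptions for illustration, not the exact pre-processing used in analyzer-nlp:

```python
def build_model_input(subject: str, body: str) -> str:
    """Concatenate subject and body into one string for the tokenizer.

    Hypothetical helper: the card states that subjects and bodies are
    concatenated and tokenised together; the separator chosen here is
    an assumption.
    """
    return f"{subject}\n\n{body}".strip()
```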

Training data

Fine-tuned on the public SetFit/enron_spam dataset:

Split                       Examples
Train                       31 716
Test (used as eval split)    2 000

Labels: 0 = legitimate, 1 = phishing/spam. The dataset is the Enron corpus relabelled by SetFit; it skews toward early-2000s English business email and reflects historical phishing patterns. Performance on contemporary phishing (modern brand impersonation, AI-generated text, image-only emails) is not guaranteed by this corpus, which is exactly why SPOT augments the verdict with rule-based and contextual analyzers.

Training procedure

Hyper-parameter              Value
epochs                       3
per_device_train_batch_size  16
per_device_eval_batch_size   64
warmup_steps                 500
weight_decay                 0.01
optimizer                    AdamW (HF Trainer defaults)
max sequence length          512 (truncated, padded to max)
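The table above corresponds one-to-one to keyword arguments of transformers.TrainingArguments; a sketch of the mapping (output_dir is a placeholder, and the optimizer row needs no argument since AdamW is the Trainer default):

```python
# Hyper-parameters from the table, expressed as TrainingArguments kwargs.
# output_dir is a placeholder; the actual training notebook is not published.
training_kwargs = dict(
    output_dir="./spot-distilbert-phishing",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
)
```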

Hardware: 2x NVIDIA Tesla T4 GPUs, run as a Kaggle notebook. Framework: Hugging Face transformers.Trainer.

Evaluation

Final metrics on the held-out SetFit/enron_spam test split (2 000 examples) at the end of epoch 3:

Metric                   Value
eval_loss                0.0277
eval_runtime (s)         16.80
eval_samples_per_second  119.06
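As a sanity check, the throughput row follows directly from the other two; a back-of-the-envelope calculation, not an additional measurement:

```python
eval_samples = 2_000
eval_runtime_s = 16.80

throughput = eval_samples / eval_runtime_s
# ~119.05 samples/s; the card reports 119.06, a rounding difference.
print(round(throughput, 2))
```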

Accuracy / precision / recall / F1 were not computed during training. The production wrapper in analyzer-nlp/tests/ exercises the end-to-end SPOT integration on a larger, internal evaluation set; those numbers are not yet published here.

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("spotproject/spot-distilbert-phishing")
model = AutoModelForSequenceClassification.from_pretrained("spotproject/spot-distilbert-phishing")
model.eval()  # inference mode: disables dropout

text = "Dear customer, your account has been suspended. Click here to verify..."
# Truncate to the model's 512-token limit; return PyTorch tensors.
inputs = tok(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits
# Index 0 = legitimate, index 1 = phishing (see "Model description").
probs = torch.softmax(logits, dim=-1)[0]
print({"legitimate": probs[0].item(), "phishing": probs[1].item()})

For the full SPOT integration, with workflow context, knowledge-store enrichment, and aggregation across analyzers, see analyzer-nlp on Codeberg.

Limitations and biases

  • Domain skew. Trained on Enron-era English business email. Expect degraded performance on consumer/mobile messaging, on non-English text, on modern AI-generated phishing, and on email-as-a-document attack vectors (HTML smuggling, image-only emails, QR-code phishing).
  • Fuzzy class boundary. The dataset's positive class collapses spam, marketing, and phishing into a single label. Treating the model's "phishing" probability as a calibrated phishing-only score will be optimistic.
  • No adversarial robustness guarantees. Simple rephrasing or Unicode obfuscation can flip the verdict. SPOT mitigates this by combining the model with non-NLP analyzers in the workflow orchestrator.
  • Personal data. Do not feed PII into a hosted version of this model without consent. The SPOT deployment runs the analyzer inside the customer's perimeter for that reason.
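To make the Unicode-obfuscation point concrete, here is a toy homoglyph substitution; the mapping is illustrative only, and a real attack would use many more look-alike characters:

```python
# Toy homoglyph map: Cyrillic look-alikes for three Latin letters.
# Visually identical to a reader, but different Unicode code points,
# so the WordPiece tokenizer sees entirely different tokens.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def obfuscate(text: str) -> str:
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

sample = "verify your account"
print(obfuscate(sample) == sample)  # False: the strings differ code-point-wise
```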

License

Apache-2.0, matching the analyzer-nlp source repository.

Citation

@software{spot_distilbert_phishing,
  title  = {spot-distilbert-phishing: phishing classifier for the SPOT platform},
  author = {SPOT Project},
  year   = {2026},
  url    = {https://huggingface.co/spotproject/spot-distilbert-phishing},
  license = {Apache-2.0}
}
