# spot-distilbert-phishing
A DistilBERT-based binary email classifier that labels emails as
phishing or legitimate. This is the model shipped inside
SPOT's analyzer-nlp plugin; it
is mirrored here as a public artefact so researchers, integrators, and
auditors can inspect what SPOT actually runs in production.
## Model description

- Architecture: `DistilBertForSequenceClassification` (6 layers, 768-dim hidden state, 12 attention heads) with a two-label sequence-classification head.
- Base checkpoint: `cybersectony/phishing-email-detection-distilbert_v2.1`, re-headed from 4 labels to 2 (`ignore_mismatched_sizes=True`) and fine-tuned end-to-end.
- Tokenizer: WordPiece, 30 522 tokens, max sequence length 512.
- Parameters: ~67 million.
- Output: a 2-logit softmax `[P(legitimate), P(phishing)]`; the predicted class is the argmax.
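The logit-to-probability mapping above can be sketched in plain Python; this is a generic, numerically stable softmax following the label convention stated in the list, not code from the SPOT repositories:

```python
import math

def softmax2(logits):
    """Numerically stable softmax over a pair of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Index 0 = legitimate, index 1 = phishing, per the convention above."""
    p_legit, p_phish = softmax2(logits)
    label = "phishing" if p_phish > p_legit else "legitimate"
    return label, max(p_legit, p_phish)
```

For example, `predict_label([-2.0, 3.0])` yields the `"phishing"` label, since the second logit dominates after the softmax.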
## Intended use

- Primary use case: phishing detection on inbound email bodies in the SPOT platform. The analyzer-orchestrator calls `POST /internal/analyze` on `analyzer-nlp`; the wrapper in `analyzer-nlp` loads this model and returns an `AnalysisResult` that contributes to the workflow's aggregated phishing verdict.
- Suitable inputs: English email bodies, ideally up to 512 tokens. Subjects and bodies are concatenated and tokenised together.
- Out of scope: SPOT does not use this model in isolation. The workflow combines it with rule-based, contextual, and (optionally) LLM-based analyzers; the final phishing verdict is the orchestrator's aggregate, not this model's argmax. Using the raw classifier on its own will produce more false positives and false negatives than the end-to-end SPOT pipeline.
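A client for the internal endpoint might look like the sketch below. Only the `POST /internal/analyze` path and the subject-plus-body concatenation come from this card; the host, port, and the payload/response field names (`text`, `verdict`, `score`) are illustrative assumptions, not the documented SPOT schema.

```python
import json
import urllib.request

# Assumed host/port; only the /internal/analyze path is from the model card.
ANALYZER_URL = "http://analyzer-nlp:8080/internal/analyze"

def build_request(subject: str, body: str) -> dict:
    """Concatenate subject and body, as the card describes, into an
    assumed request payload shape."""
    return {"text": f"{subject}\n{body}"}

def parse_result(raw: str):
    """Extract (verdict, score) from an assumed AnalysisResult-style JSON."""
    result = json.loads(raw)
    return result["verdict"], result["score"]

def analyze(subject: str, body: str):
    """POST the email to the analyzer and return the parsed result."""
    req = urllib.request.Request(
        ANALYZER_URL,
        data=json.dumps(build_request(subject, body)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_result(resp.read().decode("utf-8"))
```

In production this call is made by the analyzer-orchestrator, not by end users; the sketch is only meant to make the request/response flow concrete.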
Training data
Fine-tuned on the public SetFit/enron_spam
dataset:
| Split | Examples |
|---|---|
| Train | 31 716 |
| Test (used as eval split) | 2 000 |
Labels: 0 = legitimate, 1 = phishing/spam. The dataset is the
Enron corpus relabelled by SetFit; it skews toward early-2000s English
business email and reflects historical phishing patterns. Performance
on contemporary phishing (modern brand impersonation, AI-generated
text, image-only emails) is not guaranteed by this corpus, which is
exactly why SPOT augments the verdict with rule-based and contextual
analyzers.
## Training procedure

| Hyper-parameter | Value |
|---|---|
| epochs | 3 |
| per_device_train_batch_size | 16 |
| per_device_eval_batch_size | 64 |
| warmup_steps | 500 |
| weight_decay | 0.01 |
| optimizer | AdamW (HF Trainer defaults) |
| max sequence length | 512 (truncated, padded to max) |
Hardware: 2x NVIDIA Tesla T4 GPUs, run as a Kaggle notebook.
Framework: Hugging Face transformers.Trainer.
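The table above maps directly onto a `transformers.TrainingArguments` configuration. The original training script is not published, so the following is a reconstruction from the table, not the actual code:

```python
# Hyper-parameters reconstructed from the table above; names match the
# corresponding transformers.TrainingArguments fields.
hyperparams = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 64,
    "warmup_steps": 500,
    "weight_decay": 0.01,
}

# With Hugging Face transformers installed, this would be wired up roughly as:
#   from transformers import TrainingArguments, Trainer
#   args = TrainingArguments(output_dir="out", **hyperparams)
#   Trainer(model=model, args=args,
#           train_dataset=train_ds, eval_dataset=eval_ds).train()
```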
## Evaluation

Final metrics on the held-out SetFit/enron_spam test split (2 000
examples) at the end of epoch 3:

| Metric | Value |
|---|---|
| eval_loss | 0.0277 |
| eval_runtime (s) | 16.80 |
| eval_samples_per_second | 119.06 |
Accuracy / precision / recall / F1 were not computed during training.
The production wrapper in analyzer-nlp/tests/ exercises the
end-to-end SPOT integration on a larger, internal evaluation set;
those numbers are not yet published here.
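The missing classification metrics can be computed post hoc from the model's predictions on the test split. A minimal, dependency-free sketch (`sklearn.metrics.classification_report` would give the same numbers):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, and precision/recall/F1 for the positive class (phishing = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Feeding it the model's argmax predictions and the gold labels of the 2 000-example test split would fill the gap left by the training run.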
## How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("spotproject/spot-distilbert-phishing")
model = AutoModelForSequenceClassification.from_pretrained("spotproject/spot-distilbert-phishing")

text = "Dear customer, your account has been suspended. Click here to verify..."
inputs = tok(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)[0]
print({"legitimate": probs[0].item(), "phishing": probs[1].item()})
```
For the full SPOT integration (workflow context, knowledge-store
enrichment, and aggregation across analyzers), see
analyzer-nlp on Codeberg.
## Limitations and biases
- Domain skew. Trained on Enron-era English business email. Expect degraded performance on consumer/mobile messaging, on non-English text, on modern AI-generated phishing, and on email-as-a-document attack vectors (HTML smuggling, image-only emails, QR-code phishing).
- Fuzzy class boundary. The dataset's positive class collapses spam, marketing, and phishing into a single label. Treating the model's "phishing" probability as a calibrated phishing-only score will be optimistic.
- No adversarial robustness guarantees. Simple rephrasing or Unicode obfuscation can flip the verdict. SPOT mitigates this by combining the model with non-NLP analyzers in the workflow orchestrator.
- Personal data. Do not feed PII into a hosted version of this model without consent. The SPOT deployment runs the analyzer inside the customer's perimeter for that reason.
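As a partial mitigation for the Unicode-obfuscation weakness noted above, text can be normalised before tokenisation. This is an illustrative pre-processing sketch, not part of the shipped analyzer-nlp wrapper:

```python
import unicodedata

def normalize_email_text(text: str) -> str:
    """Fold compatibility characters (e.g. fullwidth letters) via NFKC,
    strip common zero-width characters, and case-fold, so trivial Unicode
    obfuscation is less likely to change the tokenisation. Illustrative
    only; homoglyph substitution (e.g. Cyrillic for Latin) needs more."""
    text = unicodedata.normalize("NFKC", text)
    zero_width = {"\u200b", "\u200c", "\u200d", "\ufeff"}
    text = "".join(ch for ch in text if ch not in zero_width)
    return text.casefold()
```

For example, fullwidth `ＶＥＲＩＦＹ` and `Ver\u200bify` both normalise to plain `verify` before reaching the tokenizer.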
## License
Apache-2.0, matching the analyzer-nlp
source repository.
## Citation

```bibtex
@software{spot_distilbert_phishing,
  title   = {spot-distilbert-phishing: phishing classifier for the SPOT platform},
  author  = {SPOT Project},
  year    = {2026},
  url     = {https://huggingface.co/spotproject/spot-distilbert-phishing},
  license = {Apache-2.0}
}
```
## Related
- Analyzer wrapper (production code): https://codeberg.org/SPOT_Project/analyzer-nlp
- SPOT platform: https://codeberg.org/SPOT_Project/core
- Project home: https://codeberg.org/SPOT_Project