# spot-distilbert-phishing
A DistilBERT-based binary email classifier that labels emails as
phishing or legitimate. This is the model shipped inside
SPOT's analyzer-nlp plugin; it
is mirrored here as a public artefact so researchers, integrators, and
auditors can inspect what SPOT actually runs in production.
## Model description

- Architecture: `DistilBertForSequenceClassification` (6 layers, 768-dim hidden state, 12 attention heads) with a two-label sequence-classification head.
- Base checkpoint: `cybersectony/phishing-email-detection-distilbert_v2.1`, re-headed from 4 labels to 2 (`ignore_mismatched_sizes=True`) and fine-tuned end-to-end.
- Tokenizer: WordPiece, 30 522 tokens, max sequence length 512.
- Parameters: ~67 million.
- Output: a 2-logit softmax `[P(legitimate), P(phishing)]`; the predicted class is the argmax.
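The logit-to-probability mapping above can be sketched in plain Python; this is a generic, numerically stable softmax following the label convention stated in the list, not code from the SPOT repositories:

```python
import math

def softmax2(logits):
    """Numerically stable softmax over a pair of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits):
    """Index 0 = legitimate, index 1 = phishing, per the convention above."""
    p_legit, p_phish = softmax2(logits)
    label = "phishing" if p_phish > p_legit else "legitimate"
    return label, max(p_legit, p_phish)
```

For example, `predict_label([-2.0, 3.0])` yields the `"phishing"` label, since the second logit dominates after the softmax.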
## Intended use

- Primary use case: phishing detection on inbound email bodies in the SPOT platform. The analyzer-orchestrator calls `POST /internal/analyze` on `analyzer-nlp`; the wrapper in `analyzer-nlp` loads this model and returns an `AnalysisResult` that contributes to the workflow's aggregated phishing verdict.
- Suitable inputs: English email bodies, ideally up to 512 tokens. Subjects and bodies are concatenated and tokenised together.
- Out of scope: SPOT does not use this model in isolation. The workflow combines it with rule-based, contextual, and (optionally) LLM-based analyzers; the final phishing verdict is the orchestrator's aggregate, not this model's argmax. Using the raw classifier on its own will produce more false positives and false negatives than the end-to-end SPOT pipeline.
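A client for the internal endpoint might look like the sketch below. Only the `POST /internal/analyze` path and the subject-plus-body concatenation come from this card; the host, port, and the payload/response field names (`text`, `verdict`, `score`) are illustrative assumptions, not the documented SPOT schema.

```python
import json
import urllib.request

# Assumed host/port; only the /internal/analyze path is from the model card.
ANALYZER_URL = "http://analyzer-nlp:8080/internal/analyze"

def build_request(subject: str, body: str) -> dict:
    """Concatenate subject and body, as the card describes, into an
    assumed request payload shape."""
    return {"text": f"{subject}\n{body}"}

def parse_result(raw: str):
    """Extract (verdict, score) from an assumed AnalysisResult-style JSON."""
    result = json.loads(raw)
    return result["verdict"], result["score"]

def analyze(subject: str, body: str):
    """POST the email to the analyzer and return the parsed result."""
    req = urllib.request.Request(
        ANALYZER_URL,
        data=json.dumps(build_request(subject, body)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_result(resp.read().decode("utf-8"))
```

In production this call is made by the analyzer-orchestrator, not by end users; the sketch is only meant to make the request/response flow concrete.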
Training data
Fine-tuned on the public SetFit/enron_spam
dataset:
| Split | Examples |
|---|---|
| Train | 31 716 |
| Test (used as eval split) | 2 000 |
Labels: 0 = legitimate, 1 = phishing/spam. The dataset is the
Enron corpus relabelled by SetFit; it skews toward early-2000s English
business email and reflects historical phishing patterns. Performance
on contemporary phishing (modern brand impersonation, AI-generated
text, image-only emails) is not guaranteed by this corpus, which is
exactly why SPOT augments the verdict with rule-based and contextual
analyzers.
## Training procedure

| Hyper-parameter | Value |
|---|---|
| epochs | 3 |
| per_device_train_batch_size | 16 |
| per_device_eval_batch_size | 64 |
| warmup_steps | 500 |
| weight_decay | 0.01 |
| optimizer | AdamW (HF Trainer defaults) |
| max sequence length | 512 (truncated, padded to max) |
Hardware: 2x NVIDIA Tesla T4 GPUs, run as a Kaggle notebook.
Framework: Hugging Face transformers.Trainer.
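The table above maps directly onto a `transformers.TrainingArguments` configuration. The original training script is not published, so the following is a reconstruction from the table, not the actual code:

```python
# Hyper-parameters reconstructed from the table above; names match the
# corresponding transformers.TrainingArguments fields.
hyperparams = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 64,
    "warmup_steps": 500,
    "weight_decay": 0.01,
}

# With Hugging Face transformers installed, this would be wired up roughly as:
#   from transformers import TrainingArguments, Trainer
#   args = TrainingArguments(output_dir="out", **hyperparams)
#   Trainer(model=model, args=args,
#           train_dataset=train_ds, eval_dataset=eval_ds).train()
```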
## Evaluation

Final metrics on the held-out SetFit/enron_spam test split (2 000
examples) at the end of epoch 3:

| Metric | Value |
|---|---|
| eval_loss | 0.0277 |
| eval_runtime (s) | 16.80 |
| eval_samples_per_second | 119.06 |
Accuracy / precision / recall / F1 were not computed during training.
The production wrapper in analyzer-nlp/tests/ exercises the
end-to-end SPOT integration on a larger, internal evaluation set;
those numbers are not yet published here.
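The missing classification metrics can be computed post hoc from the model's predictions on the test split. A minimal, dependency-free sketch (`sklearn.metrics.classification_report` would give the same numbers):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, and precision/recall/F1 for the positive class (phishing = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Feeding it the model's argmax predictions and the gold labels of the 2 000-example test split would fill the gap left by the training run.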
## How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("spotproject/spot-distilbert-phishing")
model = AutoModelForSequenceClassification.from_pretrained("spotproject/spot-distilbert-phishing")

text = "Dear customer, your account has been suspended. Click here to verify..."
inputs = tok(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)[0]
print({"legitimate": probs[0].item(), "phishing": probs[1].item()})
```
For the full SPOT integration (workflow context, knowledge-store
enrichment, and aggregation across analyzers), see
analyzer-nlp on Codeberg.
## Limitations and biases
- Domain skew. Trained on Enron-era English business email. Expect degraded performance on consumer/mobile messaging, on non-English text, on modern AI-generated phishing, and on email-as-a-document attack vectors (HTML smuggling, image-only emails, QR-code phishing).
- Fuzzy class boundary. The dataset's positive class collapses spam, marketing, and phishing into a single label. Treating the model's "phishing" probability as a calibrated phishing-only score will be optimistic.
- No adversarial robustness guarantees. Simple rephrasing or Unicode obfuscation can flip the verdict. SPOT mitigates this by combining the model with non-NLP analyzers in the workflow orchestrator.
- Personal data. Do not feed PII into a hosted version of this model without consent. The SPOT deployment runs the analyzer inside the customer's perimeter for that reason.
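As a partial mitigation for the Unicode-obfuscation weakness noted above, text can be normalised before tokenisation. This is an illustrative pre-processing sketch, not part of the shipped analyzer-nlp wrapper:

```python
import unicodedata

def normalize_email_text(text: str) -> str:
    """Fold compatibility characters (e.g. fullwidth letters) via NFKC,
    strip common zero-width characters, and case-fold, so trivial Unicode
    obfuscation is less likely to change the tokenisation. Illustrative
    only; homoglyph substitution (e.g. Cyrillic for Latin) needs more."""
    text = unicodedata.normalize("NFKC", text)
    zero_width = {"\u200b", "\u200c", "\u200d", "\ufeff"}
    text = "".join(ch for ch in text if ch not in zero_width)
    return text.casefold()
```

For example, fullwidth `ＶＥＲＩＦＹ` and `Ver\u200bify` both normalise to plain `verify` before reaching the tokenizer.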
## License
Apache-2.0, matching the analyzer-nlp
source repository.
## Citation

```bibtex
@software{spot_distilbert_phishing,
  title   = {spot-distilbert-phishing: phishing classifier for the SPOT platform},
  author  = {SPOT Project},
  year    = {2026},
  url     = {https://huggingface.co/spotproject/spot-distilbert-phishing},
  license = {Apache-2.0}
}
```
## Related
- Analyzer wrapper (production code): https://codeberg.org/SPOT_Project/analyzer-nlp
- SPOT platform: https://codeberg.org/SPOT_Project/core
- Project home: https://codeberg.org/SPOT_Project