Phishing BERT Model

A fine-tuned BERT (bert-base-uncased) classifier for phishing email detection, developed as part of the NTUST MI5137701 course term project (Group 72).

Model Description

This model classifies email text into two categories:

phishing email (label 1)
safe email (label 0)

The model is trained on body-only email text (headers stripped) so that it learns genuine phishing indicators rather than dataset artifacts.
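
As a rough illustration of what "body-only" input can look like in practice, the sketch below strips the headers from a raw message with Python's standard `email` package. The helper name `extract_body` and the sample message are hypothetical, and this is not necessarily the preprocessing used for training.

```python
from email import message_from_string, policy


def extract_body(raw_email: str) -> str:
    """Return the plain-text body of a raw email, dropping all headers.

    Hypothetical helper for illustration; the training pipeline may differ.
    """
    msg = message_from_string(raw_email, policy=policy.default)
    # Prefer the text/plain part; returns None if the message has none.
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return ""
    return part.get_content().strip()


# Made-up sample message: two headers, a blank line, then the body.
raw = (
    "From: alerts@example.com\n"
    "Subject: Verify your account\n"
    "\n"
    "Dear user, click here to verify.\n"
)
body = extract_body(raw)
print(body)
```

The returned string contains neither the `From` nor the `Subject` line, so only the body text reaches the tokenizer.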

Training Details

Parameter	Value
Base model	`bert-base-uncased` (110M params)
Architecture	`BertForSequenceClassification`
Labels	2 (phishing email, safe email)
Learning rate	2e-5
Optimizer	AdamW
Loss	Cross-entropy
Epochs	3
Max sequence length	512 tokens
Training dataset	drorrabin/phishing_emails-data (26,946 emails, 50/50 split)
Random seed	42

Evaluation Results (Cross-Corpus)

All metrics below are computed on a disjoint test corpus (Phishing_Email.csv, 18,460 emails) that has zero overlap with the training data.

BERT Standalone

Metric	Value
Accuracy	80.71%
F1 (macro)	71.22%
Recall	60.43%
Precision	86.70%
ROC-AUC	0.919
ECE	0.178
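
As a quick consistency check on the table above, the reported F1 can be recomputed from the reported precision and recall. The short sketch below does the arithmetic; the function name `f1_from_pr` is just an illustrative helper. The result matches the reported 71.22%, which suggests the F1, precision, and recall figures share the same averaging.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for BERT standalone.
bert_f1 = f1_from_pr(precision=0.8670, recall=0.6043)
print(f"{bert_f1:.4f}")  # 0.7122
```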

Cascade (BERT + Qwen 2.5 7B CoT)

Metric	Value
Accuracy	81.07%
F1 (macro)	71.94%
Recall	61.45%
Precision	86.75%
Escalation rate	1.21%
Amortized latency	~0.36s/email
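
To make the escalation rate and amortized latency figures easier to interpret, here is a minimal sketch of how a two-stage cascade is commonly wired. It assumes the fast classifier escalates to the slower model only when its phishing probability falls inside an uncertainty band. The band limits and the per-model latencies are placeholder values, not measurements from this project, and the routing rule used by the authors is not documented here.

```python
def route(p_phishing: float, band: tuple = (0.4, 0.6)) -> str:
    """Decide locally when confident, otherwise escalate.

    `band` is a hypothetical uncertainty interval, not the project's setting.
    """
    low, high = band
    if low <= p_phishing <= high:
        return "escalate"
    return "phishing" if p_phishing > high else "safe"


def amortized_latency(t_fast: float, t_slow: float, escalation_rate: float) -> float:
    """Average seconds per email when every email pays t_fast and only
    the escalated fraction additionally pays t_slow."""
    return t_fast + escalation_rate * t_slow


# Escalation rate from the table; both latencies are illustrative placeholders.
avg = amortized_latency(t_fast=0.05, t_slow=25.0, escalation_rate=0.0121)
print(route(0.97), route(0.52), route(0.03))
print(f"{avg:.4f} s/email")
```

Because so few emails are escalated, the slow model's cost is spread over the whole stream and the average stays close to the fast model's latency.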

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned tokenizer and classifier from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("harrynguyen5/phishing-bert-model")
model = AutoModelForSequenceClassification.from_pretrained("harrynguyen5/phishing-bert-model")

# Pass the email body only; text beyond 512 tokens is truncated.
email_text = "Dear user, your account has been compromised. Click here to verify."
inputs = tokenizer(email_text, return_tensors="pt", truncation=True, max_length=512, padding=True)

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    prediction = torch.argmax(probs, dim=-1).item()

# Map the predicted class index to its label name and report the softmax score.
label = model.config.id2label[prediction]
confidence = probs[0][prediction].item()
print(f"Prediction: {label} (confidence: {confidence:.4f})")

Important Notes

Body-only input: This model is trained on email body text only. Headers, subject lines, and metadata should be stripped before inference.
Cross-corpus evaluation: In-distribution accuracy (99.76%) is inflated by a dataset artifact (the ceas-challenge header). The cross-corpus results above are the reliable benchmark.
Calibration: ECE = 0.178 means the model's confidence differs from its observed accuracy by roughly 18 percentage points on average, so the scores should not be read as true probabilities. Post-hoc calibration is recommended before using confidence thresholds.
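
One common form of post-hoc calibration is temperature scaling: divide the logits by a single scalar T fitted on held-out data, then re-measure ECE. The sketch below is a dependency-free illustration under stated assumptions. The logits and labels are toy values, not model outputs, and the grid search stands in for the gradient-based fit usually used. All function names are illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over one logit vector after dividing by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean |accuracy - confidence| over equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(o for _, o in b) / len(b)
            ece += len(b) / n * abs(acc - avg_conf)
    return ece


def nll(logits_list, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(l, temperature)[y]) for l, y in zip(logits_list, labels)]
    return sum(losses) / len(losses)


def fit_temperature(logits_list, labels):
    """Pick the temperature in 0.5..10.0 (step 0.1) that minimizes NLL."""
    grid = [0.5 + 0.1 * i for i in range(96)]
    return min(grid, key=lambda t: nll(logits_list, labels, t))


def ece_at(logits_list, labels, temperature):
    probs = [softmax(l, temperature) for l in logits_list]
    confs = [max(p) for p in probs]
    correct = [int(p.index(max(p)) == y) for p, y in zip(probs, labels)]
    return expected_calibration_error(confs, correct)


# Toy held-out set: [safe, phishing] logits, 8 of 10 predictions correct.
val_logits = [[1.5, -1.5]] * 5 + [[-1.5, 1.5]] * 5
val_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

T = fit_temperature(val_logits, val_labels)
ece_before = ece_at(val_logits, val_labels, 1.0)
ece_after = ece_at(val_logits, val_labels, T)
print(f"T={T:.1f}  ECE before={ece_before:.3f}  after={ece_after:.3f}")
```

Temperature scaling leaves the argmax prediction unchanged, so accuracy, precision, and recall are unaffected; only the confidence values move. The temperature should be fitted on data that is separate from the test set used for reporting.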

Team

Member	Student ID	Role
Bui The Hien	M11409806	Dataset & Fine-Tuning
Le Trung Kien	M11415803	Evaluation & Benchmarking
Nguyen Quoc Nguyen	M11409814	Prompt Pipeline & Logic
