# RoBERTa Email Fraud Detector
A fine-tuned RoBERTa-base model for binary email fraud/phishing classification. Given an email (subject + body), the model predicts whether it is **fraud (1)** or **normal (0)** with high accuracy and a very low false-positive rate.
## Model Description
| Property | Value |
|---|---|
| Architecture | `RobertaForSequenceClassification` (roberta-base + linear classification head) |
| Parameters | ~125 million |
| Max Input Length | 512 tokens |
| Output | 2 classes: `0` = normal/ham, `1` = fraud/spam |
| Model Size | 475 MB |
## Lineage

```
roberta-base (Meta AI, pre-trained on 160 GB of English text)
  └─► mshenoda/roberta-spam (fine-tuned on the Enron spam corpus)
        └─► cunxin/roberta-email-fraud-detector (continued fine-tuning on an expanded email dataset)
```
The model inherits general language understanding from roberta-base and spam-pattern recognition from mshenoda/roberta-spam. Our continued fine-tuning adapts it to a broader email-fraud distribution, including phishing, scam, and social-engineering patterns.
## Performance

Evaluated on a held-out test set of ~12,250 emails (never seen during training):
| Metric | Value |
|---|---|
| Overall Accuracy | 99.5% |
| Detection Rate (TPR) | 99.4% (5,841 / 5,876 fraud detected) |
| False Positive Rate (FPR) | 0.5% (31 / 6,374 normal misclassified) |
| False Negatives | 35 |
| False Positives | 31 |
### Confusion Matrix

| | Predicted Fraud | Predicted Normal |
|---|---|---|
| Actual Fraud | 5,841 (TP) | 35 (FN) |
| Actual Normal | 31 (FP) | 6,343 (TN) |
### Correction Pool Results

Evaluated on a separate held-out correction pool (completely independent of both the training and test sets):

- 2,808 fraud emails → only 1 false negative
- 2,982 normal emails → 0 false positives
## Usage

### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "cunxin/roberta-email-fraud-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Format: "Subject: {subject}\n\n{content}"
email_text = (
    "Subject: URGENT: Verify your account immediately\n\n"
    "Dear Customer, your account has been compromised. "
    "Click here to verify: http://suspicious-link.com"
)

inputs = tokenizer(email_text, return_tensors="pt", max_length=512, truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=-1)
fraud_prob = probs[0, 1].item()
print(f"Fraud probability: {fraud_prob:.4f}")
print(f"Prediction: {'FRAUD' if fraud_prob >= 0.5 else 'NORMAL'}")
```
### Input Format

The model expects emails formatted as:

```
Subject: {email subject}\n\n{email body}
```

- Content is truncated to ~1,800 characters (fits within 512 tokens after tokenization)
- The subject is placed first so it is never truncated
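The formatting rules above can be sketched as a small helper. This is illustrative, not part of the released package; the function name and the placement of the 1,800-character cap are assumptions:

```python
def format_email(subject: str, body: str, max_body_chars: int = 1800) -> str:
    """Build the model's expected input: subject first, then a truncated body.

    Putting the subject before the body means tokenizer truncation at 512
    tokens can only cut the tail of the body, never the subject line.
    """
    return f"Subject: {subject}\n\n{body[:max_body_chars]}"


text = format_email("Invoice attached", "Please see the attached invoice. " * 200)
print(text[:40])
```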
### Output

The model outputs 2 logits:

- Index 0: normal/ham score
- Index 1: fraud/spam score

Apply softmax to obtain probabilities; `probs[0, 1] >= 0.5` → fraud.
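The logits-to-decision step can be illustrated without loading the model. The example logit values below are made up; only the softmax-then-threshold logic mirrors the card's description:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-1.2, 3.4]            # [normal score, fraud score] (illustrative values)
probs = softmax(logits)
fraud_prob = probs[1]
label = "FRAUD" if fraud_prob >= 0.5 else "NORMAL"
```

In a deployment you might raise the 0.5 threshold to trade detection rate for an even lower false-positive rate.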
## Training

### Training Data
| Dataset | Count | Source |
|---|---|---|
| Fraud emails (train) | ~5,877 | Enron corpus via SetFit/enron_spam + SpamAssassin |
| Normal emails (train) | ~6,374 | Enron corpus via SetFit/enron_spam + Ling-Spam |
| Fraud emails (test) | ~5,876 | Same sources, held-out split |
| Normal emails (test) | ~6,374 | Same sources, held-out split |
### Training Method

- Task: Binary sequence classification (fraud vs. normal)
- Loss: Weighted cross-entropy with `normal_weight=2.0` (penalizes false positives 2x more heavily)
- Optimizer: AdamW (`lr=2e-5`, `weight_decay=0.01`)
- LR Schedule: Linear warmup (10%) + linear decay
- Gradient Clipping: `max_norm=1.0`
- Regularization: Built-in dropout (10% hidden, 10% attention)
- Epochs: 3
- Batch Size: 8
- Best Checkpoint Selection: Saved only when validation accuracy improves
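The effect of the class-weighted loss can be shown in plain Python. The weight of 2.0 on the normal class comes from the table above; the logit values and function name are illustrative, and a real run would use `torch.nn.CrossEntropyLoss(weight=...)` instead:

```python
import math

def weighted_cross_entropy(logits, target, class_weights):
    """Cross-entropy for one example, scaled by the target class's weight
    (mirrors how per-class weights act in standard weighted CE)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    log_prob_target = logits[target] - log_norm
    return -class_weights[target] * log_prob_target

weights = [2.0, 1.0]  # normal_weight=2.0: misclassifying normal mail costs more

# A normal email (label 0) confidently scored as fraud -> a false positive
loss_fp = weighted_cross_entropy([-1.0, 1.0], 0, weights)
# A fraud email (label 1) confidently scored as normal -> a false negative
loss_fn = weighted_cross_entropy([1.0, -1.0], 1, weights)
```

With symmetric logits, the false positive incurs exactly twice the loss of the false negative, which is the intended bias toward a low false-positive rate.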
### Hard Example Mining

The training pipeline includes an iterative hard-example-mining loop:

1. Train on labeled data plus accumulated hard examples
2. Evaluate on a correction pool (never trained on)
3. Collect false positives and false negatives
4. Inject them as oversampled (2x) training examples in the next run
5. Repeat: hard-example files accumulate across runs
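The loop above can be sketched with a stand-in predictor. All names, the toy data, and the `predict` function are illustrative; a real round would retrain the model where the comment indicates:

```python
def mine_hard_examples(train_data, pool, predict, rounds=1, oversample=2):
    """Fold misclassified correction-pool examples back into training data."""
    hard = []  # accumulates across rounds, mirroring the persisted hard-example files
    for _ in range(rounds):
        # (a real run would retrain here on train_data + hard)
        misses = [(x, y) for x, y in pool if predict(x) != y]
        hard.extend(misses * oversample)  # 2x oversampling of each miss
    return train_data + hard

# Toy setup: the true label is 1 for fraud-like text
pool = [("verify now", 1), ("lunch at noon", 0), ("urgent wire", 1)]
predict = lambda x: 1 if "verify" in x else 0   # misses "urgent wire"
augmented = mine_hard_examples([("seed", 0)], pool, predict)
```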
## Intended Use

- Email fraud/phishing detection in enterprise or personal email systems
- Fast first-pass filter (<50 ms inference) before more expensive generative analysis
- Part of a multi-layer fraud-detection pipeline
## Limitations

- Trained primarily on English emails; may underperform on other languages
- Training data is sourced from the Enron corpus (early 2000s); modern phishing patterns may differ
- Max input length is 512 tokens; very long emails are truncated
- Binary classification only: it does not provide a threat-type breakdown or risk scores (use the generative model for detailed analysis)
## Citation

```bibtex
@misc{cunxin2025roberta-email-fraud,
  title={RoBERTa Email Fraud Detector},
  author={cunxin},
  year={2025},
  url={https://huggingface.co/cunxin/roberta-email-fraud-detector}
}
```