# RoBERTa Email Fraud Detector
A fine-tuned RoBERTa-base model for binary email fraud/phishing classification. Given an email (subject + body), the model predicts whether it is **fraud (1)** or **normal (0)** with high accuracy and a very low false-positive rate.
## Model Description
| Property | Value |
|---|---|
| Architecture | `RobertaForSequenceClassification` (roberta-base + linear classification head) |
| Parameters | ~125 million |
| Max Input Length | 512 tokens |
| Output | 2 classes: `0` = normal/ham, `1` = fraud/spam |
| Model Size | 475 MB |
## Lineage

```
roberta-base (Meta AI, pre-trained on 160 GB of English text)
  └─► mshenoda/roberta-spam (fine-tuned on the Enron spam corpus)
        └─► cunxin/roberta-email-fraud-detector (continued fine-tuning on an expanded email dataset)
```
The model inherits general language understanding from roberta-base and spam-pattern recognition from mshenoda/roberta-spam. Our continued fine-tuning adapts it to a broader email-fraud distribution, including phishing, scam, and social-engineering patterns.
## Performance

Evaluated on a held-out test set of ~12,250 emails (never seen during training):
| Metric | Value |
|---|---|
| Overall Accuracy | 99.5% |
| Detection Rate (TPR) | 99.4% (5,841 / 5,876 fraud detected) |
| False Positive Rate (FPR) | 0.5% (31 / 6,374 normal misclassified) |
| False Negatives | 35 |
| False Positives | 31 |
### Confusion Matrix

| | Predicted Fraud | Predicted Normal |
|---|---|---|
| Actual Fraud | 5,841 (TP) | 35 (FN) |
| Actual Normal | 31 (FP) | 6,343 (TN) |
### Correction Pool Results

Evaluated on a separate held-out correction pool (completely independent of both the training and test sets):

- 2,808 fraud emails → only 1 false negative
- 2,982 normal emails → 0 false positives
## Usage

### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "cunxin/roberta-email-fraud-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Format: "Subject: {subject}\n\n{content}"
email_text = (
    "Subject: URGENT: Verify your account immediately\n\n"
    "Dear Customer, your account has been compromised. "
    "Click here to verify: http://suspicious-link.com"
)

inputs = tokenizer(email_text, return_tensors="pt", max_length=512, truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=-1)
fraud_prob = probs[0, 1].item()
print(f"Fraud probability: {fraud_prob:.4f}")
print(f"Prediction: {'FRAUD' if fraud_prob >= 0.5 else 'NORMAL'}")
```
### Input Format

The model expects emails formatted as:

```
Subject: {email subject}\n\n{email body}
```

- Content is truncated to ~1,800 characters (fits within 512 tokens after tokenization)
- The subject is placed first so it is never truncated
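The formatting rules above can be sketched as a small helper. This is illustrative, not part of the released package; the function name and the placement of the 1,800-character cap are assumptions:

```python
def format_email(subject: str, body: str, max_body_chars: int = 1800) -> str:
    """Build the model's expected input: subject first, then a truncated body.

    Putting the subject before the body means tokenizer truncation at 512
    tokens can only cut the tail of the body, never the subject line.
    """
    return f"Subject: {subject}\n\n{body[:max_body_chars]}"


text = format_email("Invoice attached", "Please see the attached invoice. " * 200)
print(text[:40])
```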
### Output

The model outputs 2 logits:

- Index 0: normal/ham score
- Index 1: fraud/spam score

Apply softmax to obtain probabilities; `probs[0, 1] >= 0.5` → fraud.
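The logits-to-decision step can be illustrated without loading the model. The example logit values below are made up; only the softmax-then-threshold logic mirrors the card's description:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-1.2, 3.4]            # [normal score, fraud score] (illustrative values)
probs = softmax(logits)
fraud_prob = probs[1]
label = "FRAUD" if fraud_prob >= 0.5 else "NORMAL"
```

In a deployment you might raise the 0.5 threshold to trade detection rate for an even lower false-positive rate.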
## Training

### Training Data
| Dataset | Count | Source |
|---|---|---|
| Fraud emails (train) | ~5,877 | Enron corpus via SetFit/enron_spam + SpamAssassin |
| Normal emails (train) | ~6,374 | Enron corpus via SetFit/enron_spam + Ling-Spam |
| Fraud emails (test) | ~5,876 | Same sources, held-out split |
| Normal emails (test) | ~6,374 | Same sources, held-out split |
### Training Method

- Task: Binary sequence classification (fraud vs. normal)
- Loss: Weighted cross-entropy with `normal_weight=2.0` (penalizes false positives 2x more heavily)
- Optimizer: AdamW (`lr=2e-5`, `weight_decay=0.01`)
- LR Schedule: Linear warmup (10%) + linear decay
- Gradient Clipping: `max_norm=1.0`
- Regularization: Built-in dropout (10% hidden, 10% attention)
- Epochs: 3
- Batch Size: 8
- Best Checkpoint Selection: Saved only when validation accuracy improves
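The effect of the class-weighted loss can be shown in plain Python. The weight of 2.0 on the normal class comes from the table above; the logit values and function name are illustrative, and a real run would use `torch.nn.CrossEntropyLoss(weight=...)` instead:

```python
import math

def weighted_cross_entropy(logits, target, class_weights):
    """Cross-entropy for one example, scaled by the target class's weight
    (mirrors how per-class weights act in standard weighted CE)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    log_prob_target = logits[target] - log_norm
    return -class_weights[target] * log_prob_target

weights = [2.0, 1.0]  # normal_weight=2.0: misclassifying normal mail costs more

# A normal email (label 0) confidently scored as fraud -> a false positive
loss_fp = weighted_cross_entropy([-1.0, 1.0], 0, weights)
# A fraud email (label 1) confidently scored as normal -> a false negative
loss_fn = weighted_cross_entropy([1.0, -1.0], 1, weights)
```

With symmetric logits, the false positive incurs exactly twice the loss of the false negative, which is the intended bias toward a low false-positive rate.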
### Hard Example Mining

The training pipeline includes an iterative hard-example-mining loop:

1. Train on labeled data plus accumulated hard examples
2. Evaluate on a correction pool (never trained on)
3. Collect false positives and false negatives
4. Inject them as oversampled (2x) training examples in the next run
5. Repeat: hard-example files accumulate across runs
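The loop above can be sketched with a stand-in predictor. All names, the toy data, and the `predict` function are illustrative; a real round would retrain the model where the comment indicates:

```python
def mine_hard_examples(train_data, pool, predict, rounds=1, oversample=2):
    """Fold misclassified correction-pool examples back into training data."""
    hard = []  # accumulates across rounds, mirroring the persisted hard-example files
    for _ in range(rounds):
        # (a real run would retrain here on train_data + hard)
        misses = [(x, y) for x, y in pool if predict(x) != y]
        hard.extend(misses * oversample)  # 2x oversampling of each miss
    return train_data + hard

# Toy setup: the true label is 1 for fraud-like text
pool = [("verify now", 1), ("lunch at noon", 0), ("urgent wire", 1)]
predict = lambda x: 1 if "verify" in x else 0   # misses "urgent wire"
augmented = mine_hard_examples([("seed", 0)], pool, predict)
```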
## Intended Use

- Email fraud/phishing detection in enterprise or personal email systems
- Fast first-pass filter (<50 ms inference) before more expensive generative analysis
- Part of a multi-layer fraud-detection pipeline
## Limitations

- Trained primarily on English emails; may underperform on other languages
- Training data is sourced from the Enron corpus (early 2000s); modern phishing patterns may differ
- Max input length is 512 tokens; very long emails are truncated
- Binary classification only: it does not provide a threat-type breakdown or risk scores (use the generative model for detailed analysis)
## Citation

```bibtex
@misc{cunxin2025roberta-email-fraud,
  title={RoBERTa Email Fraud Detector},
  author={cunxin},
  year={2025},
  url={https://huggingface.co/cunxin/roberta-email-fraud-detector}
}
```