RoBERTa Email Fraud Detector

A fine-tuned RoBERTa-base model for binary email fraud/phishing classification. Given an email (subject + body), the model predicts whether it is fraud (1) or normal (0) with high accuracy and a very low false-positive rate.

Model Description

Architecture        RobertaForSequenceClassification (roberta-base + linear classification head)
Parameters          ~125 million
Max Input Length    512 tokens
Output              2 classes: 0 = normal/ham, 1 = fraud/spam
Model Size          475 MB

Lineage

roberta-base (Meta AI — pre-trained on 160GB English text)
    └─► mshenoda/roberta-spam (fine-tuned on Enron spam corpus)
        └─► cunxin/roberta-email-fraud-detector (continued fine-tuning on expanded email dataset)

The model inherits general language understanding from roberta-base and spam-pattern recognition from mshenoda/roberta-spam. Our continued fine-tuning adapts it to a broader email fraud distribution, including phishing, scam, and social-engineering patterns.

Performance

Evaluated on a held-out test set of ~12,250 emails (never seen during training):

Metric                       Value
Overall Accuracy             99.5%
Detection Rate (TPR)         99.4% (5,841 / 5,876 fraud detected)
False Positive Rate (FPR)    0.5% (31 / 6,374 normal misclassified)
False Negatives              35
False Positives              31

Confusion Matrix

                    Predicted Fraud    Predicted Normal
Actual Fraud            5,841 (TP)          35 (FN)
Actual Normal              31 (FP)       6,343 (TN)
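The headline metrics above can be recomputed directly from these four counts; a quick sanity check in Python:

```python
# Counts from the confusion matrix above.
TP, FN = 5841, 35   # actual fraud:  detected / missed
FP, TN = 31, 6343   # actual normal: flagged  / passed

accuracy = (TP + TN) / (TP + FN + FP + TN)  # 12,184 / 12,250
tpr = TP / (TP + FN)                        # detection rate
fpr = FP / (FP + TN)                        # false positive rate

print(f"Accuracy: {accuracy:.1%}")  # Accuracy: 99.5%
print(f"TPR: {tpr:.1%}")            # TPR: 99.4%
print(f"FPR: {fpr:.1%}")            # FPR: 0.5%
```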

Correction Pool Results

Evaluated on a separate held-out correction pool, completely independent from both the training and test sets:

  • 2,808 fraud emails → only 1 false negative
  • 2,982 normal emails → 0 false positives

Usage

Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "cunxin/roberta-email-fraud-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Format: "Subject: {subject}\n\n{content}"
email_text = "Subject: URGENT: Verify your account immediately\n\nDear Customer, your account has been compromised. Click here to verify: http://suspicious-link.com"

inputs = tokenizer(email_text, return_tensors="pt", max_length=512, truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    fraud_prob = probs[0, 1].item()

print(f"Fraud probability: {fraud_prob:.4f}")
print(f"Prediction: {'FRAUD' if fraud_prob >= 0.5 else 'NORMAL'}")

Input Format

The model expects emails formatted as:

Subject: {email subject}\n\n{email body}

  • The body is truncated to ~1,800 characters so the full email fits within 512 tokens after tokenization
  • The subject is placed first so it is never truncated
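A minimal helper that applies this format (the function name and the exact character budget are our own; the card only specifies "about 1,800 characters"):

```python
MAX_BODY_CHARS = 1800  # rough budget; ~1,800 characters stays within 512 tokens

def format_email(subject: str, body: str) -> str:
    """Build the model input: subject first (never truncated), body capped."""
    return f"Subject: {subject}\n\n{body[:MAX_BODY_CHARS]}"

text = format_email("Quarterly report", "Hi team, " + "x" * 5000)
```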

Output

The model outputs 2 logits:

  • Index 0: normal/ham score
  • Index 1: fraud/spam score

Apply softmax to get probabilities; probs[0, 1] >= 0.5 → fraud.
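The softmax-and-threshold step in isolation, using made-up logits (illustrative values only, not real model output):

```python
import math

def softmax(logits):
    m = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [-2.1, 3.4]        # [normal score, fraud score], illustrative only
probs = softmax(logits)
fraud_prob = probs[1]
label = "FRAUD" if fraud_prob >= 0.5 else "NORMAL"
```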

Training

Training Data

Dataset                  Count      Source
Fraud emails (train)     ~5,877     Enron corpus via SetFit/enron_spam + SpamAssassin
Normal emails (train)    ~6,374     Enron corpus via SetFit/enron_spam + Ling-Spam
Fraud emails (test)      ~5,876     Same sources, held-out split
Normal emails (test)     ~6,374     Same sources, held-out split

Training Method

  • Task: Binary sequence classification (fraud vs. normal)
  • Loss: Weighted cross-entropy with normal_weight=2.0 (penalizes false positives 2x more heavily)
  • Optimizer: AdamW (lr=2e-5, weight_decay=0.01)
  • LR Schedule: Linear warmup (10%) + linear decay
  • Gradient Clipping: max_norm=1.0
  • Regularization: Built-in dropout (10% hidden, 10% attention)
  • Epochs: 3
  • Batch Size: 8
  • Best Checkpoint Selection: Saved only when validation accuracy improves
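The class weighting is the same effect as torch.nn.CrossEntropyLoss(weight=torch.tensor([2.0, 1.0])); a dependency-free sketch of what it does per example:

```python
import math

CLASS_WEIGHTS = [2.0, 1.0]  # index 0 = normal (normal_weight=2.0), index 1 = fraud

def weighted_ce(probs, label):
    """Weighted cross-entropy for one example; probs = [p_normal, p_fraud]."""
    return -CLASS_WEIGHTS[label] * math.log(probs[label])

# A confident mistake on a normal email (a false positive)...
fp_loss = weighted_ce([0.1, 0.9], label=0)
# ...costs exactly twice the mirror-image mistake on a fraud email.
fn_loss = weighted_ce([0.9, 0.1], label=1)
```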

Hard Example Mining

The training pipeline includes an iterative hard-example mining loop:

  1. Train on labeled data + accumulated hard examples
  2. Evaluate on a correction pool (never trained on)
  3. Collect false positives and false negatives
  4. Inject them as oversampled (2x) training examples in the next run
  5. Repeat; hard-example files accumulate across runs
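Steps 2-4 can be sketched as follows (mine_hard_examples and the dict layout are illustrative, not the pipeline's actual API):

```python
def mine_hard_examples(predict, correction_pool, hard_examples):
    """Collect correction-pool mistakes and append them, oversampled 2x,
    to the accumulated hard-example list used by the next training run."""
    mistakes = [ex for ex in correction_pool if predict(ex["text"]) != ex["label"]]
    hard_examples.extend(mistakes * 2)  # 2x oversampling
    return hard_examples

# A predictor that flags everything as normal misses the one fraud email:
pool = [{"text": "win a prize", "label": 1}, {"text": "meeting at 3", "label": 0}]
hard = mine_hard_examples(lambda text: 0, pool, hard_examples=[])
# hard now holds the missed fraud email twice.
```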

Intended Use

  • Email fraud/phishing detection in enterprise or personal email systems
  • Fast first-pass filter (<50 ms inference) before more expensive generative analysis
  • Part of a multi-layer fraud detection pipeline

Limitations

  • Trained primarily on English emails; may underperform on other languages
  • Training data is sourced from the Enron corpus (early 2000s); modern phishing patterns may differ
  • Max input length is 512 tokens; very long emails are truncated
  • Binary classification only: does not provide a threat-type breakdown or risk scores (use the generative model for detailed analysis)

Citation

@misc{cunxin2025roberta-email-fraud,
  title={RoBERTa Email Fraud Detector},
  author={cunxin},
  year={2025},
  url={https://huggingface.co/cunxin/roberta-email-fraud-detector}
}