Llama Email Fraud Detector (bf16)
A fine-tuned Llama-3.2-3B-Instruct model for structured email fraud/phishing analysis. Given an email, the model outputs a detailed JSON verdict: threat labels drawn from 11 categories, a 0-100 risk score, human-readable reasoning, and a suggested action.
This is the explanation layer of a dual-model anti-fraud pipeline. The discriminative model (cunxin/roberta-email-fraud-detector, 99.5% accuracy, <50ms) provides a fast binary pre-screen; its result is passed to this generative model as a [CLASSIFIER HINT] prior. The final verdict is reconciled by the backend service.
For GPUs with limited VRAM (< 12 GB), use the AWQ 4-bit quantized version: cunxin/llama-email-fraud-detector-awq.
Model Details
| Property | Value |
|---|---|
| Architecture | LlamaForCausalLM (decoder-only Transformer with RoPE, GQA, SwiGLU) |
| Base Model | meta-llama/Llama-3.2-3B-Instruct |
| Parameters | 3,237,063,680 (3.2B) |
| Fine-Tuning | LoRA (r=16, alpha=32, dropout=0.1) merged into base weights |
| Trainable Parameters | 24,313,856 (0.75% of total) |
| Precision | bfloat16 |
| Model Size | 6.4 GB |
| Context Window | 4,096 tokens (inference) / 2,048 tokens (training) |
| Vocabulary | 128,256 tokens |
Lineage
```
meta-llama/Llama-3.2-3B-Instruct (Meta AI, instruction-tuned)
│
├── LoRA fine-tuning (r=16, alpha=32, 7 target modules)
│     Training: ~12K email conversations with structured JSON labels
│     Hint injection: 75% correct / 15% adversarial / 10% no hint
│
├── Merge LoRA adapters into base weights
│
├─► cunxin/llama-email-fraud-detector (this model, bf16, 6.4 GB)
│
└─► cunxin/llama-email-fraud-detector-awq (AWQ 4-bit, 2.2 GB)
```
Output Format
The model outputs structured JSON with 6 fields:
```json
{
  "is_fraud": true,
  "risk_score": 95,
  "confidence_level": 0.97,
  "detected_threats": ["DOMAIN_MISMATCH", "CREDENTIAL_REQUEST", "URGENCY_FEAR"],
  "reason": "The sender domain 'amaz0n-verify.com' typosquats amazon.com. The email requests account credentials via a suspicious URL and uses urgency tactics to pressure immediate action.",
  "suggestion": "Do not click any links. Do not enter any credentials. Report this email as phishing to your IT department."
}
```
Threat Types & Scoring
The model detects 11 threat categories. The risk score is the sum of the triggered threats' points, capped at 100.
| Label | Points | Description |
|---|---|---|
| CREDENTIAL_REQUEST | 35 | Asks for passwords, SSN, credit card numbers |
| DOMAIN_MISMATCH | 30 | Sender domain does not match claimed organization |
| URL_DISCREPANCY | 30 | Links point to suspicious or mismatched domains |
| TOO_GOOD_TO_BE_TRUE | 30 | Unrealistic promises (lottery wins, free money) |
| PROMPT_INJECTION | 30 | Attempts to manipulate AI analysis |
| URGENCY_FEAR | 15 | Pressure tactics ("act now", "account suspended") |
| REPLY_TO_MISMATCH | 15 | Reply-To address differs from sender |
| GENERIC_SALUTATION | 8 | Impersonal greeting ("Dear Customer") |
| ANOMALOUS_TIMING | 8 | Sent at unusual hours for the timezone |
| MISSING_SIGNATURE | 8 | No professional email signature |
| GRAMMAR_ANOMALY | 5 | Unusual grammar or spelling patterns |
RULE D: any high-weight threat (>= 30 pts) forces is_fraud = true.
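The scoring scheme above can be sketched in a few lines. This is an illustrative reimplementation, not the model's internal code; the names `THREAT_POINTS`, `risk_score`, and `rule_d` are ours:

```python
# Illustrative sketch of the scoring scheme described above; the model
# produces these values generatively, this just restates the arithmetic.
THREAT_POINTS = {
    "CREDENTIAL_REQUEST": 35,
    "DOMAIN_MISMATCH": 30,
    "URL_DISCREPANCY": 30,
    "TOO_GOOD_TO_BE_TRUE": 30,
    "PROMPT_INJECTION": 30,
    "URGENCY_FEAR": 15,
    "REPLY_TO_MISMATCH": 15,
    "GENERIC_SALUTATION": 8,
    "ANOMALOUS_TIMING": 8,
    "MISSING_SIGNATURE": 8,
    "GRAMMAR_ANOMALY": 5,
}

def risk_score(detected_threats):
    """Sum of triggered threat points, capped at 100."""
    return min(100, sum(THREAT_POINTS[t] for t in detected_threats))

def rule_d(detected_threats):
    """RULE D: any threat worth >= 30 points forces is_fraud = True."""
    return any(THREAT_POINTS[t] >= 30 for t in detected_threats)
```

For example, DOMAIN_MISMATCH + CREDENTIAL_REQUEST + URGENCY_FEAR sums to 80 points and triggers RULE D.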
Dual-Model Pipeline
This model is designed to work with cunxin/roberta-email-fraud-detector in a reconciled pipeline:
```
Email Input
│
├──► RoBERTa (discriminative, <50ms)
│      │
│      ▼
│    is_fraud=True/False, confidence=0.97, risk_score=99
│      │
│      ▼ [CLASSIFIER HINT]
├──► Llama (generative, ~1-3s)  ◄── this model
│      │
│      ▼
│    Full JSON analysis (11 threat types, reasoning, suggestion)
│
▼
Reconciliation (backend)
│
▼
Final verdict
```
Hint Rules
When a [CLASSIFIER HINT] is provided, the model applies these rules after its own independent analysis:
| Scenario | Action |
|---|---|
| Both agree | Keep generative result |
| Hint=FRAUD, gen=safe, risk < 40 | Follow hint (classifier caught something subtle) |
| Hint=FRAUD, gen=safe, RULE D triggered | Override hint (generative has hard evidence) |
| Hint=NOT FRAUD, gen=fraud, RULE D triggered | Override hint (generative has hard evidence) |
| Hint=NOT FRAUD, gen=fraud, risk < 60 | Follow hint (only weak signals) |
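The table above can be read as a decision function. The sketch below is ours, not the backend's actual code; in particular the final fall-through (trusting the generative verdict when no rule matches) is an assumption:

```python
def reconcile(hint_fraud: bool, gen_fraud: bool, gen_risk: int, rule_d: bool) -> bool:
    """Hedged sketch of the hint-rule table; real reconciliation lives in the backend."""
    if hint_fraud == gen_fraud:
        return gen_fraud          # both agree: keep generative result
    if rule_d:
        return gen_fraud          # RULE D: generative has hard evidence, override hint
    if hint_fraud and gen_risk < 40:
        return True               # classifier caught something subtle: follow hint
    if gen_fraud and gen_risk < 60:
        return False              # only weak generative signals: follow hint
    return gen_fraud              # fall-through (our assumption): trust generative
```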
Usage
With vLLM (Recommended)
```bash
# Set in .env
MODEL_PATH=cunxin/llama-email-fraud-detector

# Start the service
docker compose --profile gpu up -d

# Test
curl -X POST http://localhost:8000/predict_generative \
  -H "Content-Type: application/json" \
  -d '{"sender":"security@amaz0n-verify.com","subject":"URGENT: Account locked","content":"Click to verify: http://amaz0n-secure.xyz"}'
```
With Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch, json

model_name = "cunxin/llama-email-fraud-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

email = json.dumps({
    "date": "2026-02-25T10:00:00Z",
    "sender": "security@amaz0n-verify.com",
    "recipient": "you@example.com",
    "subject": "URGENT: Your account has been locked",
    "content": "Click here to verify: http://amaz0n-secure.xyz/verify"
})

messages = [
    {"role": "system", "content": "You are an email fraud analyst. Analyze the email and return a JSON verdict."},
    {"role": "user", "content": f"Analyze the following email:\n{email}"}
]

# add_generation_prompt=True appends the assistant header so generation
# starts a fresh reply; do_sample=True is required for temperature to apply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.1)
response = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
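Since edge cases can produce malformed JSON (see Limitations), it is worth parsing the completion defensively. A minimal helper of our own, not part of the model's release:

```python
import json
import re

def parse_verdict(text):
    """Extract the first JSON object from a raw model completion.

    The model usually emits pure JSON, but edge cases may wrap it in
    prose or truncate it; returns None when no valid object is found.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```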
Training
Fine-Tuning Method
| Setting | Value |
|---|---|
| Method | LoRA (Low-Rank Adaptation) via peft + SFT via trl.SFTTrainer |
| LoRA Rank | 16 |
| LoRA Alpha | 32 (effective scale = 2x) |
| LoRA Dropout | 0.1 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Optimizer | AdamW (weight_decay=0.05, label_smoothing=0.05) |
| LR Schedule | Cosine with 10% linear warmup |
| Learning Rate | 2e-4 |
| Epochs | 3 |
| Batch Size | 2 (grad_accum=8, effective=16) |
| Max Sequence Length | 2,048 tokens |
| Mixed Precision | FP16 (CUDA) |
| Loss | Causal LM (next-token prediction on assistant turn only) |
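The hyperparameters in the table map onto a peft adapter configuration roughly as below. This is a sketch inferred from the table; the actual training script is not published:

```python
from peft import LoraConfig

# Sketch of the adapter configuration implied by the table above;
# treat this as illustrative, not the exact released training code.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,            # effective scale = alpha / r = 2x
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```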
Training Data
~12,000 email conversations generated from the Enron corpus and synthetic sources:
| Source | Count | Role |
|---|---|---|
| Fraud emails (train) | ~500 per class | Primary training |
| Normal emails (train) | ~500 per class | Primary training |
| Correction pool (optional) | ~14K fraud + ~15K normal | Extra training via --include-correction |
| AI-generated modern emails | ~800 per class | Extra training via --include-aigen |
Each email is converted into a 3-turn chat conversation (system prompt -> user email -> assistant JSON verdict) with:
- 75% include a correct [CLASSIFIER HINT] (teach trust of accurate priors)
- 15% include an adversarial wrong hint (teach RULE D override)
- 10% have no hint (teach independent reasoning)
- 60% of fraud examples include [HEURISTIC ANALYSIS] context
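The 75/15/10 hint-injection split can be sketched as a simple sampler. This is illustrative; the released data-generation code is not published and the mode names are ours:

```python
import random

def sample_hint_mode(rng=random):
    """Pick a hint mode with the 75% / 15% / 10% split described above."""
    r = rng.random()
    if r < 0.75:
        return "correct_hint"       # accurate prior: teach trust of good hints
    if r < 0.90:
        return "adversarial_hint"   # wrong hint: teach RULE D override
    return "no_hint"                # no prior: teach independent reasoning
```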
Hardware Requirements
| Configuration | VRAM |
|---|---|
| bf16 (this model) | >= 12 GB (RTX 3060 desktop, RTX 4070+) |
| AWQ 4-bit (llama-email-fraud-detector-awq) | >= 6 GB (RTX 3050, 3060 laptop) |
Intended Use
- Email fraud/phishing detection with detailed threat analysis and human-readable reasoning
- Explanation layer for automated email security systems
- Part of a multi-model pipeline (discriminative pre-screen + generative analysis + reconciliation)
- Microsoft Office Add-in integration for Outlook/Word
Limitations
- Primarily trained on English emails; may underperform on other languages
- Training data includes the Enron corpus (early 2000s); modern attack patterns are only partially covered by AI-generated synthetic data
- Inference latency is ~1-3s per email (use RoBERTa for real-time filtering)
- Structured JSON output depends on prompt engineering; edge cases may produce malformed JSON
Related Models
| Model | Type | Size | Speed | Use Case |
|---|---|---|---|---|
| cunxin/roberta-email-fraud-detector | Discriminative | 475 MB | <50ms | Fast binary pre-screen |
| cunxin/llama-email-fraud-detector (this) | Generative | 6.4 GB | ~1-3s | Detailed threat analysis |
| cunxin/llama-email-fraud-detector-awq | Generative (quantized) | 2.2 GB | ~1-3s | Same as above, for low VRAM |
Citation
```bibtex
@misc{cunxin2025llama-email-fraud,
  title={Llama Email Fraud Detector},
  author={cunxin},
  year={2025},
  url={https://huggingface.co/cunxin/llama-email-fraud-detector}
}
```