Llama Email Fraud Detector (bf16)
A fine-tuned Llama-3.2-3B-Instruct model for structured email fraud/phishing analysis. Given an email, the model outputs a detailed JSON verdict: threat labels drawn from 11 categories, a 0-100 risk score, human-readable reasoning, and a suggested action.
This is the explanation layer of a dual-model anti-fraud pipeline. The discriminative model (cunxin/roberta-email-fraud-detector, 99.5% accuracy, <50ms) provides a fast binary pre-screen; its result is passed to this generative model as a [CLASSIFIER HINT] prior. The final verdict is reconciled by the backend service.
For GPUs with limited VRAM (< 12 GB), use the AWQ 4-bit quantized version: cunxin/llama-email-fraud-detector-awq.
Model Details
| Property | Value |
|---|---|
| Architecture | LlamaForCausalLM (decoder-only Transformer with RoPE, GQA, SwiGLU) |
| Base Model | meta-llama/Llama-3.2-3B-Instruct |
| Parameters | 3,237,063,680 (3.2B) |
| Fine-Tuning | LoRA (r=16, alpha=32, dropout=0.1) merged into base weights |
| Trainable Parameters | 24,313,856 (0.75% of total) |
| Precision | bfloat16 |
| Model Size | 6.4 GB |
| Context Window | 4,096 tokens (inference) / 2,048 tokens (training) |
| Vocabulary | 128,256 tokens |
Lineage
```
meta-llama/Llama-3.2-3B-Instruct (Meta AI, instruction-tuned)
│
├── LoRA fine-tuning (r=16, alpha=32, 7 target modules)
│     Training: ~12K email conversations with structured JSON labels
│     Hint injection: 75% correct / 15% adversarial / 10% no hint
│
├── Merge LoRA adapters into base weights
│
├─► cunxin/llama-email-fraud-detector (this model, bf16, 6.4 GB)
│
└─► cunxin/llama-email-fraud-detector-awq (AWQ 4-bit, 2.2 GB)
```
Output Format
The model outputs structured JSON with 6 fields:
```json
{
  "is_fraud": true,
  "risk_score": 95,
  "confidence_level": 0.97,
  "detected_threats": ["DOMAIN_MISMATCH", "CREDENTIAL_REQUEST", "URGENCY_FEAR"],
  "reason": "The sender domain 'amaz0n-verify.com' typosquats amazon.com. The email requests account credentials via a suspicious URL and uses urgency tactics to pressure immediate action.",
  "suggestion": "Do not click any links. Do not enter any credentials. Report this email as phishing to your IT department."
}
```
Threat Types & Scoring
The model detects 11 threat categories. The risk score is the sum of the triggered threats' points, capped at 100.
| Label | Points | Description |
|---|---|---|
| CREDENTIAL_REQUEST | 35 | Asks for passwords, SSN, credit card numbers |
| DOMAIN_MISMATCH | 30 | Sender domain does not match claimed organization |
| URL_DISCREPANCY | 30 | Links point to suspicious or mismatched domains |
| TOO_GOOD_TO_BE_TRUE | 30 | Unrealistic promises (lottery wins, free money) |
| PROMPT_INJECTION | 30 | Attempts to manipulate AI analysis |
| URGENCY_FEAR | 15 | Pressure tactics ("act now", "account suspended") |
| REPLY_TO_MISMATCH | 15 | Reply-To address differs from sender |
| GENERIC_SALUTATION | 8 | Impersonal greeting ("Dear Customer") |
| ANOMALOUS_TIMING | 8 | Sent at unusual hours for the timezone |
| MISSING_SIGNATURE | 8 | No professional email signature |
| GRAMMAR_ANOMALY | 5 | Unusual grammar or spelling patterns |
RULE D: any high-weight threat (>= 30 pts) forces is_fraud = true.
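The scoring scheme above can be sketched in a few lines. This is an illustrative reimplementation, not the model's internal code; the names `THREAT_POINTS`, `risk_score`, and `rule_d` are ours:

```python
# Illustrative sketch of the scoring scheme described above; the model
# produces these values generatively, this just restates the arithmetic.
THREAT_POINTS = {
    "CREDENTIAL_REQUEST": 35,
    "DOMAIN_MISMATCH": 30,
    "URL_DISCREPANCY": 30,
    "TOO_GOOD_TO_BE_TRUE": 30,
    "PROMPT_INJECTION": 30,
    "URGENCY_FEAR": 15,
    "REPLY_TO_MISMATCH": 15,
    "GENERIC_SALUTATION": 8,
    "ANOMALOUS_TIMING": 8,
    "MISSING_SIGNATURE": 8,
    "GRAMMAR_ANOMALY": 5,
}

def risk_score(detected_threats):
    """Sum of triggered threat points, capped at 100."""
    return min(100, sum(THREAT_POINTS[t] for t in detected_threats))

def rule_d(detected_threats):
    """RULE D: any threat worth >= 30 points forces is_fraud = True."""
    return any(THREAT_POINTS[t] >= 30 for t in detected_threats)
```

For example, DOMAIN_MISMATCH + CREDENTIAL_REQUEST + URGENCY_FEAR sums to 80 points and triggers RULE D.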
Dual-Model Pipeline
This model is designed to work with cunxin/roberta-email-fraud-detector in a reconciled pipeline:
```
Email Input
│
├──► RoBERTa (discriminative, <50ms)
│      │
│      ▼
│    is_fraud=True/False, confidence=0.97, risk_score=99
│      │
│      ▼ [CLASSIFIER HINT]
├──► Llama (generative, ~1-3s)  ◄── this model
│      │
│      ▼
│    Full JSON analysis (11 threat types, reasoning, suggestion)
│
▼
Reconciliation (backend)
│
▼
Final verdict
```
Hint Rules
When a [CLASSIFIER HINT] is provided, the model applies these rules after its own independent analysis:
| Scenario | Action |
|---|---|
| Both agree | Keep generative result |
| Hint=FRAUD, gen=safe, risk < 40 | Follow hint (classifier caught something subtle) |
| Hint=FRAUD, gen=safe, RULE D triggered | Override hint (generative has hard evidence) |
| Hint=NOT FRAUD, gen=fraud, RULE D triggered | Override hint (generative has hard evidence) |
| Hint=NOT FRAUD, gen=fraud, risk < 60 | Follow hint (only weak signals) |
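The table above can be read as a decision function. The sketch below is ours, not the backend's actual code; in particular the final fall-through (trusting the generative verdict when no rule matches) is an assumption:

```python
def reconcile(hint_fraud: bool, gen_fraud: bool, gen_risk: int, rule_d: bool) -> bool:
    """Hedged sketch of the hint-rule table; real reconciliation lives in the backend."""
    if hint_fraud == gen_fraud:
        return gen_fraud          # both agree: keep generative result
    if rule_d:
        return gen_fraud          # RULE D: generative has hard evidence, override hint
    if hint_fraud and gen_risk < 40:
        return True               # classifier caught something subtle: follow hint
    if gen_fraud and gen_risk < 60:
        return False              # only weak generative signals: follow hint
    return gen_fraud              # fall-through (our assumption): trust generative
```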
Usage
With vLLM (Recommended)
```bash
# Set in .env
MODEL_PATH=cunxin/llama-email-fraud-detector

# Start the service
docker compose --profile gpu up -d

# Test
curl -X POST http://localhost:8000/predict_generative \
  -H "Content-Type: application/json" \
  -d '{"sender":"security@amaz0n-verify.com","subject":"URGENT: Account locked","content":"Click to verify: http://amaz0n-secure.xyz"}'
```
With Transformers
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch, json

model_name = "cunxin/llama-email-fraud-detector"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

email = json.dumps({
    "date": "2026-02-25T10:00:00Z",
    "sender": "security@amaz0n-verify.com",
    "recipient": "you@example.com",
    "subject": "URGENT: Your account has been locked",
    "content": "Click here to verify: http://amaz0n-secure.xyz/verify"
})

messages = [
    {"role": "system", "content": "You are an email fraud analyst. Analyze the email and return a JSON verdict."},
    {"role": "user", "content": f"Analyze the following email:\n{email}"}
]

# add_generation_prompt=True appends the assistant header so generation
# starts a fresh reply; do_sample=True is required for temperature to apply.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.1)
response = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
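Since edge cases can produce malformed JSON (see Limitations), it is worth parsing the completion defensively. A minimal helper of our own, not part of the model's release:

```python
import json
import re

def parse_verdict(text):
    """Extract the first JSON object from a raw model completion.

    The model usually emits pure JSON, but edge cases may wrap it in
    prose or truncate it; returns None when no valid object is found.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```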
Training
Fine-Tuning Method
| Setting | Value |
|---|---|
| Method | LoRA (Low-Rank Adaptation) via peft + SFT via trl.SFTTrainer |
| LoRA Rank | 16 |
| LoRA Alpha | 32 (effective scale = 2x) |
| LoRA Dropout | 0.1 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Optimizer | AdamW (weight_decay=0.05, label_smoothing=0.05) |
| LR Schedule | Cosine with 10% linear warmup |
| Learning Rate | 2e-4 |
| Epochs | 3 |
| Batch Size | 2 (grad_accum=8, effective=16) |
| Max Sequence Length | 2,048 tokens |
| Mixed Precision | FP16 (CUDA) |
| Loss | Causal LM (next-token prediction on assistant turn only) |
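The hyperparameters in the table map onto a peft adapter configuration roughly as below. This is a sketch inferred from the table; the actual training script is not published:

```python
from peft import LoraConfig

# Sketch of the adapter configuration implied by the table above;
# treat this as illustrative, not the exact released training code.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,            # effective scale = alpha / r = 2x
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```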
Training Data
~12,000 email conversations generated from the Enron corpus and synthetic sources:
| Source | Count | Role |
|---|---|---|
| Fraud emails (train) | ~500 per class | Primary training |
| Normal emails (train) | ~500 per class | Primary training |
| Correction pool (optional) | ~14K fraud + ~15K normal | Extra training via --include-correction |
| AI-generated modern emails | ~800 per class | Extra training via --include-aigen |
Each email is converted into a 3-turn chat conversation (system prompt -> user email -> assistant JSON verdict) with:
- 75% include a correct [CLASSIFIER HINT] (teach trust of accurate priors)
- 15% include an adversarial wrong hint (teach RULE D override)
- 10% have no hint (teach independent reasoning)
- 60% of fraud examples include [HEURISTIC ANALYSIS] context
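The 75/15/10 hint-injection split can be sketched as a simple sampler. This is illustrative; the released data-generation code is not published and the mode names are ours:

```python
import random

def sample_hint_mode(rng=random):
    """Pick a hint mode with the 75% / 15% / 10% split described above."""
    r = rng.random()
    if r < 0.75:
        return "correct_hint"       # accurate prior: teach trust of good hints
    if r < 0.90:
        return "adversarial_hint"   # wrong hint: teach RULE D override
    return "no_hint"                # no prior: teach independent reasoning
```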
Hardware Requirements
| Configuration | VRAM |
|---|---|
| bf16 (this model) | >= 12 GB (RTX 3060 desktop, RTX 4070+) |
| AWQ 4-bit (llama-email-fraud-detector-awq) | >= 6 GB (RTX 3050, 3060 laptop) |
Intended Use
- Email fraud/phishing detection with detailed threat analysis and human-readable reasoning
- Explanation layer for automated email security systems
- Part of a multi-model pipeline (discriminative pre-screen + generative analysis + reconciliation)
- Microsoft Office Add-in integration for Outlook/Word
Limitations
- Primarily trained on English emails; may underperform on other languages
- Training data includes the Enron corpus (early 2000s); modern attack patterns are only partially covered by AI-generated synthetic data
- Inference latency is ~1-3s per email (use RoBERTa for real-time filtering)
- Structured JSON output depends on prompt engineering; edge cases may produce malformed JSON
Related Models
| Model | Type | Size | Speed | Use Case |
|---|---|---|---|---|
| cunxin/roberta-email-fraud-detector | Discriminative | 475 MB | <50ms | Fast binary pre-screen |
| cunxin/llama-email-fraud-detector (this) | Generative | 6.4 GB | ~1-3s | Detailed threat analysis |
| cunxin/llama-email-fraud-detector-awq | Generative (quantized) | 2.2 GB | ~1-3s | Same as above, for low VRAM |
Citation
```bibtex
@misc{cunxin2025llama-email-fraud,
  title={Llama Email Fraud Detector},
  author={cunxin},
  year={2025},
  url={https://huggingface.co/cunxin/llama-email-fraud-detector}
}
```