PRB Manipulation Detector (PRB)

Multi-task model for the PersuasionRisk Benchmark (PRB). Given a message, it predicts:

manipulation: regression score in [0, 1]
tactics: multi-label classification over the tactic labels (see prb/taxonomy.py)
stage: one of the PRB stages (see prb/taxonomy.py)
severity: integer 1–5
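
To make the four outputs concrete, here is a minimal sketch of a container for one prediction. The class name PRBPrediction and its validation are assumptions made for illustration; they are not part of the PRB project, and the actual output format of the inference script is not specified here. Valid tactic and stage labels are defined in prb/taxonomy.py and are not reproduced.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PRBPrediction:
    """Hypothetical container for the four task outputs (not the project's API)."""

    manipulation: float  # regression score in [0, 1]
    tactics: List[str]   # zero or more tactic labels (multi-label task)
    stage: str           # exactly one stage label
    severity: int        # integer from 1 to 5

    def __post_init__(self) -> None:
        # Reject values outside the ranges stated in the task list above.
        if not 0.0 <= self.manipulation <= 1.0:
            raise ValueError("manipulation must lie in [0, 1]")
        if not isinstance(self.severity, int) or not 1 <= self.severity <= 5:
            raise ValueError("severity must be an integer from 1 to 5")
```

The label strings passed for tactics and stage would have to come from the taxonomy module; the sketch only checks the two numeric ranges.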

This repository contains a PyTorch checkpoint, saved as model.pt, for the custom PRBModel architecture. It is not a standard Transformers AutoModelForSequenceClassification checkpoint.

Model repo

Repo: NorthernTribe-Research/prb-manipulation-detector-20260217

Training data

Trained using the combined PRB dataset released here:

Dataset: NorthernTribe-Research/prb-persuasion-risk-benchmark

Note: this checkpoint was produced by a quick CPU run on a small training subset, to validate the end-to-end pipeline and the Hub upload.

How to run inference

Use the inference script from the PRB project (it knows how to load model.pt into PRBModel).

1) Download the model snapshot

from huggingface_hub import snapshot_download

model_dir = snapshot_download("NorthernTribe-Research/prb-manipulation-detector-20260217")
print(model_dir)

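2) Run inference from a checkout of the PRB project, with the shell variable model_dir set to the path printed in step 1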

python model/infer.py --model_dir "$model_dir" --text "URGENT: verify your account or it will be closed."
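
The command above reads $model_dir from the shell, while step 1 stores the path in a Python variable. One way to avoid copying the path by hand is to drive both steps from Python. The sketch below is an illustration under assumptions: the helper names build_infer_command and run_inference are hypothetical, the script path and the --model_dir and --text flags are taken from the command above, and project_dir is assumed to point at a checkout of the PRB project.

```python
import subprocess
import sys
from typing import List

REPO_ID = "NorthernTribe-Research/prb-manipulation-detector-20260217"


def build_infer_command(model_dir: str, text: str) -> List[str]:
    """Build the argument list for the inference command shown above."""
    return [sys.executable, "model/infer.py", "--model_dir", model_dir, "--text", text]


def run_inference(text: str, project_dir: str = ".") -> str:
    """Download the snapshot, run the script, and return its raw stdout."""
    # Imported lazily so build_infer_command works without huggingface_hub.
    from huggingface_hub import snapshot_download

    model_dir = snapshot_download(REPO_ID)
    result = subprocess.run(
        build_infer_command(model_dir, text),
        cwd=project_dir,      # assumed: root of the PRB project checkout
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError if the script fails
    )
    return result.stdout
```

Calling run_inference("URGENT: verify your account or it will be closed.") from the project root would then return whatever the script prints. Passing the arguments as a list rather than a single shell string avoids quoting problems when the message contains quotes or dollar signs.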

Intended use

Research / benchmarking for persuasion & manipulation signals.
Security-analytics prototyping (e.g., phishing / social-engineering risk scoring).

Limitations

Not production-ready: this checkpoint is a quick baseline.
Domain bias: the training data is skewed toward phishing and manipulation examples, so the model may not generalize to other domains.
False positives are possible on urgent but benign messages.
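
Given these limitations, a downstream prototype may prefer to treat the manipulation score as a triage signal rather than a verdict. The following is a minimal sketch of that idea, not part of the PRB project: the function name triage and the thresholds 0.3 and 0.8 are placeholders chosen for illustration and have not been calibrated against this checkpoint.

```python
def triage(manipulation: float, low: float = 0.3, high: float = 0.8) -> str:
    """Map a manipulation score in [0, 1] to a coarse action.

    The default thresholds are illustrative placeholders, not calibrated values.
    """
    if not 0.0 <= manipulation <= 1.0:
        raise ValueError("manipulation must lie in [0, 1]")
    if manipulation >= high:
        return "flag"      # high score: surface as likely manipulation
    if manipulation >= low:
        return "review"    # middle band: send to a human reviewer
    return "pass"          # low score: no action
```

The middle band is where urgent but benign messages are most likely to land, so routing it to manual review limits the cost of false positives.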

License

MIT
