PRB Manipulation Detector (PRB)
Multi-task model for the PersuasionRisk Benchmark (PRB). Given a message, it predicts:
- manipulation: regression score in [0, 1]
- tactics: multi-label tactics (see prb/taxonomy.py)
- stage: one of the PRB stages (see prb/taxonomy.py)
- severity: integer 1–5
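The four outputs listed above can be pictured as a single record per message. The following is a minimal sketch only: the class name `PRBPrediction`, its field names, and the example tactic and stage labels are assumptions made for illustration and are not taken from the PRB codebase (the real labels are defined in prb/taxonomy.py).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PRBPrediction:
    """Hypothetical container for one message's predictions (not part of PRB)."""

    manipulation: float                               # regression score in [0, 1]
    tactics: List[str] = field(default_factory=list)  # multi-label: zero or more tactics
    stage: str = ""                                   # exactly one PRB stage
    severity: int = 1                                 # integer from 1 to 5

    def __post_init__(self) -> None:
        # Reject values outside the ranges stated in the output list.
        if not 0.0 <= self.manipulation <= 1.0:
            raise ValueError("manipulation must lie in [0, 1]")
        if not isinstance(self.severity, int) or not 1 <= self.severity <= 5:
            raise ValueError("severity must be an integer from 1 to 5")


# Placeholder labels; the real tactic and stage names live in prb/taxonomy.py.
example = PRBPrediction(
    manipulation=0.87,
    tactics=["urgency"],
    stage="example_stage",
    severity=4,
)
print(example)
```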
This repository contains a PyTorch checkpoint saved as model.pt for the custom PRBModel architecture (not a standard Transformers AutoModelForSequenceClassification model).
Model repo
- Repo: NorthernTribe-Research/prb-manipulation-detector-20260217
Training data
Trained using the combined PRB dataset released here:
- Dataset: NorthernTribe-Research/prb-persuasion-risk-benchmark
Note: this particular checkpoint was produced via a quick CPU run (small training subset) to validate the end-to-end pipeline and Hub upload.
How to run inference
Use the inference script from the PRB project (it knows how to load model.pt into PRBModel).
1) Download the model snapshot
from huggingface_hub import snapshot_download
model_dir = snapshot_download("NorthernTribe-Research/prb-manipulation-detector-20260217")
print(model_dir)
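Before running inference, it may help to confirm that the downloaded snapshot actually contains the checkpoint. This is a small sketch under one assumption: that model.pt sits at the top level of the snapshot directory. The helper name `find_checkpoint` is hypothetical and not part of the PRB project.

```python
from pathlib import Path


def find_checkpoint(model_dir: str, filename: str = "model.pt") -> Path:
    """Return the checkpoint path inside a snapshot directory, or raise if absent.

    Assumes the checkpoint is stored at the top level of the snapshot.
    """
    path = Path(model_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"{filename} not found in {model_dir}")
    return path
```

Calling `find_checkpoint(model_dir)` with the path printed in step 1 either returns the location of model.pt or fails early with a clear error.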
2) Run inference (in the shell, set model_dir to the path printed in step 1)
python model/infer.py --model_dir "$model_dir" --text "URGENT: verify your account or it will be closed."
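Since the path from step 1 lives in a Python variable rather than a shell variable, one option is to launch the script from Python. The sketch below uses only the `--model_dir` and `--text` flags shown in the command above; the helper names are hypothetical, it assumes the working directory is the PRB project root so that model/infer.py resolves, and it returns the script's raw standard output because the output format of infer.py is not documented here.

```python
import subprocess
import sys
from typing import List


def build_infer_command(model_dir: str, text: str, script: str = "model/infer.py") -> List[str]:
    """Assemble the infer.py command line with the flags used in step 2."""
    return [sys.executable, script, "--model_dir", model_dir, "--text", text]


def run_inference(model_dir: str, text: str) -> str:
    """Run infer.py (from the PRB project root) and return whatever it prints."""
    result = subprocess.run(
        build_infer_command(model_dir, text),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the message text contains quotes or other special characters.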
Intended use
- Research / benchmarking for persuasion & manipulation signals.
- Security-analytics prototyping (e.g., phishing / social-engineering risk scoring).
Limitations
- Not production-ready: this checkpoint is a quick baseline.
- Domain bias: the training data is weighted heavily toward phishing and manipulation, so the model may not generalize to other domains.
- False positives are possible on urgent but benign messages.
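Given the false-positive risk noted above, a consumer of the manipulation score might route borderline messages to human review instead of applying a single cut-off. The following is an illustrative sketch only: the function name `triage` and the thresholds 0.5 and 0.8 are arbitrary placeholders, not calibrated values from the PRB project.

```python
def triage(manipulation: float, low: float = 0.5, high: float = 0.8) -> str:
    """Map a manipulation score in [0, 1] to a coarse action.

    Thresholds are placeholders and would need calibration on real data.
    """
    if not 0.0 <= manipulation <= 1.0:
        raise ValueError("manipulation must lie in [0, 1]")
    if manipulation >= high:
        return "flag"    # high score: treat as likely manipulative
    if manipulation >= low:
        return "review"  # borderline: urgent but benign messages may land here
    return "pass"        # low score: no action
```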
License
MIT