# Agreemind English Banking Risk Classifier

RoBERTa-base fine-tuned for multi-label, clause-level risk detection in English consumer banking contracts.
This model takes a single clause or paragraph from a banking agreement and predicts which risk categories (if any) it contains. It is designed for use in automated contract analysis pipelines where the system must flag potentially unfair or risky terms for consumer review.
## Model Details
| Property | Value |
|---|---|
| Base Model | roberta-base (125M parameters) |
| Task | Multi-label clause-level classification |
| Language | English |
| Domain | Consumer banking contracts (credit cards, deposit accounts, prepaid cards, HELOC, online banking) |
| Training Data | 88 US banking contracts (4,334 clauses) from 17+ banks |
| Loss | BCEWithLogitsLoss with per-label pos_weight |
| Max Length | 256 tokens |
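The per-label `pos_weight` mentioned above can be derived directly from label frequencies in the training set. A minimal sketch (the toy label matrix `Y` is illustrative, not the real data):

```python
import torch

def pos_weights(Y: torch.Tensor) -> torch.Tensor:
    """Per-label pos_weight = (#negatives / #positives), as expected by BCEWithLogitsLoss."""
    pos = Y.sum(dim=0)              # positive count per label
    neg = Y.shape[0] - pos          # negative count per label
    return neg / pos.clamp(min=1)   # clamp avoids division by zero for unseen labels

# Toy multi-label matrix: 6 clauses x 3 labels
Y = torch.tensor([[1, 0, 0],
                  [1, 0, 0],
                  [0, 1, 0],
                  [1, 0, 0],
                  [0, 0, 1],
                  [1, 1, 0]], dtype=torch.float)
w = pos_weights(Y)                  # tensor([0.5, 2.0, 5.0])
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=w)
```

Rare labels get a larger weight, so their positive examples contribute more to the loss.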
## Performance

### Held-Out Evaluation (70/15/15 contract-level split)
| Threshold | micro-P | micro-R | micro-F1 | macro-F1 | Subset Acc | Hamming Loss |
|---|---|---|---|---|---|---|
| Default 0.5 | 0.747 | 0.872 | 0.805 | 0.728 | 0.664 | 0.050 |
| Global 0.70 | 0.797 | 0.820 | 0.808 | 0.713 | 0.682 | 0.046 |
| Per-label | 0.780 | 0.833 | 0.806 | 0.712 | 0.671 | 0.048 |
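The global 0.70 cutoff in the table was tuned on the validation set. A sweep like that can be sketched as follows (toy arrays, and the candidate grid is our own assumption, not the one actually used):

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_global_threshold(probs: np.ndarray, y_true: np.ndarray,
                          grid=np.arange(0.30, 0.91, 0.05)):
    """Sweep one global cutoff over the grid and keep the best micro-F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        preds = (probs >= t).astype(int)
        f1 = f1_score(y_true, preds, average="micro", zero_division=0)
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t, best_f1

# Toy validation scores: the sweep settles near 0.45 here
probs = np.array([[0.9, 0.2], [0.6, 0.8], [0.4, 0.1]])
y_true = np.array([[1, 0], [1, 1], [0, 0]])
best_t, best_f1 = tune_global_threshold(probs, y_true)
```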
### 5-Fold Cross-Validation (contract-level folds)
| Metric | Mean | ± Std |
|---|---|---|
| micro-F1 | 0.805 | ± 0.014 |
| macro-F1 | 0.728 | ± 0.017 |
| Subset Accuracy | 0.655 | ± 0.019 |
| Hamming Loss | 0.049 | ± 0.004 |
### Per-Label Results (5-fold CV, default 0.5 threshold)
| Label | F1 (mean ± std) | Support |
|---|---|---|
| hidden_fees | 0.873 ± 0.022 | 1,130 |
| dispute_limitation | 0.840 ± 0.013 | 948 |
| account_freeze_or_closure | 0.813 ± 0.024 | 943 |
| data_sharing | 0.800 ± 0.020 | 398 |
| unilateral_terms_change | 0.764 ± 0.050 | 362 |
| overdraft_or_overlimit_penalty | 0.749 ± 0.045 | 332 |
| unilateral_rate_change | 0.684 ± 0.038 | 199 |
| auto_enrollment | 0.515 ± 0.074 | 141 |
| rewards_restriction_or_devaluation | 0.513 ± 0.196 | 32 |
### Baseline Comparison
| Model | micro-F1 | macro-F1 |
|---|---|---|
| TF-IDF + Logistic Regression | 0.761 | 0.561 |
| RoBERTa (this model) | 0.808 | 0.728 |
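The TF-IDF + Logistic Regression baseline can be reproduced in spirit with a one-vs-rest setup; a minimal sketch on toy clauses (the exact features and hyperparameters of the actual baseline are not documented here, so these are assumptions):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Toy clauses with a binary indicator row per clause:
# columns = [hidden_fees, unilateral_rate_change]
texts = [
    "A monthly maintenance fee applies.",
    "We may change your interest rate at any time.",
    "Fees may increase without notice.",
    "Your rate is variable and may change.",
]
y = [[1, 0], [0, 1], [1, 0], [0, 1]]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(texts, y)
proba = clf.predict_proba(["The annual fee will increase."])  # shape (1, 2)
```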
## Risk Categories
The model classifies clauses into 9 risk categories:
| ID | Label | Description | Severity |
|---|---|---|---|
| 0 | hidden_fees | Undisclosed fees, penalty charges, or fee escalation clauses | High |
| 1 | unilateral_rate_change | Bank can change interest rates without customer consent | High |
| 2 | unilateral_terms_change | Bank can modify agreement terms unilaterally | Medium |
| 3 | overdraft_or_overlimit_penalty | Overdraft fees, over-limit charges, penalty APR triggers | High |
| 4 | auto_enrollment | Automatic enrollment in services requiring opt-out | Medium |
| 5 | data_sharing | Sharing personal/financial data with third parties | Medium |
| 6 | dispute_limitation | Restrictions on dispute resolution (arbitration, class-action waivers) | High |
| 7 | account_freeze_or_closure | Bank can freeze, suspend, or close accounts | Critical |
| 8 | rewards_restriction_or_devaluation | Changes to rewards programs, point devaluation | Low |
A clause with no risk labels is considered fair (all-zero prediction).
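Downstream code can make the fair case explicit when decoding predictions; a minimal sketch (the `"fair"` sentinel is our own convention, and the label order matches the ID column above):

```python
LABELS = [
    "hidden_fees", "unilateral_rate_change", "unilateral_terms_change",
    "overdraft_or_overlimit_penalty", "auto_enrollment", "data_sharing",
    "dispute_limitation", "account_freeze_or_closure",
    "rewards_restriction_or_devaluation",
]

def decode(probs, labels=LABELS, threshold=0.70):
    """Map per-label probabilities to risk names; no label over threshold means a fair clause."""
    flagged = [l for l, p in zip(labels, probs) if p >= threshold]
    return flagged if flagged else ["fair"]

risky = decode([0.95, 0.1, 0.1, 0.1, 0.1, 0.1, 0.80, 0.1, 0.1])
fair = decode([0.1] * 9)
```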
## Usage

### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Agreemind/en-banking-roberta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

LABELS = [
    "hidden_fees", "unilateral_rate_change", "unilateral_terms_change",
    "overdraft_or_overlimit_penalty", "auto_enrollment", "data_sharing",
    "dispute_limitation", "account_freeze_or_closure",
    "rewards_restriction_or_devaluation",
]

text = "We may change the interest rate on your account at any time without prior notice."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits).squeeze().tolist()

# Using recommended global threshold of 0.70
threshold = 0.70
for label, prob in sorted(zip(LABELS, probs), key=lambda x: x[1], reverse=True):
    flag = " ⚠️" if prob >= threshold else ""
    print(f"  {prob:.3f}  {label}{flag}")
```
### With Pipeline API
```python
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="Agreemind/en-banking-roberta",
    top_k=None,
    function_to_apply="sigmoid",
)

result = pipe("The bank may close your account at any time for any reason.")

# Filter with threshold
risky = [r for r in result[0] if r["score"] >= 0.70]
print(risky)
```
### Recommended Thresholds

For production use, we recommend the global threshold of 0.70, which was tuned on the validation set and consistently outperformed both the default 0.5 and per-label thresholds across 5-fold CV.

Optimized per-label thresholds are included in `thresholds.json` in this repository for advanced use cases.
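Applying the per-label thresholds could look like the following sketch. The exact schema of `thresholds.json` is an assumption here (a flat `{"label_name": cutoff}` mapping), and the numeric values below are illustrative:

```python
import json

# Assumed format of thresholds.json: {"label_name": cutoff, ...}
# thresholds = json.load(open("thresholds.json"))
thresholds = {"hidden_fees": 0.55, "auto_enrollment": 0.80}  # illustrative values

def flag_per_label(probs: dict, thresholds: dict, default: float = 0.70) -> list:
    """Apply each label's tuned cutoff, falling back to the global 0.70."""
    return [l for l, p in probs.items() if p >= thresholds.get(l, default)]

probs = {"hidden_fees": 0.60, "auto_enrollment": 0.75, "data_sharing": 0.72}
print(flag_per_label(probs, thresholds))  # ['hidden_fees', 'data_sharing']
```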
## Training Details
| Parameter | Value |
|---|---|
| Learning rate | 2e-5 |
| Epochs | 6 |
| Batch size | 16 (train) / 32 (eval) |
| Max length | 256 tokens |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Early stopping | patience=3 on validation micro-F1 |
| Imbalance handling | BCEWithLogitsLoss with pos_weight computed from training data |
| Data split | 70% train / 15% dev / 15% test (contract-level, seed=42) |
| Fair clause balancing | 30% target ratio in training split |
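A contract-level split like the one in the table keeps all clauses from the same contract in the same partition, so the model is never evaluated on boilerplate it memorized from a sibling clause. One way to sketch this with scikit-learn (the two-stage `GroupShuffleSplit` approach is our assumption, not necessarily the script actually used):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def contract_level_split(n_clauses, contract_ids, seed=42):
    """70/15/15 split that keeps every clause of a contract in one partition."""
    X = np.arange(n_clauses)
    groups = np.asarray(contract_ids)
    # Stage 1: carve off 30% of contracts for dev+test
    outer = GroupShuffleSplit(n_splits=1, test_size=0.30, random_state=seed)
    train_idx, rest_idx = next(outer.split(X, groups=groups))
    # Stage 2: split the held-out contracts 50/50 into dev and test
    inner = GroupShuffleSplit(n_splits=1, test_size=0.50, random_state=seed)
    dev_rel, test_rel = next(inner.split(rest_idx, groups=groups[rest_idx]))
    return train_idx, rest_idx[dev_rel], rest_idx[test_rel]

# 10 toy contracts with 2 clauses each
contract_ids = [i // 2 for i in range(20)]
train, dev, test = contract_level_split(20, contract_ids)
```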
### Training Data
88 contracts from 17+ US banks including Chase, Bank of America, Wells Fargo, Citibank, US Bank, Capital One, Ally Bank, Truist, TD Bank, Regions, Citizens, HSBC USA, Santander, and others.
Contract types: credit card agreements, prepaid card agreements, deposit account agreements, online banking service agreements, HELOC disclosures, and treasury management agreements.
## Limitations

- Rare label weakness: `auto_enrollment` (141 examples) and `rewards_restriction_or_devaluation` (32 examples) have limited training data and show higher variance
- US-centric: Trained exclusively on US banking contracts; may not generalize well to UK, EU, or other jurisdictions
- English only: Does not support other languages
- Clause-level only: Expects pre-segmented clauses/paragraphs, not full documents
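Because the model expects pre-segmented input, full documents need a segmentation step first. A naive blank-line splitter is often a reasonable starting point (real contract PDFs usually need something smarter; this sketch is only illustrative):

```python
import re

def segment(document: str) -> list:
    """Naive paragraph segmentation: split on blank lines, drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]

doc = "Clause one text.\n\nClause two text.\n\n\nClause three."
print(segment(doc))  # ['Clause one text.', 'Clause two text.', 'Clause three.']
```

Each returned chunk can then be fed to the classifier individually, keeping it under the 256-token limit.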
## Part of the Agreemind Platform
This model is part of the Agreemind AI-powered legal document analysis platform. Other models in the family:
| Model | Task | Language |
|---|---|---|
| lexglue-roberta-unfair-tos | Terms of Service analysis | English |
| contractnli-distilbert-nda | NDA provision detection | English |
| banking-bert-turkish | Banking contract analysis | Turkish |
| en-banking-roberta (this model) | Banking contract analysis | English |
## License
MIT