# Agreemind English Banking Risk Classifier

RoBERTa-base fine-tuned for multi-label, clause-level risk detection in English consumer banking contracts.
This model takes a single clause or paragraph from a banking agreement and predicts which risk categories (if any) it contains. It is designed for use in automated contract analysis pipelines where the system must flag potentially unfair or risky terms for consumer review.
## Model Details
| Property | Value |
|---|---|
| Base Model | roberta-base (125M parameters) |
| Task | Multi-label clause-level classification |
| Language | English |
| Domain | Consumer banking contracts (credit cards, deposit accounts, prepaid cards, HELOC, online banking) |
| Training Data | 88 US banking contracts (4,334 clauses) from 17+ banks |
| Loss | BCEWithLogitsLoss with per-label pos_weight |
| Max Length | 256 tokens |
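The per-label `pos_weight` mentioned above can be derived directly from label frequencies in the training set. A minimal sketch (the toy label matrix `Y` is illustrative, not the real data):

```python
import torch

def pos_weights(Y: torch.Tensor) -> torch.Tensor:
    """Per-label pos_weight = (#negatives / #positives), as expected by BCEWithLogitsLoss."""
    pos = Y.sum(dim=0)              # positive count per label
    neg = Y.shape[0] - pos          # negative count per label
    return neg / pos.clamp(min=1)   # clamp avoids division by zero for unseen labels

# Toy multi-label matrix: 6 clauses x 3 labels
Y = torch.tensor([[1, 0, 0],
                  [1, 0, 0],
                  [0, 1, 0],
                  [1, 0, 0],
                  [0, 0, 1],
                  [1, 1, 0]], dtype=torch.float)
w = pos_weights(Y)                  # tensor([0.5, 2.0, 5.0])
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=w)
```

Rare labels get a larger weight, so their positive examples contribute more to the loss.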
## Performance

### Held-Out Evaluation (70/15/15 contract-level split)
| Threshold | micro-P | micro-R | micro-F1 | macro-F1 | Subset Acc | Hamming Loss |
|---|---|---|---|---|---|---|
| Default 0.5 | 0.747 | 0.872 | 0.805 | 0.728 | 0.664 | 0.050 |
| Global 0.70 | 0.797 | 0.820 | 0.808 | 0.713 | 0.682 | 0.046 |
| Per-label | 0.780 | 0.833 | 0.806 | 0.712 | 0.671 | 0.048 |
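The global 0.70 cutoff in the table was tuned on the validation set. A sweep like that can be sketched as follows (toy arrays, and the candidate grid is our own assumption, not the one actually used):

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_global_threshold(probs: np.ndarray, y_true: np.ndarray,
                          grid=np.arange(0.30, 0.91, 0.05)):
    """Sweep one global cutoff over the grid and keep the best micro-F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        preds = (probs >= t).astype(int)
        f1 = f1_score(y_true, preds, average="micro", zero_division=0)
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t, best_f1

# Toy validation scores: the sweep settles near 0.45 here
probs = np.array([[0.9, 0.2], [0.6, 0.8], [0.4, 0.1]])
y_true = np.array([[1, 0], [1, 1], [0, 0]])
best_t, best_f1 = tune_global_threshold(probs, y_true)
```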
### 5-Fold Cross-Validation (contract-level folds)
| Metric | Mean | ± Std |
|---|---|---|
| micro-F1 | 0.805 | ± 0.014 |
| macro-F1 | 0.728 | ± 0.017 |
| Subset Accuracy | 0.655 | ± 0.019 |
| Hamming Loss | 0.049 | ± 0.004 |
### Per-Label Results (5-fold CV, default 0.5 threshold)
| Label | F1 (mean ± std) | Support |
|---|---|---|
| hidden_fees | 0.873 ± 0.022 | 1,130 |
| dispute_limitation | 0.840 ± 0.013 | 948 |
| account_freeze_or_closure | 0.813 ± 0.024 | 943 |
| data_sharing | 0.800 ± 0.020 | 398 |
| unilateral_terms_change | 0.764 ± 0.050 | 362 |
| overdraft_or_overlimit_penalty | 0.749 ± 0.045 | 332 |
| unilateral_rate_change | 0.684 ± 0.038 | 199 |
| auto_enrollment | 0.515 ± 0.074 | 141 |
| rewards_restriction_or_devaluation | 0.513 ± 0.196 | 32 |
### Baseline Comparison
| Model | micro-F1 | macro-F1 |
|---|---|---|
| TF-IDF + Logistic Regression | 0.761 | 0.561 |
| RoBERTa (this model) | 0.808 | 0.728 |
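The TF-IDF + Logistic Regression baseline can be reproduced in spirit with a one-vs-rest setup; a minimal sketch on toy clauses (the exact features and hyperparameters of the actual baseline are not documented here, so these are assumptions):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Toy clauses with a binary indicator row per clause:
# columns = [hidden_fees, unilateral_rate_change]
texts = [
    "A monthly maintenance fee applies.",
    "We may change your interest rate at any time.",
    "Fees may increase without notice.",
    "Your rate is variable and may change.",
]
y = [[1, 0], [0, 1], [1, 0], [0, 1]]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(texts, y)
proba = clf.predict_proba(["The annual fee will increase."])  # shape (1, 2)
```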
## Risk Categories
The model classifies clauses into 9 risk categories:
| ID | Label | Description | Severity |
|---|---|---|---|
| 0 | hidden_fees | Undisclosed fees, penalty charges, or fee escalation clauses | High |
| 1 | unilateral_rate_change | Bank can change interest rates without customer consent | High |
| 2 | unilateral_terms_change | Bank can modify agreement terms unilaterally | Medium |
| 3 | overdraft_or_overlimit_penalty | Overdraft fees, over-limit charges, penalty APR triggers | High |
| 4 | auto_enrollment | Automatic enrollment in services requiring opt-out | Medium |
| 5 | data_sharing | Sharing personal/financial data with third parties | Medium |
| 6 | dispute_limitation | Restrictions on dispute resolution (arbitration, class-action waivers) | High |
| 7 | account_freeze_or_closure | Bank can freeze, suspend, or close accounts | Critical |
| 8 | rewards_restriction_or_devaluation | Changes to rewards programs, point devaluation | Low |
A clause with no risk labels is considered fair (all-zero prediction).
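Downstream code can make the fair case explicit when decoding predictions; a minimal sketch (the `"fair"` sentinel is our own convention, and the label order matches the ID column above):

```python
LABELS = [
    "hidden_fees", "unilateral_rate_change", "unilateral_terms_change",
    "overdraft_or_overlimit_penalty", "auto_enrollment", "data_sharing",
    "dispute_limitation", "account_freeze_or_closure",
    "rewards_restriction_or_devaluation",
]

def decode(probs, labels=LABELS, threshold=0.70):
    """Map per-label probabilities to risk names; no label over threshold means a fair clause."""
    flagged = [l for l, p in zip(labels, probs) if p >= threshold]
    return flagged if flagged else ["fair"]

risky = decode([0.95, 0.1, 0.1, 0.1, 0.1, 0.1, 0.80, 0.1, 0.1])
fair = decode([0.1] * 9)
```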
## Usage

### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Agreemind/en-banking-roberta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

LABELS = [
    "hidden_fees", "unilateral_rate_change", "unilateral_terms_change",
    "overdraft_or_overlimit_penalty", "auto_enrollment", "data_sharing",
    "dispute_limitation", "account_freeze_or_closure",
    "rewards_restriction_or_devaluation",
]

text = "We may change the interest rate on your account at any time without prior notice."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits).squeeze().tolist()

# Using recommended global threshold of 0.70
threshold = 0.70
for label, prob in sorted(zip(LABELS, probs), key=lambda x: x[1], reverse=True):
    flag = " ⚠️" if prob >= threshold else ""
    print(f"  {prob:.3f}  {label}{flag}")
```
### With Pipeline API
```python
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="Agreemind/en-banking-roberta",
    top_k=None,
    function_to_apply="sigmoid",
)

result = pipe("The bank may close your account at any time for any reason.")

# Filter with threshold
risky = [r for r in result[0] if r["score"] >= 0.70]
print(risky)
```
### Recommended Thresholds

For production use, we recommend the global threshold of 0.70, which was tuned on the validation set and consistently outperformed both the default 0.5 and per-label thresholds across 5-fold CV.

Optimized per-label thresholds are included in `thresholds.json` in this repository for advanced use cases.
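Applying the per-label thresholds could look like the following sketch. The exact schema of `thresholds.json` is an assumption here (a flat `{"label_name": cutoff}` mapping), and the numeric values below are illustrative:

```python
import json

# Assumed format of thresholds.json: {"label_name": cutoff, ...}
# thresholds = json.load(open("thresholds.json"))
thresholds = {"hidden_fees": 0.55, "auto_enrollment": 0.80}  # illustrative values

def flag_per_label(probs: dict, thresholds: dict, default: float = 0.70) -> list:
    """Apply each label's tuned cutoff, falling back to the global 0.70."""
    return [l for l, p in probs.items() if p >= thresholds.get(l, default)]

probs = {"hidden_fees": 0.60, "auto_enrollment": 0.75, "data_sharing": 0.72}
print(flag_per_label(probs, thresholds))  # ['hidden_fees', 'data_sharing']
```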
## Training Details
| Parameter | Value |
|---|---|
| Learning rate | 2e-5 |
| Epochs | 6 |
| Batch size | 16 (train) / 32 (eval) |
| Max length | 256 tokens |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Early stopping | patience=3 on validation micro-F1 |
| Imbalance handling | BCEWithLogitsLoss with pos_weight computed from training data |
| Data split | 70% train / 15% dev / 15% test (contract-level, seed=42) |
| Fair clause balancing | 30% target ratio in training split |
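A contract-level split like the one in the table keeps all clauses from the same contract in the same partition, so the model is never evaluated on boilerplate it memorized from a sibling clause. One way to sketch this with scikit-learn (the two-stage `GroupShuffleSplit` approach is our assumption, not necessarily the script actually used):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def contract_level_split(n_clauses, contract_ids, seed=42):
    """70/15/15 split that keeps every clause of a contract in one partition."""
    X = np.arange(n_clauses)
    groups = np.asarray(contract_ids)
    # Stage 1: carve off 30% of contracts for dev+test
    outer = GroupShuffleSplit(n_splits=1, test_size=0.30, random_state=seed)
    train_idx, rest_idx = next(outer.split(X, groups=groups))
    # Stage 2: split the held-out contracts 50/50 into dev and test
    inner = GroupShuffleSplit(n_splits=1, test_size=0.50, random_state=seed)
    dev_rel, test_rel = next(inner.split(rest_idx, groups=groups[rest_idx]))
    return train_idx, rest_idx[dev_rel], rest_idx[test_rel]

# 10 toy contracts with 2 clauses each
contract_ids = [i // 2 for i in range(20)]
train, dev, test = contract_level_split(20, contract_ids)
```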
### Training Data
88 contracts from 17+ US banks including Chase, Bank of America, Wells Fargo, Citibank, US Bank, Capital One, Ally Bank, Truist, TD Bank, Regions, Citizens, HSBC USA, Santander, and others.
Contract types: credit card agreements, prepaid card agreements, deposit account agreements, online banking service agreements, HELOC disclosures, and treasury management agreements.
## Limitations

- Rare label weakness: `auto_enrollment` (141 examples) and `rewards_restriction_or_devaluation` (32 examples) have limited training data and show higher variance
- US-centric: Trained exclusively on US banking contracts; may not generalize well to UK, EU, or other jurisdictions
- English only: Does not support other languages
- Clause-level only: Expects pre-segmented clauses/paragraphs, not full documents
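Because the model expects pre-segmented input, full documents need a segmentation step first. A naive blank-line splitter is often a reasonable starting point (real contract PDFs usually need something smarter; this sketch is only illustrative):

```python
import re

def segment(document: str) -> list:
    """Naive paragraph segmentation: split on blank lines, drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]

doc = "Clause one text.\n\nClause two text.\n\n\nClause three."
print(segment(doc))  # ['Clause one text.', 'Clause two text.', 'Clause three.']
```

Each returned chunk can then be fed to the classifier individually, keeping it under the 256-token limit.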
## Part of the Agreemind Platform
This model is part of the Agreemind AI-powered legal document analysis platform. Other models in the family:
| Model | Task | Language |
|---|---|---|
| lexglue-roberta-unfair-tos | Terms of Service analysis | English |
| contractnli-distilbert-nda | NDA provision detection | English |
| banking-bert-turkish | Banking contract analysis | Turkish |
| en-banking-roberta (this model) | Banking contract analysis | English |
## License
MIT