Agreemind English Banking Risk Classifier

RoBERTa-base fine-tuned for multi-label clause-level risk detection in English consumer banking contracts.

This model takes a single clause or paragraph from a banking agreement and predicts which risk categories (if any) it contains. It is designed for use in automated contract analysis pipelines where the system must flag potentially unfair or risky terms for consumer review.

Model Details

| Property | Value |
|---|---|
| Base Model | `roberta-base` (125M parameters) |
| Task | Multi-label clause-level classification |
| Language | English |
| Domain | Consumer banking contracts (credit cards, deposit accounts, prepaid cards, HELOC, online banking) |
| Training Data | 88 US banking contracts, 4,334 clauses, from 17+ banks |
| Loss | `BCEWithLogitsLoss` with per-label `pos_weight` |
| Max Length | 256 tokens |

Performance

Held-Out Evaluation (70/15/15 contract-level split)

| Threshold | micro-P | micro-R | micro-F1 | macro-F1 | Subset Acc | Hamming Loss |
|---|---|---|---|---|---|---|
| Default 0.5 | 0.747 | 0.872 | 0.805 | 0.728 | 0.664 | 0.050 |
| Global 0.70 | 0.797 | 0.820 | 0.808 | 0.713 | 0.682 | 0.046 |
| Per-label (tuned) | 0.780 | 0.833 | 0.806 | 0.712 | 0.671 | 0.048 |

5-Fold Cross-Validation (contract-level folds)

| Metric | Mean ± Std |
|---|---|
| micro-F1 | 0.805 ± 0.014 |
| macro-F1 | 0.728 ± 0.017 |
| Subset Accuracy | 0.655 ± 0.019 |
| Hamming Loss | 0.049 ± 0.004 |
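
The aggregate metrics above can be computed directly from binary prediction and label matrices. A minimal sketch of micro-F1, subset accuracy, and Hamming loss (the toy matrices below are illustrative, not model output):

```python
def multilabel_metrics(y_true, y_pred):
    """Compute micro-F1, subset accuracy, and Hamming loss for 0/1 matrices."""
    n, k = len(y_true), len(y_true[0])
    tp = fp = fn = wrong = exact = 0
    for t_row, p_row in zip(y_true, y_pred):
        exact += t_row == p_row            # exact-match rows -> subset accuracy
        for t, p in zip(t_row, p_row):
            tp += t and p                  # pooled over all labels -> "micro"
            fp += (not t) and p
            fn += t and (not p)
            wrong += t != p                # per-cell errors -> Hamming loss
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    micro_f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {
        "micro_f1": micro_f1,
        "subset_acc": exact / n,
        "hamming_loss": wrong / (n * k),
    }

y_true = [[1, 0, 0], [0, 1, 1]]
y_pred = [[1, 0, 0], [0, 1, 0]]
print(multilabel_metrics(y_true, y_pred))
```

Macro-F1 differs in that it averages per-label F1 scores, which is why the rare labels in the next table pull it well below micro-F1.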

Per-Label Results (5-fold CV, default 0.5 threshold)

| Label | F1 (mean ± std) | Support |
|---|---|---|
| hidden_fees | 0.873 ± 0.022 | 1,130 |
| dispute_limitation | 0.840 ± 0.013 | 948 |
| account_freeze_or_closure | 0.813 ± 0.024 | 943 |
| data_sharing | 0.800 ± 0.020 | 398 |
| unilateral_terms_change | 0.764 ± 0.050 | 362 |
| overdraft_or_overlimit_penalty | 0.749 ± 0.045 | 332 |
| unilateral_rate_change | 0.684 ± 0.038 | 199 |
| auto_enrollment | 0.515 ± 0.074 | 141 |
| rewards_restriction_or_devaluation | 0.513 ± 0.196 | 32 |

Baseline Comparison

| Model | micro-F1 | macro-F1 |
|---|---|---|
| TF-IDF + Logistic Regression | 0.761 | 0.561 |
| RoBERTa (this model) | 0.808 | 0.728 |
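
The TF-IDF baseline can be reproduced in spirit with scikit-learn's one-vs-rest logistic regression. A minimal sketch on a toy corpus (the texts, labels, and hyperparameters below are illustrative, not the actual training setup):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

texts = [
    "We may charge a monthly maintenance fee on your account.",
    "The bank may change your interest rate at any time.",
    "You waive the right to participate in a class action.",
    "Your statement is available online each month.",
]
# Binary indicator columns: hidden_fees, unilateral_rate_change, dispute_limitation
labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]

# One independent binary classifier per label over shared TF-IDF features
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
clf.fit(texts, labels)
print(clf.predict(["We may charge a fee at any time."]))
```

The macro-F1 gap in the table (0.561 vs 0.728) suggests the linear baseline struggles most on the rare labels, where contextual representations help.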

Risk Categories

The model classifies clauses into 9 risk categories:

| ID | Label | Description | Severity |
|---|---|---|---|
| 0 | hidden_fees | Undisclosed fees, penalty charges, or fee escalation clauses | High |
| 1 | unilateral_rate_change | Bank can change interest rates without customer consent | High |
| 2 | unilateral_terms_change | Bank can modify agreement terms unilaterally | Medium |
| 3 | overdraft_or_overlimit_penalty | Overdraft fees, over-limit charges, penalty APR triggers | High |
| 4 | auto_enrollment | Automatic enrollment in services requiring opt-out | Medium |
| 5 | data_sharing | Sharing personal/financial data with third parties | Medium |
| 6 | dispute_limitation | Restrictions on dispute resolution (arbitration, class action waivers) | High |
| 7 | account_freeze_or_closure | Bank can freeze, suspend, or close account | Critical |
| 8 | rewards_restriction_or_devaluation | Changes to rewards programs, point devaluation | Low |

A clause with no risk labels is considered fair (all-zero prediction).

Usage

Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Agreemind/en-banking-roberta"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

LABELS = [
    "hidden_fees", "unilateral_rate_change", "unilateral_terms_change",
    "overdraft_or_overlimit_penalty", "auto_enrollment", "data_sharing",
    "dispute_limitation", "account_freeze_or_closure",
    "rewards_restriction_or_devaluation",
]

text = "We may change the interest rate on your account at any time without prior notice."

inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    # Sigmoid, not softmax: labels are independent in multi-label classification
    probs = torch.sigmoid(model(**inputs).logits).squeeze().tolist()

# Using the recommended global threshold of 0.70
threshold = 0.70
for label, prob in sorted(zip(LABELS, probs), key=lambda x: x[1], reverse=True):
    flag = " ⚠️" if prob >= threshold else ""
    print(f"  {prob:.3f}  {label}{flag}")
```

With Pipeline API

```python
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="Agreemind/en-banking-roberta",
    top_k=None,                    # return scores for all labels
    function_to_apply="sigmoid",   # independent per-label probabilities
)

# Pass a list so the output is a list of per-input score lists
result = pipe(["The bank may close your account at any time for any reason."])
# Keep only labels that clear the recommended 0.70 threshold
risky = [r for r in result[0] if r["score"] >= 0.70]
print(risky)
```

Recommended Thresholds

For production use, we recommend the global threshold of 0.70, which was tuned on the validation set and consistently outperformed both the default 0.5 threshold and per-label thresholds across 5-fold CV.

Optimized per-label thresholds are included in `thresholds.json` in this repository for advanced use cases.
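
A sketch of applying per-label thresholds, assuming thresholds.json maps label names to tuned cutoffs (the JSON content and values below are illustrative, not the shipped file):

```python
import json

# Illustrative stand-in for the repository's thresholds.json
thresholds = json.loads('{"hidden_fees": 0.62, "auto_enrollment": 0.41}')

def apply_thresholds(probs_by_label, thresholds, default=0.70):
    """Return the labels whose probability clears that label's threshold,
    falling back to the global 0.70 cutoff for labels without a tuned value."""
    return {
        label for label, p in probs_by_label.items()
        if p >= thresholds.get(label, default)
    }

probs = {"hidden_fees": 0.65, "auto_enrollment": 0.35, "data_sharing": 0.80}
print(sorted(apply_thresholds(probs, thresholds)))  # ['data_sharing', 'hidden_fees']
```

Per-label cutoffs mainly help the rare labels, where the optimal operating point can sit well away from 0.70.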

Training Details

| Parameter | Value |
|---|---|
| Learning rate | 2e-5 |
| Epochs | 6 |
| Batch size | 16 (train) / 32 (eval) |
| Max length | 256 tokens |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Early stopping | patience=3 on validation micro-F1 |
| Imbalance handling | `BCEWithLogitsLoss` with `pos_weight` computed from training data |
| Data split | 70% train / 15% dev / 15% test (contract-level, seed=42) |
| Fair clause balancing | 30% target ratio in training split |
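
The per-label `pos_weight` passed to `BCEWithLogitsLoss` is conventionally the negative-to-positive ratio for each label, which upweights rare labels. A minimal sketch of that computation (the toy label rows are illustrative, not the actual dataset):

```python
def compute_pos_weight(label_rows):
    """pos_weight[j] = (# negatives for label j) / (# positives for label j).

    label_rows: list of 0/1 rows, one per clause, one column per label.
    """
    n = len(label_rows)
    num_labels = len(label_rows[0])
    weights = []
    for j in range(num_labels):
        pos = sum(row[j] for row in label_rows)
        neg = n - pos
        weights.append(neg / max(pos, 1))  # guard against labels with no positives
    return weights

rows = [[1, 0], [0, 0], [1, 1], [0, 0]]
print(compute_pos_weight(rows))  # [1.0, 3.0]
```

The resulting list would be passed as `pos_weight=torch.tensor(weights)` when constructing the loss.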

Training Data

88 contracts from 17+ US banks including Chase, Bank of America, Wells Fargo, Citibank, US Bank, Capital One, Ally Bank, Truist, TD Bank, Regions, Citizens, HSBC USA, Santander, and others.

Contract types: credit card agreements, prepaid card agreements, deposit account agreements, online banking service agreements, HELOC disclosures, and treasury management agreements.

Limitations

- Rare label weakness: auto_enrollment (141 examples) and rewards_restriction_or_devaluation (32 examples) have limited training data and show higher variance
- US-centric: Trained exclusively on US banking contracts; may not generalize well to UK, EU, or other jurisdictions
- English only: Does not support other languages
- Clause-level only: Expects pre-segmented clauses/paragraphs, not full documents
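
Because the model expects pre-segmented input, a pipeline must split documents into clauses first. A deliberately naive sketch that splits on blank lines and numbered headings (real pipelines would likely use a layout- or sentence-aware splitter; the `segment_clauses` helper and sample text are hypothetical):

```python
import re

def segment_clauses(document, min_len=40):
    """Split a contract into candidate clauses on blank lines and
    numbered headings, dropping fragments shorter than min_len chars."""
    parts = re.split(r"\n\s*\n|\n(?=\d+\.\s)", document)
    return [p.strip() for p in parts if len(p.strip()) >= min_len]

doc = """1. Fees. We may charge a monthly maintenance fee of $12 on this account.

2. Changes. We may amend this agreement at any time with thirty days notice."""

clauses = segment_clauses(doc)
print(len(clauses))  # 2
```

Each resulting clause can then be fed to the model individually, keeping inputs under the 256-token limit.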

Part of the Agreemind Platform

This model is part of the Agreemind AI-powered legal document analysis platform. Other models in the family:

| Model | Task | Language |
|---|---|---|
| lexglue-roberta-unfair-tos | Terms of Service analysis | English |
| contractnli-distilbert-nda | NDA provision detection | English |
| banking-bert-turkish | Banking contract analysis | Turkish |
| en-banking-roberta (this model) | Banking contract analysis | English |

License

MIT
