---
title: Agreemind
emoji: ⚖️
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
---
# Agreemind

**AI-powered legal risk detection for Terms of Service.**

We build models that automatically classify unfair clauses in Terms of Service documents, helping legal teams and consumers identify potentially harmful terms.

## Models

All models are fine-tuned on the [LexGLUE UNFAIR-ToS](https://huggingface.co/datasets/coastalcph/lex_glue) benchmark and evaluated on the official test set (1,607 samples) using the paper's methodology ([Chalkidis et al., 2022](https://arxiv.org/abs/2110.00976)).

| Model | μ-F1 | m-F1 | Speed | Best for |
|-------|------|------|-------|----------|
| [lexglue-roberta-unfair-tos](https://huggingface.co/Agreemind/lexglue-roberta-unfair-tos) | **96.1** | **84.4** | Normal | 🥇 Best accuracy |
| [lexglue-legalbert-unfair-tos](https://huggingface.co/Agreemind/lexglue-legalbert-unfair-tos) | **96.0** | **84.1** | Normal | 🥈 Legal domain |
| [lexglue-deberta-unfair-tos](https://huggingface.co/Agreemind/lexglue-deberta-unfair-tos) | 95.6 | 82.2 | Normal | General purpose |
| [lexglue-legalbert-small-unfair-tos](https://huggingface.co/Agreemind/lexglue-legalbert-small-unfair-tos) | 95.0 | 78.5 | ⚡ ~3x faster | Fast inference |

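**LexGLUE leaderboard comparison:** Legal-BERT as reported in the paper scores 96.0 μ-F1 / 83.0 m-F1. Our RoBERTa model exceeds both scores (96.1 / 84.4), and our Legal-BERT model matches the μ-F1 and exceeds the m-F1 (96.0 / 84.1).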
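
The two F1 columns aggregate differently: μ-F1 pools true positives, false positives, and false negatives across all labels before computing a single score, while m-F1 averages the per-label F1 scores, so rare categories weigh as much as common ones. The sketch below illustrates the difference on toy data; it is not the evaluation script used for the table. The extra "no label" column reflects our understanding of the LexGLUE reference script for UNFAIR-ToS and should be treated as an assumption, and the helper names are hypothetical.

```python
from typing import List, Tuple


def add_no_label_column(rows: List[List[int]]) -> List[List[int]]:
    """Append a column that is 1 when a row has no positive label (assumed LexGLUE convention)."""
    return [row + [int(sum(row) == 0)] for row in rows]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float]:
    """Return (micro-F1, macro-F1) for binary multi-label indicator matrices."""
    num_labels = len(y_true[0])
    counts = []
    for j in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(sum(c[0] for c in counts), sum(c[1] for c in counts), sum(c[2] for c in counts))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts) / num_labels
    return micro, macro


# Toy data: 4 clauses, 3 labels (not real model output).
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [0, 0, 0], [0, 0, 0], [1, 0, 0]]

micro, macro = micro_macro_f1(add_no_label_column(y_true), add_no_label_column(y_pred))
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")  # micro-F1 = 0.667, macro-F1 = 0.417
```

Here the two missed rare labels pull m-F1 well below μ-F1, which is the same pattern as the gap between the two columns in the table. Multiply by 100 to compare with the table's scale.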

## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "Agreemind/lexglue-roberta-unfair-tos"  # Best model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

labels = [
    "Limitation of liability", "Unilateral termination",
    "Unilateral change", "Content removal",
    "Contract by using", "Choice of law",
    "Jurisdiction", "Arbitration",
]

text = "We may terminate your account at any time without notice."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

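# Multi-label task: apply a sigmoid per label rather than a softmax
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits).squeeze(0).tolist()

# Print labels above the 0.5 threshold, most probable first
for label, prob in sorted(zip(labels, probs), key=lambda x: x[1], reverse=True):
    if prob > 0.5:
        print(f"  {label}: {prob:.3f}")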
```
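
Because the task is multi-label, a clause can trigger zero, one, or several categories, and the 0.5 cut-off is a choice rather than a fixed property of the models. The sketch below isolates the decoding step so the threshold can be adjusted. It runs on hand-written logits instead of real model output, and `decode_labels` and its `threshold` parameter are hypothetical names, not part of any released API.

```python
import math
from typing import List, Tuple

LABELS = [
    "Limitation of liability", "Unilateral termination",
    "Unilateral change", "Content removal",
    "Contract by using", "Choice of law",
    "Jurisdiction", "Arbitration",
]


def decode_labels(logits: List[float], threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Map one logit per label to (label, probability) pairs above the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]  # independent sigmoid per label
    flagged = [(label, p) for label, p in zip(LABELS, probs) if p > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Illustrative logits only; a real call would use model(**inputs).logits[0].tolist().
example_logits = [-4.0, 3.0, -2.5, 0.2, -5.0, -6.0, -6.0, -3.0]

for label, prob in decode_labels(example_logits):
    print(f"{label}: {prob:.3f}")

# A stricter threshold keeps only high-confidence flags.
print(decode_labels(example_logits, threshold=0.9))
```

An empty result means no category crossed the threshold, which is the expected outcome for most clauses in a typical document.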

## Risk Categories

| Category | Description |
|----------|-------------|
| Limitation of liability | Limits the provider's legal responsibility |
| Unilateral termination | Provider may terminate without clear cause |
| Unilateral change | Terms can change with minimal notice |
| Content removal | Provider may remove user content |
| Contract by using | Agreement implied by using the service |
| Choice of law | Specifies governing jurisdiction's law |
| Jurisdiction | Specifies where disputes are handled |
| Arbitration | Requires arbitration instead of court |

## Training Methodology

- **Dataset:** LexGLUE UNFAIR-ToS (standard split, no augmentation)
- **Loss:** Standard BCEWithLogitsLoss
- **LR:** 3e-5 with linear schedule
- **Batch size:** 8
- **Epochs:** Up to 20 with early stopping (patience=5)
- **Evaluation:** Official LexGLUE test set with paper's metric computation
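
The settings above can be sketched in plain Python to show how they interact. This is an illustration under stated assumptions, not our training code: it assumes no warmup, assumes early stopping monitors a validation score where higher is better, and uses hypothetical names (`bce_with_logits`, `linear_lr`, `EarlyStopping`, `epochs_run`).

```python
import math
from typing import List


def bce_with_logits(logits: List[float], targets: List[float]) -> float:
    """Mean binary cross-entropy on raw logits, in the numerically stable form."""
    losses = [max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
              for z, y in zip(logits, targets)]
    return sum(losses) / len(losses)


def linear_lr(step: int, total_steps: int, base_lr: float = 3e-5) -> float:
    """Linear decay from base_lr to 0 over total_steps (no warmup assumed)."""
    return base_lr * max(0, total_steps - step) / max(1, total_steps)


class EarlyStopping:
    """Stop after `patience` consecutive epochs without a new best score."""

    def __init__(self, patience: int = 5) -> None:
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score: float) -> bool:
        """Record one epoch's score; return True when training should stop."""
        if score > self.best:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_scores: List[float], max_epochs: int = 20, patience: int = 5) -> int:
    """Number of epochs actually trained for a given sequence of validation scores."""
    stopper = EarlyStopping(patience)
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if stopper.step(score):
            return epoch
    return min(len(val_scores), max_epochs)


# Made-up validation scores: the best epoch is 3, so training stops at epoch 8.
scores = [0.70, 0.75, 0.80, 0.79, 0.78, 0.80, 0.79, 0.78, 0.77]
print(epochs_run(scores))
print(linear_lr(step=50, total_steps=100))
print(bce_with_logits([0.0], [1.0]))
```

With patience set to 5, a run only reaches the 20-epoch cap if the monitored score keeps improving at least once every five epochs.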

## Links

- [LexGLUE Paper](https://arxiv.org/abs/2110.00976)
- [LexGLUE Dataset](https://huggingface.co/datasets/coastalcph/lex_glue)
- 🌐 Website: [https://agreemind.vercel.app](https://agreemind.vercel.app)
- 💻 GitHub: [https://github.com/agree-mind](https://github.com/agree-mind)

---

## 📄 License

Models and code are released under the **MIT License**, unless otherwise stated in individual repositories/models.