# contractnli-distilbert-nda

DistilBERT fine-tuned on ContractNLI. Fastest inference in this model family; recommended for production.
## Task
Document-level NLI for Non-Disclosure Agreements (NDAs)
Given an NDA contract and a hypothesis about a standard provision, classify as:
- Entailment: The provision is present in the contract
- Contradiction: The provision is explicitly excluded
- NotMentioned: The contract does not address this provision
## Performance

| Metric | Value |
|---|---|
| Micro-F1 | 86.0% |
| Macro-F1 | 76.2% |
| Parameters | 66M |
| Loss | Weighted Cross-Entropy |
## Comparison

| Model | Micro-F1 | Macro-F1 | Params | Notes |
|---|---|---|---|---|
| Rule-based baseline | 20.9% | 16.0% | – | Keyword matching |
| BERT-base (paper) | ~83% | – | 110M | ContractNLI paper reference |
| contractnli-legalbert-nda-weighted | 87.3% | 79.3% | 110M | Highest accuracy |
| contractnli-legalbert-nda-standard | 86.7% | 77.0% | 110M | |
| contractnli-bert-nda-standard | 86.9% | 76.7% | 110M | Paper reproduction |
| contractnli-bert-nda-weighted | 86.3% | 77.9% | 110M | |
| **contractnli-distilbert-nda (this model)** | 86.0% | 76.2% | 66M | Fastest inference, recommended for production |
## 17 NDA Provisions Checked
| ID | Provision |
|---|---|
| nda-1 | Explicit identification |
| nda-2 | Non-inclusion of non-technical information |
| nda-3 | Inclusion of verbally conveyed information |
| nda-4 | Limited use |
| nda-5 | Sharing with employees |
| nda-7 | Sharing with third-parties |
| nda-8 | Notice on compelled disclosure |
| nda-10 | Confidentiality of Agreement |
| nda-11 | No reverse engineering |
| nda-12 | Permissible development of similar information |
| nda-13 | Permissible acquirement of similar information |
| nda-15 | No licensing |
| nda-16 | Return of confidential information |
| nda-17 | Permissible copy |
| nda-18 | No solicitation |
| nda-19 | Survival of obligations |
| nda-20 | Permissible post-agreement possession |
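To audit a contract against the full checklist, the table above can be turned into a loop. A minimal sketch follows; note that the short provision names below stand in for the dataset's full hypothesis sentences, and `classify` is a hypothetical callable (for example, a wrapper around the model in the Usage section) returning a `(label, confidence)` pair:

```python
# Sketch: run every provision hypothesis against one contract.
# The dict values are the short provision names from the table above;
# the real ContractNLI hypotheses are full sentences from the dataset.
PROVISIONS = {
    "nda-1": "Explicit identification",
    "nda-2": "Non-inclusion of non-technical information",
    "nda-3": "Inclusion of verbally conveyed information",
    "nda-4": "Limited use",
    "nda-5": "Sharing with employees",
    "nda-7": "Sharing with third-parties",
    "nda-8": "Notice on compelled disclosure",
    "nda-10": "Confidentiality of Agreement",
    "nda-11": "No reverse engineering",
    "nda-12": "Permissible development of similar information",
    "nda-13": "Permissible acquirement of similar information",
    "nda-15": "No licensing",
    "nda-16": "Return of confidential information",
    "nda-17": "Permissible copy",
    "nda-18": "No solicitation",
    "nda-19": "Survival of obligations",
    "nda-20": "Permissible post-agreement possession",
}

def run_checklist(contract_text, classify):
    """Return {provision_id: (label, confidence)} over all 17 hypotheses."""
    return {pid: classify(hyp, contract_text) for pid, hyp in PROVISIONS.items()}
```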
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Agreemind/contractnli-distilbert-nda"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

hypothesis = "All Confidential Information shall be expressly identified by the Disclosing Party."
premise = "Section 2.1: Any information disclosed must be marked as Confidential..."

inputs = tokenizer(hypothesis, premise, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)[0]
pred = ["Entailment", "Contradiction", "NotMentioned"][probs.argmax().item()]
print(f"Prediction: {pred} (confidence: {probs.max().item():.3f})")
```
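Contracts longer than 512 tokens must be split into spans before scoring (the training data is span-level; see Training Details), so a document-level prediction needs an aggregation step over per-span probabilities. Below is a minimal pure-Python sketch of one plausible heuristic; this is an assumption for illustration, not the exact aggregation scheme used by the ContractNLI paper or this model:

```python
def aggregate_spans(span_probs):
    """Combine per-span [entail, contradict, not_mentioned] probabilities
    into one document-level label.

    Heuristic (an assumption): pick the span with the strongest
    Entailment-or-Contradiction signal; if even that span favors
    NotMentioned, the document does not address the provision.
    """
    best = max(span_probs, key=lambda p: max(p[0], p[1]))
    if max(best[0], best[1]) > best[2]:
        return ("Entailment", "Contradiction")[best[1] > best[0]]
    return "NotMentioned"
```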
## Training Details
- Dataset: ContractNLI (607 NDAs, 17 hypotheses)
- Train/Dev/Test: 423/61/123 documents โ 33,974/5,131/9,373 span-level examples
- Base model: distilbert-base-uncased
- Loss: Weighted Cross-Entropy
- Learning rate: 3e-5 (aligned with ContractNLI paper)
- Epochs: 5 with early stopping (patience=3)
- Batch size: 8
- Max sequence length: 512
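The weighted cross-entropy loss compensates for ContractNLI's label imbalance (NotMentioned dominates the span-level examples). The exact weighting scheme is not documented here; a common choice, sketched below as an assumption, is inverse-frequency class weights:

```python
import math
from collections import Counter

def class_weights(labels, num_classes=3):
    # Inverse-frequency weights, scaled so a perfectly balanced
    # label set yields a weight of 1.0 for every class.
    counts = Counter(labels)
    total = len(labels)
    return [total / (num_classes * counts[c]) for c in range(num_classes)]

def weighted_ce(probs, target, weights):
    # Weighted cross-entropy for a single example: -w[y] * log p(y).
    # Rare classes (larger weight) contribute more to the loss.
    return -weights[target] * math.log(probs[target])
```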
## Citation
```bibtex
@inproceedings{koreeda-manning-2021-contractnli,
    title = "ContractNLI: A Dataset for Document-level Natural Language Inference for Contracts",
    author = "Koreeda, Yuta and Manning, Christopher",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    year = "2021",
}
```
## License
MIT