---
library_name: transformers
datasets:
- coastalchp/ledgar
language:
- en
base_model:
- nlpaueb/legal-bert-base-uncased
pipeline_tag: text-classification
---
# LegalBERT Fine-Tuned on LEDGAR Dataset
This model is a fine-tuned version of **[LegalBERT](https://huggingface.co/nlpaueb/legal-bert-base-uncased)** on the **LEDGAR** dataset for **legal clause classification**.
It assigns each legal clause to one of **100 clause types** (e.g., confidentiality, termination, liability).

---
## Model Overview
- **Base Model:** `nlpaueb/legal-bert-base-uncased`
- **Task:** Multi-class clause classification
- **Dataset:** LEDGAR
- **Language:** English
- **Number of labels:** 100
- **Fine-tuning epochs:** 4
- **Batch size:** 32
- **Optimizer:** AdamW
- **Mixed Precision (FP16):** Enabled when CUDA is available
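
The hyperparameters listed above could be expressed as keyword arguments for the `transformers` `TrainingArguments` class. This is a minimal sketch, not the author's training script: the helper `build_training_kwargs`, the `output_dir` value, the evaluation batch size, and the reading of "batch size" as a per-device value are all assumptions. With `torch` and `transformers` installed, the result could be passed as `TrainingArguments(**build_training_kwargs(torch.cuda.is_available()))`.

```python
def build_training_kwargs(cuda_available: bool) -> dict:
    """Return TrainingArguments keyword arguments mirroring the listed hyperparameters."""
    return {
        "output_dir": "legalbert-ledgar",     # hypothetical output path
        "num_train_epochs": 4,                # fine-tuning epochs from the model card
        "per_device_train_batch_size": 32,    # assumes the listed batch size is per device
        "per_device_eval_batch_size": 32,     # assumption: same batch size for evaluation
        "optim": "adamw_torch",               # AdamW, PyTorch implementation
        "fp16": cuda_available,               # mixed precision only when CUDA is present
    }


kwargs = build_training_kwargs(cuda_available=False)
print(kwargs)
```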
---
## Dataset Details
| Split | Samples | Description |
|-------|----------|-------------|
| Train | 60,000 | Used for model fine-tuning |
| Eval | 10,000 | Used for validation during training |
| Test | 10,000 | Held-out test set for final evaluation |
- **Total samples:** 80,000
- **Number of labels:** 100
- **Text column:** `text` (contains the clause text)
- **Label column:** `label`
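
As a quick worked example, the training split size can be combined with the batch size (32) and epoch count (4) reported for this model to estimate the number of optimizer steps. The sketch assumes a single device, no gradient accumulation, and that a final partial batch would be kept; none of these are stated in the model card.

```python
import math

TRAIN_SAMPLES = 60_000   # train split size from the table
BATCH_SIZE = 32          # reported batch size
EPOCHS = 4               # reported fine-tuning epochs

# One optimizer step per batch (assumes no gradient accumulation)
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

print(steps_per_epoch, total_steps)  # 1875 7500
```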
---
## Evaluation Results (on Test Set)
| Metric | Score |
|---------|--------|
| **Accuracy** | 0.8678 |
| **Macro F1** | 0.7779 |
| **Macro Precision** | 0.7917 |
| **Macro Recall** | 0.7763 |
| **Evaluation Time** | 38.37 sec |
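
Macro-averaged metrics weight every class equally, so rare clause types count as much as frequent ones, which is one reason macro F1 can sit well below accuracy. The sketch below illustrates this on toy data in plain Python; the function `macro_scores` and the toy labels are hypothetical, and macro F1 is taken as the mean of per-class F1 scores (the convention used by scikit-learn's `average="macro"`). It is not the evaluation code used for the table above.

```python
def macro_scores(y_true, y_pred):
    """Compute accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy imbalanced data: class 0 is frequent, class 1 is rare
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]   # one of the two rare examples is misclassified
scores = macro_scores(y_true, y_pred)
print(scores)  # accuracy is 0.9, while macro F1 is lower (about 0.80)
```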
---
## How to Use
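```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "FENTECH/Legal-BERT-Clause-Classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example inference
text = "The contractor shall maintain confidentiality of all client information."
# Truncate to BERT's 512-token limit so long clauses do not raise an error
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)

predicted_label = outputs.logits.argmax(dim=-1).item()
print("Predicted label ID:", predicted_label)
# id2label falls back to generic names such as LABEL_0 if the config stores no clause names
print("Predicted label:", model.config.id2label[predicted_label])
```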
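
With 100 possible clause types, it can be useful to inspect the few most probable labels rather than only the top one. The following is a dependency-free sketch of a softmax followed by a top-k selection; the helper `top_k_labels`, the toy logits, and the label names are hypothetical and are not taken from this model's configuration.

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs from raw logits."""
    peak = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]               # softmax probabilities
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy logits and hypothetical label names, for illustration only
toy_logits = [2.0, 0.5, -1.0, 3.0]
toy_id2label = {0: "Confidentiality", 1: "Termination", 2: "Notices", 3: "Governing Law"}

for label, prob in top_k_labels(toy_logits, toy_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as in the example above, the same helper could be called as `top_k_labels(outputs.logits[0].tolist(), model.config.id2label)`. Whether the returned names are clause types or generic `LABEL_n` placeholders depends on what the checkpoint's configuration stores.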