---
library_name: transformers
datasets:
- coastalchp/ledgar
language:
- en
base_model:
- nlpaueb/legal-bert-base-uncased
pipeline_tag: text-classification
---

# LegalBERT Fine-Tuned on the LEDGAR Dataset

This model is a fine-tuned version of **[LegalBERT](https://huggingface.co/nlpaueb/legal-bert-base-uncased)** on the **LEDGAR** dataset for **legal clause classification**. It classifies a legal clause into one of **100 clause types** (e.g., confidentiality, termination, liability).

---

## Model Overview

- **Base Model:** `nlpaueb/legal-bert-base-uncased`
- **Task:** Multi-class clause classification
- **Dataset:** LEDGAR
- **Language:** English
- **Number of labels:** 100
- **Fine-tuning epochs:** 4
- **Batch size:** 32
- **Optimizer:** AdamW
- **Mixed precision (FP16):** Enabled when CUDA is available

---

## Dataset Details

| Split | Samples | Description |
|-------|---------|-------------|
| Train | 60,000 | Used for model fine-tuning |
| Eval | 10,000 | Used for validation during training |
| Test | 10,000 | Held-out test set for final evaluation |

- **Total samples:** 80,000
- **Number of labels:** 100
- **Text column:** `text` (the clause text)
- **Label column:** `label`

---

## Evaluation Results (Test Set)

| Metric | Score |
|--------|-------|
| **Accuracy** | 0.8678 |
| **Macro F1** | 0.7779 |
| **Macro Precision** | 0.7917 |
| **Macro Recall** | 0.7763 |
| **Evaluation Time** | 38.37 s |

---

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model_name = "FENTECH/Legal-BERT-Clause-Classification"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example inference
text = "The contractor shall maintain confidentiality of all client information."
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
predicted_label = outputs.logits.argmax(dim=-1).item()
print("Predicted label ID:", predicted_label)
```
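The snippet above prints only a numeric label ID. In practice you usually want a clause name and a confidence score: apply a softmax over the logits and, if the checkpoint's config carries an `id2label` mapping (as `transformers` sequence-classification checkpoints typically do), look the ID up there. Below is a minimal sketch with a stand-in logits tensor and a toy one-entry mapping so it runs without downloading the model; the class index and label name are illustrative, not taken from the real label set:

```python
import torch

# Stand-in for outputs.logits: batch of 1, 100 clause classes.
logits = torch.zeros(1, 100)
logits[0, 42] = 5.0  # pretend the model scored class 42 highest

# Toy stand-in for model.config.id2label (the real mapping has 100 entries).
id2label = {42: "Confidentiality"}

probs = torch.softmax(logits, dim=-1)   # logits -> probability distribution
pred_id = probs.argmax(dim=-1).item()   # index of the best-scoring class
confidence = probs[0, pred_id].item()   # probability assigned to that class

print(f"Predicted: {id2label.get(pred_id, pred_id)} ({confidence:.2%})")
```

With the real model, substitute `outputs.logits` for `logits` and `model.config.id2label` for the toy dictionary.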