HADRETNA / Legal-BERT-Clause-Classification

	---
	library_name: transformers
	datasets:
	- coastalchp/ledgar
	language:
	- en
	base_model:
	- nlpaueb/legal-bert-base-uncased
	pipeline_tag: text-classification

	---
	# LegalBERT Fine-Tuned on LEDGAR Dataset

	This model is a fine-tuned version of [LegalBERT](https://huggingface.co/nlpaueb/legal-bert-base-uncased), trained on the LEDGAR dataset for legal clause classification.
	It assigns each legal clause to one of 100 clause types, such as confidentiality, termination, or liability.

	---

	## Model Overview

	- Base Model: `nlpaueb/legal-bert-base-uncased`
	- Task: Multi-class clause classification
	- Dataset: LEDGAR
	- Language: English
	- Number of labels: 100
	- Fine-tuning epochs: 4
	- Batch size: 32
	- Optimizer: AdamW
	- Mixed precision (FP16): Enabled when CUDA is available

	---

	## Dataset Details

	| Split | Samples | Description |
	|-------|---------|-------------|
	| Train | 60,000 | Used for model fine-tuning |
	| Eval | 10,000 | Used for validation during training |
	| Test | 10,000 | Held-out test set for final evaluation |

	- Total samples: 80,000
	- Number of labels: 100
	- Text column: `text` (contains the clause text)
	- Label column: `label`
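
The split sizes above, together with the batch size of 32 and the 4 epochs listed in the model overview, determine the approximate length of a training run. A minimal sketch of that arithmetic, assuming a single device and no gradient accumulation (neither is stated in this card):

```python
import math

# Split sizes from the dataset table.
splits = {"train": 60_000, "eval": 10_000, "test": 10_000}
total = sum(splits.values())

# Share of the full dataset held by each split.
fractions = {name: size / total for name, size in splits.items()}

# Hyperparameters from the model overview.
batch_size = 32
epochs = 4

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(splits["train"] / batch_size)
total_steps = steps_per_epoch * epochs

print(total)            # 80000
print(fractions)        # {'train': 0.75, 'eval': 0.125, 'test': 0.125}
print(steps_per_epoch)  # 1875
print(total_steps)      # 7500
```

Under these assumptions the run covers 1,875 optimizer steps per epoch and 7,500 steps in total; a multi-GPU setup or gradient accumulation would change both numbers.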

	---


	## Evaluation Results (on Test Set)

	| Metric | Score |
	|--------|-------|
	| Accuracy | 0.8678 |
	| Macro F1 | 0.7779 |
	| Macro Precision | 0.7917 |
	| Macro Recall | 0.7763 |
	| Evaluation Time | 38.37 sec |
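
Macro-averaged scores give each of the 100 clause types equal weight regardless of how often it occurs, which is one common reason they sit below accuracy when some classes are rare. Macro F1 is the mean of the per-class F1 scores, not the harmonic mean of macro precision and macro recall. A small pure-Python sketch on toy labels; the data and the `macro_scores` helper are illustrative only, not the evaluation code used for this model:

```python
from collections import Counter

def macro_scores(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy example: class 0 is frequent, class 2 is rare and never predicted.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 0, 1, 0, 1]

accuracy, macro_p, macro_r, macro_f1 = macro_scores(y_true, y_pred)
harmonic = 2 * macro_p * macro_r / (macro_p + macro_r)
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} harmonic={harmonic:.4f}")
```

The same pattern is visible in the reported results: the harmonic mean of the reported macro precision and macro recall is about 0.784, while the reported macro F1 is 0.7779, which is consistent with per-class averaging.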

	---

	## How to Use

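	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Load model and tokenizer
	model_name = "FENTECH/Legal-BERT-Clause-Classification"
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)

	# Example inference
	text = "The contractor shall maintain confidentiality of all client information."
	# Truncate to BERT's 512-token limit so that long clauses do not raise an error
	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
	with torch.no_grad():  # gradients are not needed at inference time
	    outputs = model(**inputs)

	predicted_label = outputs.logits.argmax(dim=-1).item()
	print("Predicted label ID:", predicted_label)
	# Label name from the model config (a generic "LABEL_<id>" if no names were stored)
	print("Predicted label:", model.config.id2label[predicted_label])
	```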
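
The snippet above returns a single prediction. To see how confident the model is, the logits can be converted to probabilities with a softmax and ranked. A minimal pure-Python sketch: the `top_k` helper, the example logits, and the label names are hypothetical, and in practice the inputs would come from `outputs.logits[0].tolist()` and `model.config.id2label`:

```python
import math

def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs."""
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Rank class indices by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical logits and label names, for illustration only.
example_logits = [2.0, 0.5, -1.0, 0.1]
example_labels = {0: "Confidentiality", 1: "Termination", 2: "Liability", 3: "Governing Law"}

for label, prob in top_k(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

Reporting the top few candidates with their probabilities can make borderline clauses easier to spot than a single label ID.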