Shoriful025 / legal_contract_named_entity_recognizer

Token Classification


Shoriful025 committed on Dec 26, 2025 · Commit 0be6568 (verified) · 1 parent: 3f73c3c · Create README.md


---
language: en
license: mit
tags:
- ner
- legal-nlp
- token-classification
- bert
---
# legal_contract_named_entity_recognizer

## Overview

This model is a BERT-based token classifier fine-tuned for the legal domain. It automatically extracts key entities from commercial contracts, including the parties involved, effective dates, governing jurisdictions, and financial amounts.
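
A minimal quick-start sketch, assuming the repository `Shoriful025/legal_contract_named_entity_recognizer` ships weights and a tokenizer that Hugging Face `transformers` can load. The card does not list the model's label names, so the entity types you get back come from the model's own config. `load_ner` and `group_entities` are hypothetical helpers written for this example; `transformers` is imported lazily so the grouping helper can be used without it.

```python
from collections import defaultdict

MODEL_ID = "Shoriful025/legal_contract_named_entity_recognizer"


def load_ner(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (needs `transformers` and network access)."""
    # Lazy import: only this function depends on transformers.
    from transformers import pipeline

    # aggregation_strategy="simple" merges adjacent sub-word and B-/I- token
    # predictions into whole entity spans.
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def group_entities(entities: list[dict]) -> dict[str, list[str]]:
    """Group aggregated pipeline output by entity type, preserving input order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for ent in entities:
        # Aggregated entities carry "entity_group", "word", "score", "start", "end".
        grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)
```

For example, `group_entities(load_ner()(contract_text))` returns a dict keyed by whatever entity types the model's config defines. Keeping loading and post-processing separate makes the post-processing testable without downloading the model.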

## Model Architecture

The model uses a **BERT-Large** backbone with a token-level classification head that predicts one label per token.

- **Tagging Scheme:** Follows the BIO (Beginning, Inside, Outside) format: `B-` marks the first token of an entity, `I-` marks a token that continues the same entity, and `O` marks a token outside any entity.
- **Contextual Embeddings:** Because each token's representation depends on its surrounding text, the model can separate entities that look alike in isolation (e.g., distinguishing a "Notice Date" from an "Effective Date").
- **Fine-tuning:** Trained on CUAD (the Contract Understanding Atticus Dataset) and proprietary legal corpora.
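
To make the BIO scheme concrete, here is a minimal decoder that turns per-token tags into entity spans. It is a sketch, not the model's own post-processing: the label names (`PARTY`, `DATE`, `AMOUNT`) are hypothetical placeholders because the card does not publish its label set, and the tokens are whitespace-split words rather than BERT WordPiece sub-tokens.

```python
from typing import Optional


def bio_to_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Decode BIO tags into (entity_type, text) spans.

    A span opens at a B- tag and extends over following I- tags of the same type.
    A stray I- tag that does not continue the open span starts a new span
    (a common lenient convention).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: list[tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: list[str] = []

    def flush() -> None:
        # Close the open span, if any, and record it.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_type:
            current_tokens.append(token)  # continue the open span
        else:
            flush()  # B- tag, or an I- tag that cannot continue the open span
            current_type, current_tokens = label, [token]
    flush()
    return spans
```

Note that two adjacent entities of the same type stay separate only because the second one begins with `B-`; this is why the scheme distinguishes `B-` from `I-` at all.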

## Intended Use

- **Contract Lifecycle Management (CLM):** Automating the extraction of metadata for digital repositories.
- **Due Diligence:** Rapidly identifying governing laws and liability amounts across thousands of merger documents.
- **Regulatory Compliance:** Checking for the presence of specific mandatory parties or dates in financial agreements.
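
For the compliance use case, a downstream check might flag documents where a required entity type was never extracted. This sketch assumes the model's output has already been grouped into a `{entity_type: [texts]}` dictionary; `missing_required` and the label names in the example are hypothetical.

```python
from collections.abc import Iterable


def missing_required(extracted: dict[str, list[str]], required: Iterable[str]) -> list[str]:
    """Return the required entity types with no non-empty extracted value, in the given order."""
    return [
        label
        for label in required
        # A type counts as present only if at least one extracted string is non-blank.
        if not any(text.strip() for text in extracted.get(label, []))
    ]
```

A non-empty result, such as `["JURISDICTION"]`, marks the agreement for manual review rather than proving the clause is absent, since the model may simply have missed it.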

## Limitations

- **Legalese Variation:** Older or highly non-standard contract formats may result in lower entity recall.
- **Nested Entities:** Because BIO assigns exactly one label per token, the model cannot represent hierarchical or overlapping entities (e.g., an "Amount" inside a "Payment Clause").
- **OCR Errors:** Performance depends heavily on the quality of the input text; OCR noise from poorly scanned PDFs will degrade accuracy.
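
The nested-entity limitation follows directly from the tagging scheme. The sketch below encodes token-index spans as a BIO tag sequence and rejects overlaps, since a token can hold only one tag. The span format `(start, end, label)` with an exclusive `end`, the function name, and the label names are assumptions for illustration.

```python
def spans_to_bio(num_tokens: int, spans: list[tuple[int, int, str]]) -> list[str]:
    """Encode (start, end_exclusive, label) token spans as BIO tags.

    Raises ValueError if a span is out of range or overlaps an earlier span,
    because BIO allows exactly one tag per token.
    """
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"invalid span ({start}, {end})")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span ({start}, {end}, {label}) overlaps an existing entity")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags
```

An `AMOUNT` span inside a `PAYMENT_CLAUSE` span raises here: a single BIO sequence has no way to express both.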