---
language: en
license: mit
tags:
- ner
- legal-nlp
- token-classification
- bert
---

# legal_contract_named_entity_recognizer

## Overview

This model is a BERT-based token classifier fine-tuned for the legal domain. It automatically extracts key entities from commercial contracts, including the contracting parties, effective dates, governing jurisdictions, and financial amounts.

## Model Architecture

The model uses a **BERT-Large** backbone with a token-level classification head.

- **Tagging Scheme:** Uses the BIO (Beginning, Inside, Outside) format, labeling each token as the beginning of an entity, inside an entity, or outside any entity.
- **Contextual Embeddings:** Captures semantic relationships between legal terms from their surrounding context, allowing the model to distinguish similar definitions (e.g., a "Notice Date" versus an "Effective Date").
- **Fine-tuning:** Trained on CUAD (the Contract Understanding Atticus Dataset) and proprietary legal corpora.
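
The classification head predicts one BIO tag per token, so its raw output has to be merged into entity spans before it is useful. The sketch below shows that decoding step in plain Python. It is illustrative only. The label names (`PARTY`, `EFFECTIVE_DATE`) are hypothetical, because this card does not publish the model's label set, and `bio_to_spans` is a hypothetical helper, not part of any library.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO tags into (entity_type, text) spans.

    Label names used with this helper are hypothetical; the model's real
    label set is not documented in this card.
    """
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span being built, if any.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)  # continue the current entity
        elif tag.startswith("I-"):
            # Ill-formed I- tag with no matching B-: treat it as a new span.
            flush()
            current_type, current_tokens = tag[2:], [token]
        else:  # "O": outside any entity
            flush()
    flush()
    return spans


tokens = ["This", "Agreement", "between", "Acme", "Corp", "and", "Beta", "LLC",
          "is", "effective", "1", "January", "2024", "."]
tags = ["O", "O", "O", "B-PARTY", "I-PARTY", "O", "B-PARTY", "I-PARTY",
        "O", "O", "B-EFFECTIVE_DATE", "I-EFFECTIVE_DATE", "I-EFFECTIVE_DATE", "O"]
print(bio_to_spans(tokens, tags))
# [('PARTY', 'Acme Corp'), ('PARTY', 'Beta LLC'), ('EFFECTIVE_DATE', '1 January 2024')]
```

Joining tokens with spaces is a simplification. A production decoder would instead work from character offsets and merge WordPiece sub-tokens. If the model is loaded through Hugging Face `transformers`, the token-classification pipeline's `aggregation_strategy` option performs a comparable merge.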

## Intended Use

- **Contract Lifecycle Management (CLM):** Automating the extraction of metadata for digital repositories.
- **Due Diligence:** Rapidly identifying governing laws and liability amounts across thousands of merger documents.
- **Regulatory Compliance:** Checking for the presence of specific mandatory parties or dates in financial agreements.
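
For the CLM and compliance use cases above, entity-level output is usually grouped into a per-contract record, which can then be checked for required fields. The following is a minimal sketch. It assumes entities arrive as dicts shaped like the aggregated output of a Hugging Face token-classification pipeline (`entity_group`, `word`, `score`). The label names, the 0.5 confidence cutoff, and the helpers `to_contract_metadata` and `missing_fields` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def to_contract_metadata(entities: Iterable[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group entities by type, skipping low-confidence hits and exact duplicates."""
    record: Dict[str, List[str]] = defaultdict(list)
    for ent in entities:
        if ent["score"] < min_score:
            continue  # arbitrary illustrative cutoff, not a recommended value
        values = record[ent["entity_group"]]
        if ent["word"] not in values:
            values.append(ent["word"])
    return dict(record)


def missing_fields(record: Dict[str, List[str]], required: List[str]) -> List[str]:
    """Return the required entity types that were not found in the contract."""
    return [field for field in required if not record.get(field)]


# Hypothetical entities for a single contract.
entities = [
    {"entity_group": "PARTY", "word": "Acme Corp", "score": 0.98},
    {"entity_group": "PARTY", "word": "Beta LLC", "score": 0.95},
    {"entity_group": "JURISDICTION", "word": "Delaware", "score": 0.91},
    {"entity_group": "AMOUNT", "word": "$50,000", "score": 0.42},
    {"entity_group": "PARTY", "word": "Acme Corp", "score": 0.97},
]

record = to_contract_metadata(entities)
print(record)  # {'PARTY': ['Acme Corp', 'Beta LLC'], 'JURISDICTION': ['Delaware']}
print(missing_fields(record, ["PARTY", "EFFECTIVE_DATE", "JURISDICTION"]))  # ['EFFECTIVE_DATE']
```

Deduplication matters because a party name typically recurs throughout a contract. Raising `min_score` trades recall for precision.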

## Limitations

- **Legalese Variation:** Older or highly non-standard contract formats may result in lower entity recall.
- **Nested Entities:** Does not support hierarchical or overlapping entities (e.g., an "Amount" inside a "Payment Clause").
- **OCR Errors:** Performance depends heavily on the quality of the input text; OCR noise from poorly scanned PDFs will degrade accuracy.
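
The nested-entity limitation follows directly from the flat BIO scheme. Each token carries exactly one tag, so two overlapping spans cannot both be encoded. The sketch below illustrates this by encoding token spans as BIO tags and rejecting any overlap. `spans_to_bio` and the `PAYMENT_CLAUSE`/`AMOUNT` labels are hypothetical and were chosen only to mirror the example in the list above.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label); labels are hypothetical.
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Encode token spans as flat BIO tags, raising ValueError on overlap."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        # Flat BIO allows one tag per token, so any already-tagged token is a conflict.
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span {(start, end, label)} overlaps an existing entity")
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


tokens = ["Buyer", "shall", "pay", "$", "5,000", "within", "30", "days"]

# A single flat entity encodes without trouble.
print(spans_to_bio(len(tokens), [(3, 5, "AMOUNT")]))
# ['O', 'O', 'O', 'B-AMOUNT', 'I-AMOUNT', 'O', 'O', 'O']

# An AMOUNT nested inside a PAYMENT_CLAUSE cannot be represented.
try:
    spans_to_bio(len(tokens), [(0, 8, "PAYMENT_CLAUSE"), (3, 5, "AMOUNT")])
except ValueError as err:
    print("rejected:", err)
```

Supporting nested entities would typically require a different formulation, such as span-based classification or a separate tagging layer per nesting level.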