| --- |
| language: en |
| license: mit |
| tags: |
| - ner |
| - legal-nlp |
| - token-classification |
| - bert |
| --- |
| |
| # legal_contract_named_entity_recognizer |
|
|
| ## Overview |
| This model is a BERT-based token classifier fine-tuned for the legal domain. It extracts key entities from commercial contracts, including the parties involved, effective dates, governing jurisdictions, and financial amounts. |
|
|
| ## Model Architecture |
| The model uses a **BERT-Large** backbone with a token-level classification head. |
|
| - **Tagging Scheme:** Uses the BIO (Beginning, Inside, Outside) format: a `B-` tag marks the first token of an entity, an `I-` tag marks a continuation token, and `O` marks a token outside any entity. |
| - **Contextual Embeddings:** Uses surrounding context to separate closely related legal terms (e.g., distinguishing a "Notice Date" from an "Effective Date"). |
| - **Fine-tuning:** Trained on the CUAD (Contract Understanding Atticus Dataset) and proprietary legal corpora. |
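
To make the tagging scheme concrete, here is a minimal sketch of how per-token BIO tags can be grouped into entity spans. The label names (`PARTY`, `DATE`) and the `decode_bio` helper are illustrative assumptions; they are not the model's confirmed label set or its actual post-processing code.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (label, text) spans."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the currently open span, if there is one.
        nonlocal current_label, current_tokens
        if current_label is not None:
            spans.append((current_label, " ".join(current_tokens)))
        current_label = None
        current_tokens = []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span.
            flush()
            current_label = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label extends the open span.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span:
            # close any open span and skip this token.
            flush()
    flush()
    return spans


# Hypothetical example: label names are assumptions for illustration.
tokens = ["Acme", "Corp", "signed", "on", "1", "January", "2024"]
tags = ["B-PARTY", "I-PARTY", "O", "O", "B-DATE", "I-DATE", "I-DATE"]
spans = decode_bio(tokens, tags)
```

Because each token carries exactly one tag, a flat BIO sequence like this cannot mark one entity inside another.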
|
|
| ## Intended Use |
| - **Contract Lifecycle Management (CLM):** Automating the extraction of metadata for digital repositories. |
| - **Due Diligence:** Rapidly identifying governing laws and liability amounts across thousands of merger documents. |
| - **Regulatory Compliance:** Checking for the presence of specific mandatory parties or dates in financial agreements. |
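
As an illustration of the compliance use case, the sketch below checks whether a set of extracted entities covers every required entity type. The `{"label": ..., "text": ...}` record shape, the label names, and the `missing_entity_types` helper are hypothetical and are not part of this model's documented output.

```python
from typing import Dict, Iterable, List, Set


def missing_entity_types(
    entities: Iterable[Dict[str, str]], required: Set[str]
) -> List[str]:
    """Return the required entity labels that are absent from the extraction."""
    found = {entity["label"] for entity in entities}
    return sorted(required - found)


# Hypothetical extraction result for one agreement.
extracted = [
    {"label": "PARTY", "text": "Acme Corp"},
    {"label": "EFFECTIVE_DATE", "text": "1 January 2024"},
]
required_labels = {"PARTY", "EFFECTIVE_DATE", "JURISDICTION"}

missing = missing_entity_types(extracted, required_labels)
print(missing)  # ['JURISDICTION']
```

A check like this only reports what the extractor failed to find, so a missing label may mean either that the contract lacks the entity or that the model missed it.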
|
|
| ## Limitations |
| - **Legalese Variation:** Older or highly non-standard contract formats may reduce entity recall. |
| - **Nested Entities:** Does not support hierarchical or overlapping entities (e.g., an "Amount" inside a "Payment Clause"). |
| - **OCR Errors:** Performance is highly dependent on the quality of the text; poorly scanned PDFs with OCR noise will degrade accuracy. |
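
One way to reduce the impact of OCR noise is a light cleanup pass before inference. The following is a minimal sketch under that assumption; the `normalize_ocr_text` helper and its three rules are illustrative and are not a preprocessing step shipped with this model.

```python
import re
import unicodedata


def normalize_ocr_text(text: str) -> str:
    """Apply light, conservative cleanup to OCR output before inference."""
    # NFKC folds compatibility characters, e.g. the single-character
    # ligature U+FB01 becomes the two letters "fi".
    text = unicodedata.normalize("NFKC", text)
    # Rejoin words hyphenated across a line break: "Agree-\nment" -> "Agreement".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse remaining runs of whitespace (including newlines) to one space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


raw = "This Agree-\nment is  e\ufb00ective\non 1 January 2024."
cleaned = normalize_ocr_text(raw)
```

The de-hyphenation rule also joins genuinely hyphenated terms that happen to break across lines (for example "non-" followed by "compete"), so it is worth validating on a sample of real scans before relying on it.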