---
language:
- en
- multilingual
library_name: gliner
pipeline_tag: token-classification
tags:
- gliner
- named-entity-recognition
- information-extraction
- legal
- contracts
- multilingual
- nlp
datasets:
- lucasorrentino/ContractNER
base_model: urchade/gliner_multi-v2.1
license: apache-2.0
widget:
- text: "This AGREEMENT is made effective as of January 1, 2026 by and between Agile Lab S.r.l., a company organized under the laws of Italy, and John Doe, serving as Chief Executive Officer. The Executive shall receive an annual base salary of $225,000.00."
  example_title: "Contract Parties & Salary"
---

# GLiNER ContractNER Multi - Fine-Grained Legal Entity Extraction

Developed at [**Agile Lab**](https://www.agilelab.it/) by **Luca Sorrentino** and **Annalisa Belia**, with **Irene Donato** as project lead.

---

*   **Model Name:** `gliner-contractner-multi-v2.1` (Agile Lab Fine-tune)
*   **Base Architecture:** GLiNER Multi v2.1 (Backbone: `microsoft/mdeberta-v3-base`)
*   **Dataset:** [lucasorrentino/ContractNER](https://huggingface.co/datasets/lucasorrentino/ContractNER)

## Model Description

**GLiNER ContractNER Multi** is a multilingual span-based Named Entity Recognition (NER) model fine-tuned by **Agile Lab** on the **ContractNER** dataset. It is designed to extract fine-grained entities from legal contracts with high precision.

Built on the **GLiNER Multi v2.1** architecture, this model achieves **72.49% micro F1** on the held-out test set (threshold 0.9), substantially outperforming the general-purpose LLMs and legal-domain models in our benchmarks.

### Key Highlights
*   **Contract-Specialized:** Fine-tuned on **3,240+ annotated contract chunks** from SEC EDGAR filings ([lucasorrentino/ContractNER](https://huggingface.co/datasets/lucasorrentino/ContractNER)).
*   **Granular Extraction:** Capable of identifying 18 specific entity types including parties, dates, financial terms (salaries, shares), and regulatory references.
*   **Open-Vocabulary NER:** Supports promptable entity extraction—you can provide custom label names at inference time without retraining.
*   **Multilingual Capability:** Inherits multilingual behavior from GLiNER Multi v2.1 and mDeBERTa-v3-base, but the fine-tuning data consists of English contracts (SEC EDGAR filings), so expect the best results on English text; performance may degrade on other languages, especially low-resource ones.
*   **Production-Ready:** A threshold of **0.8–0.9** is recommended. At 0.9 the model reaches 83.13% micro precision at 64.27% recall, favoring precision to minimize costly false positives in legal review workflows.

---

## 🚀 How to Use

To use this model, you need to install the `gliner` library.

### Installation
```bash
pip install gliner
```

### Inference Code
```python
from gliner import GLiNER

model = GLiNER.from_pretrained("lucasorrentino/Contractner")

# Example contract text
text = """
This EMPLOYMENT AGREEMENT is made effective as of January 1, 2026,
by and between Tech Solutions Inc. ("Company") and Jane Doe ("Executive").
The Executive shall serve as Chief Technology Officer.
The Company agrees to pay the Executive an annual base salary of $250,000.00.
"""

# Define the entities you want to extract (Open Vocabulary)
labels = [
    "Parties", "EffectiveDate", "Role", "Salary", "TerminationDate"
]

# Predict; 0.9 matches the evaluation threshold (0.8-0.9 is the recommended range)
entities = model.predict_entities(text, labels, threshold=0.9)

# Print results
for entity in entities:
    print(f"{entity['text']} => {entity['label']} (Score: {entity['score']:.2f})")
```

---

## 📊 Evaluation & Benchmarks

Evaluated on the held-out test set (158 samples, 20% stratified split, not seen during training).
Reproduce with: `uv run eval.py` from the model repo.

**Methodology:** `overlap_cover` matching. A prediction counts as a true positive (TP) if it covers 100% of the gold span (boundary tolerance ±1 char); predictions are matched to gold entities greedily and 1-to-1, and precision, recall, and F1 are micro-averaged.
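
The matching rule can be sketched in Python as follows. This is an illustrative reimplementation, not the repository's `eval.py`. It assumes spans are `(start, end, label)` character offsets with an exclusive `end`, that a match also requires the labels to agree, that "±1 char" means a prediction may fall short of each gold boundary by at most one character, and that predictions are visited in the order given (e.g., sorted by score).

```python
from typing import Dict, List, Tuple

# (start, end, label) with character offsets; `end` is exclusive (assumption).
Span = Tuple[int, int, str]


def covers(pred: Span, gold: Span, tol: int = 1) -> bool:
    """Same label, and pred spans the whole gold span within +/- tol chars at each edge."""
    return pred[2] == gold[2] and pred[0] <= gold[0] + tol and pred[1] >= gold[1] - tol


def micro_prf(
    preds: List[List[Span]], golds: List[List[Span]], tol: int = 1
) -> Dict[str, float]:
    """Greedy 1-to-1 matching per document, then micro-averaged P/R/F1."""
    tp = fp = fn = 0
    for doc_preds, doc_golds in zip(preds, golds):
        unmatched = list(doc_golds)
        for pred in doc_preds:
            # Greedy: take the first still-unmatched gold span this prediction covers.
            match = next((g for g in unmatched if covers(pred, g, tol)), None)
            if match is None:
                fp += 1
            else:
                unmatched.remove(match)
                tp += 1
        fn += len(unmatched)  # gold spans that no prediction covered
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Counts are pooled across all documents before computing the ratios, which is what makes the result a micro average.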

### Overall Results (threshold = 0.9)

| Metric | Score |
| :--- | ---: |
| **Micro F1** | **72.49%** |
| Micro Precision | 83.13% |
| Micro Recall | 64.27% |
| Macro F1 | 72.94% |
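
The aggregate scores are mutually consistent: micro F1 is the harmonic mean of micro precision $P$ and micro recall $R$, and macro F1 is the unweighted mean of the 18 per-entity F1 scores in the per-entity table below.

$$
\begin{aligned}
F_1^{\text{micro}} &= \frac{2PR}{P+R} = \frac{2 \times 0.8313 \times 0.6427}{0.8313 + 0.6427} \approx 0.7249 \\
F_1^{\text{macro}} &= \frac{1}{18}\sum_{c=1}^{18} F_1^{(c)} = \frac{1312.88\%}{18} \approx 72.94\%
\end{aligned}
$$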

![Threshold sweep](threshold_sweep.png)

### Performance vs. Other Approaches

| Approach | F1 |
| :--- | ---: |
| **GLiNER ContractNER (this model)** | **72.49%** |
| Standalone DeBERTa models | 46–78% |
| General-purpose LLMs (Qwen, Gemma) | < 35% |
| Legal-specific models (LegalBERT, ContractBERT) | < 10% |

### Per-Entity Results (test set, threshold = 0.9)

| Entity | Precision | Recall | F1 | Support |
| :--- | ---: | ---: | ---: | ---: |
| **PII_Ref** | 94.12% | 94.12% | **94.12%** | 17 |
| **EffectiveDate** | 95.83% | 88.46% | **92.00%** | 26 |
| **Salary** | 89.47% | 94.44% | **91.89%** | 18 |
| **Percentage** | 91.30% | 87.50% | **89.36%** | 24 |
| **Role** | 96.67% | 82.86% | **89.23%** | 35 |
| **Parties** | 85.71% | 87.10% | 86.40% | 62 |
| **Shares** | 100.00% | 73.68% | 84.85% | 19 |
| **Price** | 92.31% | 75.00% | 82.76% | 16 |
| **Court** | 83.33% | 75.00% | 78.95% | 20 |
| **RenewalTerm** | 66.67% | 83.33% | 74.07% | 12 |
| **Principal** | 82.61% | 61.29% | 70.37% | 31 |
| **Title** | 82.46% | 60.26% | 69.63% | 78 |
| **Address** | 62.96% | 77.27% | 69.39% | 22 |
| **TerminationDate** | 71.43% | 55.56% | 62.50% | 18 |
| **Rent** | 66.67% | 50.00% | 57.14% | 8 |
| **Ratio** | 85.71% | 35.29% | 50.00% | 17 |
| **Act** | 85.00% | 25.00% | 38.64% | 68 |
| **Regulation** | 47.37% | 23.68% | 31.58% | 38 |

> **Note on Act and Regulation:** low recall is due to truncation — these entities often appear in long legal clauses that exceed the model's 384-token context window. Use sliding window inference for full-document coverage.
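
A minimal sliding-window wrapper might look like the sketch below. It assumes a `predict_fn(chunk)` callable returning dicts with character offsets `start`/`end` plus `text`, `label`, and `score` (the shape returned by `model.predict_entities` in the `gliner` library), and it windows by characters rather than tokens. The 1,000-character default mirrors the "full chunk" size in the latency table, and the 200-character overlap is an untuned assumption that should exceed the longest entity you expect, so a span cut at one window edge appears whole in the next. `sliding_window_predict` is a hypothetical helper, not part of `gliner`.

```python
from typing import Callable, Dict, List

Entity = Dict[str, object]


def sliding_window_predict(
    predict_fn: Callable[[str], List[Entity]],
    text: str,
    window_chars: int = 1000,
    overlap_chars: int = 200,
) -> List[Entity]:
    """Run predict_fn on overlapping character windows and merge the results.

    Hypothetical helper: offsets are shifted back to document coordinates, and
    duplicates from overlapping windows are resolved by keeping the
    highest-scoring span among same-label spans that overlap.
    """
    if not 0 <= overlap_chars < window_chars:
        raise ValueError("need 0 <= overlap_chars < window_chars")
    step = window_chars - overlap_chars
    candidates: List[Entity] = []
    start = 0
    while True:
        for ent in predict_fn(text[start:start + window_chars]):
            shifted = dict(ent)
            shifted["start"] = int(ent["start"]) + start  # back to document offsets
            shifted["end"] = int(ent["end"]) + start
            candidates.append(shifted)
        if start + window_chars >= len(text):
            break  # this window already reached the end of the document
        start += step

    kept: List[Entity] = []
    for ent in sorted(candidates, key=lambda e: float(e["score"]), reverse=True):
        overlaps_kept = any(
            ent["label"] == k["label"] and ent["start"] < k["end"] and k["start"] < ent["end"]
            for k in kept
        )
        if not overlaps_kept:
            kept.append(ent)
    return sorted(kept, key=lambda e: e["start"])
```

Usage: `sliding_window_predict(lambda chunk: model.predict_entities(chunk, labels, threshold=0.9), long_text)`. Windows are cut at raw character positions and may split words; snapping window edges to whitespace or sentence boundaries is a natural refinement.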

![Per-entity metrics](per_entity_metrics.png)

---

## Supported Entity Schema

The model was trained on the **ContractNER** schema. While you can use custom labels, performance is best with categories semantically similar to:

### Document Metadata
*   **EffectiveDate**: Contract start date (e.g., "January 1, 2026").
*   **TerminationDate**: Contract end or expiration date.
*   **RenewalTerm**: Renewal periods or conditions.
*   **Title**: Official document title.

### Actors & Roles
*   **Parties**: Legal entities entering the agreement (companies, individuals).
*   **Role**: Professional titles and positions (e.g., "Chief Executive Officer").

### Contact Information
*   **Address**: Physical addresses.
*   **PII_Ref**: Personally identifiable information (PII) references (phone, email, fax).

### Financial Values
*   **Salary**: Compensation amounts (always with currency symbol, e.g., "$225,000.00").
*   **Price**: Goods/services prices.
*   **Principal**: Loan principal amounts.
*   **Shares**: Stock or equity quantities.
*   **Percentage**: Percentage values (e.g., "50%").
*   **Ratio**: Financial ratios.
*   **Rent**: Lease or rental amounts.

### Legal and Regulatory
*   **Court**: Courts, tribunals, and jurisdictional references (e.g., "State of Texas").
*   **Act**: Legislative acts and laws.
*   **Regulation**: Regulatory references (e.g., "Rule 10b5-1").
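
For convenience, the 18 schema labels can be kept in one place and passed to `model.predict_entities` as the `labels` argument. The grouping below simply mirrors the headings of this section; `CONTRACTNER_SCHEMA` and `labels_for` are hypothetical names, not part of the `gliner` library.

```python
from typing import Dict, List

# ContractNER labels, grouped as in the schema headings above.
CONTRACTNER_SCHEMA: Dict[str, List[str]] = {
    "Document Metadata": ["EffectiveDate", "TerminationDate", "RenewalTerm", "Title"],
    "Actors & Roles": ["Parties", "Role"],
    "Contact Information": ["Address", "PII_Ref"],
    "Financial Values": ["Salary", "Price", "Principal", "Shares", "Percentage", "Ratio", "Rent"],
    "Legal and Regulatory": ["Court", "Act", "Regulation"],
}


def labels_for(*groups: str) -> List[str]:
    """Flatten the named schema groups into one label list; no arguments means all labels."""
    selected = groups or tuple(CONTRACTNER_SCHEMA)
    return [label for group in selected for label in CONTRACTNER_SCHEMA[group]]
```

For example, `model.predict_entities(text, labels_for("Financial Values"), threshold=0.9)` requests only the financial labels, which keeps the output focused on what a given workflow needs.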

---

## Training Details

### Data Source & Preprocessing
*   **Dataset:** [lucasorrentino/ContractNER](https://huggingface.co/datasets/lucasorrentino/ContractNER) — Real contracts from SEC EDGAR (U.S. Securities and Exchange Commission filings), based on Adibhatla et al. (2023).
*   **Original Size:** 5,000+ annotated contract segments.
*   **Consolidated Dataset:** ~3,240 chunks after stratified reduction and class consolidation.
*   **Adjustments:**
    *   Removed the `RevolvingCredit` class (too rare and ambiguous).
    *   Rebalanced the dataset to ensure minimum representation per class.
*   **Split:** 80% training / 20% validation (random split).
*   **Methodology:** Human-in-the-loop iterative labeling.

### Architecture & Configuration
*   **Base Model:** GLiNER Multi v2.1 (209M parameters).
*   **Encoder Backbone:** `microsoft/mdeberta-v3-base` (86M backbone + 190M embedding parameters).
*   **Architecture Type:** Span-based NER with entity-query matching.
*   **Hardware:** NVIDIA L4 GPU.
*   **Training Time:** ~30 minutes per fine-tuning run.
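
To illustrate what "span-based NER with entity-query matching" means, the sketch below scores every candidate span against every label embedding with a sigmoid over their dot product, keeps pairs above the threshold, and then decodes greedily, highest score first, skipping spans that overlap one already kept (flat NER). This follows the general scheme described in the GLiNER paper but is a simplified stand-in: the vectors are toy inputs rather than model outputs, and `decode_spans` is a hypothetical function, not the `gliner` library's implementation. Its `threshold` plays a role analogous to the one passed to `predict_entities`.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def decode_spans(
    spans: List[Tuple[int, int]],          # (start, end) token positions, end exclusive
    span_vecs: List[Sequence[float]],      # one embedding per candidate span
    labels: List[str],
    label_vecs: List[Sequence[float]],     # one embedding per label prompt
    threshold: float = 0.9,
) -> List[Tuple[int, int, str, float]]:
    # Score every (span, label) pair; keep those at or above the threshold.
    candidates = []
    for (start, end), s_vec in zip(spans, span_vecs):
        for label, l_vec in zip(labels, label_vecs):
            score = sigmoid(dot(s_vec, l_vec))
            if score >= threshold:
                candidates.append((start, end, label, score))
    # Greedy flat decoding: best score first, drop spans overlapping a kept one.
    candidates.sort(key=lambda c: c[3], reverse=True)
    kept: List[Tuple[int, int, str, float]] = []
    for cand in candidates:
        if all(cand[1] <= k[0] or k[1] <= cand[0] for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda c: c[0])
```

Lowering the threshold admits more candidate pairs, but overlapping lower-scoring spans are still discarded by the greedy pass, which is one reason recall does not rise as fast as the threshold falls.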

---

## 🏢 Real-World Use Cases

### Document Due Diligence
When a law firm or investment fund needs to analyze large volumes of contracts ahead of an acquisition or investment, this model accelerates the extraction of key information—parties involved, relevant dates, renewal terms, and financial values—reducing manual pre-screening effort and narrowing the scope of documents that require deep legal review.

**Relevant entities:** `Parties`, `EffectiveDate`, `TerminationDate`, `RenewalTerm`, `Salary`, `Price`, `Principal`
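
A pre-screening pass usually wants one compact record per contract. A minimal sketch, assuming entities in the dict shape returned by `predict_entities` (`text`, `label`, `score`), groups mention texts by label, drops mentions below a score cutoff, and collapses repeats; `summarize_contract` and its 0.9 default are illustrative assumptions.

```python
from typing import Dict, List


def summarize_contract(entities: List[dict], min_score: float = 0.9) -> Dict[str, List[str]]:
    """Group entity texts by label, highest score first, dropping low scores and repeats."""
    summary: Dict[str, List[str]] = {}
    for ent in sorted(entities, key=lambda e: e["score"], reverse=True):
        if ent["score"] < min_score:
            continue
        text = " ".join(ent["text"].split())  # collapse newlines/extra spaces inside a span
        mentions = summary.setdefault(ent["label"], [])
        if text not in mentions:
            mentions.append(text)
    return summary
```

Run over a portfolio, these per-contract summaries can be compared side by side to decide which documents need full legal review.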

### Contract Deadline Monitoring
By extracting `EffectiveDate`, `TerminationDate`, and `RenewalTerm` across a contract portfolio, organizations can power alert systems that proactively surface upcoming expirations, auto-renewals, and notice windows—improving operational control over contractual obligations at scale.

**Relevant entities:** `EffectiveDate`, `TerminationDate`, `RenewalTerm`
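
As a sketch of the alerting step, the code below parses `TerminationDate` strings written like "January 1, 2026" and flags those falling within a notice window. It assumes the entity dict shape returned by `predict_entities` and handles only the two month-name formats shown; relative expressions (e.g., a term ending on an anniversary of the Effective Date) are returned as unparsed for manual review. The function names and the 90-day default are illustrative assumptions.

```python
from datetime import date, datetime
from typing import Dict, List, Optional


def parse_contract_date(text: str) -> Optional[date]:
    """Parse 'January 1, 2026' or 'Jan 1, 2026'; return None for anything else."""
    cleaned = " ".join(text.replace(",", ", ").split()).rstrip(".")
    for fmt in ("%B %d, %Y", "%b %d, %Y"):
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


def termination_alerts(
    entities: List[dict], today: date, window_days: int = 90
) -> Dict[str, list]:
    """Split TerminationDate mentions into dates due within the window and unparsed texts."""
    due, unparsed = [], []
    for ent in entities:
        if ent["label"] != "TerminationDate":
            continue
        parsed = parse_contract_date(ent["text"])
        if parsed is None:
            unparsed.append(ent["text"])  # e.g. relative terms; route to manual review
        elif 0 <= (parsed - today).days <= window_days:
            due.append(parsed)
    return {"due": sorted(due), "unparsed": unparsed}
```

Returning unparsed mentions instead of silently dropping them keeps relative or unusual deadlines visible to a human reviewer.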

### Compliance Audit Support
When a legal team needs to assess the impact of new regulatory requirements or internal policy changes across active contracts, the model helps quickly identify documents containing regulatory references, specific parties, and relevant dates—narrowing the audit perimeter to contracts that actually warrant deeper review.

**Relevant entities:** `Act`, `Regulation`, `Court`, `Parties`, `EffectiveDate`

---

## ⚡ Latency & Deployment Cost

### Hardware Requirements
*   **No GPU required** — the model runs entirely on CPU.
*   **Memory footprint:** ~2.2 GB RAM (PyTorch).
*   **ONNX export** (`model.onnx`) is included in the repository.

### Latency Benchmark

**Hardware:** MacBook Pro — Apple M1 Pro (8-core, 6P+2E), 16 GB RAM  
**Runs:** 20 (after 3 warmup) · **Threshold:** 0.9

| Backend | Input length | Median | p95 | Throughput |
| :--- | :--- | ---: | ---: | ---: |
| PyTorch CPU | Short (~300 chars) | 134 ms | 148 ms | ~7.4 docs/s |
| PyTorch CPU | Medium (~800 chars) | 202 ms | 216 ms | ~5.0 docs/s |
| PyTorch CPU | Full chunk (~1,000 chars) | 261 ms | 275 ms | ~3.8 docs/s |
| ONNX Runtime CPU | Full chunk (~1,000 chars) | 308 ms | — | ~3.2 docs/s |

> ONNX Runtime on Apple M1 is slightly slower than PyTorch, likely because PyTorch's ARM kernels are better optimized for Apple Silicon. On x86 Linux (e.g., AWS c5), ONNX Runtime typically delivers a 1.5–2× speedup over PyTorch CPU, although this was not measured in the benchmark above.

> For documents longer than 384 tokens, use sliding window inference to avoid truncation.
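
The benchmark protocol (3 warmup runs, then 20 timed runs, reporting median and p95) can be reproduced on other hardware with a small harness like the one below. It is a sketch under assumptions: the callable wraps whatever inference call you want to time (e.g., `lambda: model.predict_entities(text, labels, threshold=0.9)`), p95 uses the nearest-rank method, and throughput is derived as 1000 / median latency in ms, which is consistent with the table (1000 / 261 ms ≈ 3.8 docs/s). `benchmark` is a hypothetical helper.

```python
import math
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], runs: int = 20, warmup: int = 3) -> Dict[str, float]:
    """Time `fn` after `warmup` untimed calls; report median, p95, and docs/s."""
    for _ in range(warmup):
        fn()  # untimed: lets caches, lazy initialization, and thread pools settle
    timings_ms = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        timings_ms.append((time.perf_counter() - t0) * 1000.0)
    timings_ms.sort()
    rank = math.ceil(95 * len(timings_ms) / 100)  # nearest-rank 95th percentile (1-based)
    median = statistics.median(timings_ms)
    return {
        "median_ms": median,
        "p95_ms": timings_ms[rank - 1],
        "docs_per_s": 1000.0 / median if median > 0 else float("inf"),
    }
```

Timing one document per call, as here, measures single-request latency; batched throughput on the same hardware can differ.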

### Cost vs. LLM-Based Extraction

| Approach | F1 on ContractNER | Cost / 1,000 docs |
| :--- | :--- | ---: |
| **This model (CPU, self-hosted)** | **72.49%** | **≈$0.012** |
| GPT-4o-mini | < 35% | ≈$0.07 |
| GPT-4o | < 35% | ≈$1.13 |

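> Self-hosted cost assumes an AWS c5.xlarge on-demand instance ($0.17/hr) at full utilization and ≈3.8 docs/s, the full-chunk PyTorch throughput measured on the M1 Pro above (not measured on c5 hardware). LLM costs are estimated at list pricing (≈250 input + ≈50 output tokens per chunk).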

---

## License
Apache 2.0