---
license: apache-2.0
language:
- en
tags:
- token-classification
- ner
- legal
- legal-bert
- nigerian-law
- lora
- peft
- onnx
base_model: WhiteRoomProdigy/amicus-ner-v1
pipeline_tag: token-classification
library_name: transformers
metrics:
- precision
- recall
- f1
---
# Amicus NER v2 - Nigerian Legal Named Entity Recognition
**amicus-ner-v2** is a production-ready Named Entity Recognition model for **Nigerian legal text**.
It is a LoRA fine-tuned version of [WhiteRoomProdigy/amicus-ner-v1](https://huggingface.co/WhiteRoomProdigy/amicus-ner-v1),
which is based on `nlpaueb/legal-bert-base-uncased`.
This model identifies **8 legal entity types** in Nigerian court judgments, briefs, and other legal documents.
---
## Entity Labels
| Label | Description | Example |
|---|---|---|
| `CASE_NAME` | Party names in litigation | *Amusa v. INEC* |
| `CITATION` | Law report references (NWLR, LPELR, SCNJ, FWLR) | *(2023) 14 NWLR (Pt.637) 70* |
| `STATUTE` | Legislation, sections, constitutional provisions | *Section 137(1)(b) of CFRN 1999* |
| `COURT` | Nigerian courts and tribunals | *Supreme Court of Nigeria* |
| `DATE` | Judgment and filing dates | *15th March 2022* |
| `JUDGE` | Judicial officers with designations | *Justice Bello JSC* |
| `RATIO` | Ratio decidendi passages | - |
| `HELD` | Court holding / decision text | - |
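Under a BIO tagging scheme, these 8 entity types expand into the 17 token labels listed under Model Details: `O` plus a `B-` (begin) and an `I-` (inside) tag for each type. The sketch below builds that label set. The index order is an assumption for illustration; the authoritative mapping is the `id2label` field in the model's `config.json`.

```python
# The 8 entity types from the table above.
ENTITY_TYPES = ["CASE_NAME", "CITATION", "STATUTE", "COURT", "DATE", "JUDGE", "RATIO", "HELD"]


def build_bio_labels(entity_types):
    """Return the BIO tag set: "O" plus a B- (begin) and I- (inside) tag per entity type."""
    labels = ["O"]
    for entity in entity_types:
        labels.extend([f"B-{entity}", f"I-{entity}"])
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)
# Assumed ordering; check the model's config.json for the real id2label mapping.
label2id = {label: idx for idx, label in enumerate(LABELS)}
id2label = {idx: label for label, idx in label2id.items()}

print(len(LABELS))  # 17
print(LABELS[:5])   # ['O', 'B-CASE_NAME', 'I-CASE_NAME', 'B-CITATION', 'I-CITATION']
```

With `aggregation_strategy="simple"`, the inference pipeline merges consecutive `B-`/`I-` tokens and reports the bare type (for example `CITATION`) as `entity_group`.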
---
## What's New in v2
| Improvement | v1 | v2 |
|---|---|---|
| Training method | Full fine-tune | LoRA (r=16, ~0.8% params trained) |
| Class imbalance | Untreated | Weighted CrossEntropy (O-weight = 0.05) |
| Training data | Base legal-bert weights | Distant supervision + 600 synthetic examples |
| Synthetic data | None | 600 Gemini-generated entity-rich sentences |
| Export | PyTorch only | PyTorch + ONNX INT8 quantized |
| Inference speed | Baseline | ~3-4x faster (ONNX INT8 on CPU) |
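The class-imbalance fix (entity weight 1.0, `O` weight 0.05) down-weights the very common `O` tokens so that entity tokens dominate the gradient. The plain-Python sketch below is an illustration, not the project's training code. It assumes label index 0 is `O`, and it follows the weighted-mean normalization documented for PyTorch's class-weighted cross-entropy: the weighted sum of per-token losses is divided by the summed weights of the gold classes.

```python
import math


def weighted_cross_entropy(logits, targets, class_weights):
    """Weighted mean cross-entropy over a sequence of tokens.

    logits: one list of raw scores per token; targets: gold label index per token.
    Reduction: sum(w[y] * loss) / sum(w[y]), i.e. a class-weighted mean.
    """
    weighted_sum, weight_total = 0.0, 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normalizer.
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        token_loss = log_z - row[target]  # negative log-likelihood of the gold label
        weighted_sum += class_weights[target] * token_loss
        weight_total += class_weights[target]
    return weighted_sum / weight_total


# Assumed layout: index 0 is O (weight 0.05), indices 1..16 are B-/I- entity labels (weight 1.0).
CLASS_WEIGHTS = [0.05] + [1.0] * 16
```

In PyTorch the equivalent is `torch.nn.CrossEntropyLoss(weight=...)` with a tensor of these weights. Padding and non-first sub-word positions are commonly masked with the default `ignore_index=-100`, which this sketch omits.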
---
## Model Details
| Property | Value |
|---|---|
| **Architecture** | BERT-base (nlpaueb/legal-bert-base-uncased) |
| **Fine-tuning method** | PEFT LoRA - rank 16, alpha 32 |
| **Target modules** | `query`, `value` (attention projection layers) |
| **Training epochs** | 8 |
| **Batch size** | 16 |
| **Learning rate** | 3e-4 |
| **Loss function** | Weighted CrossEntropyLoss (entity = 1.0, O = 0.05) |
| **Dataset** | Distant supervision from LawPavilion + Legalpedia + 600 synthetic examples |
| **Labels** | 17 (O + B/I for each of 8 entity types) |
| **Max sequence length** | 512 tokens |
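The LoRA settings above translate into a small, countable set of trainable weights. The arithmetic sketch below assumes standard BERT-base dimensions (hidden size 768, 12 encoder layers) and adapters on `query` and `value` only, as listed. Whether the token-classification head is also trained (PEFT can keep it trainable via `modules_to_save`) is an assumption, so it is counted separately.

```python
def lora_param_count(d_in, d_out, rank, n_layers, n_modules):
    """Trainable parameters LoRA adds to n_modules (d_out x d_in) projections in each of n_layers.

    Each frozen weight W0 gets a low-rank update B @ A with A: (rank x d_in) and
    B: (d_out x rank), so the forward pass becomes W0 x + (alpha / rank) * B A x.
    """
    return n_layers * n_modules * rank * (d_in + d_out)


# Assumed BERT-base geometry (hidden size 768, 12 encoder layers) and 17 output labels.
HIDDEN, LAYERS, NUM_LABELS = 768, 12, 17
RANK, ALPHA = 16, 32  # rank and alpha from the table above

adapter_params = lora_param_count(HIDDEN, HIDDEN, RANK, LAYERS, n_modules=2)  # query + value
head_params = HIDDEN * NUM_LABELS + NUM_LABELS  # linear classifier head, if it is trained
scaling = ALPHA / RANK  # multiplier applied to the low-rank update

print(f"LoRA adapter params: {adapter_params:,}")
print(f"Classifier head params: {head_params:,}")
print(f"LoRA scaling alpha/r: {scaling}")
```

Dividing the trainable total by the base model's parameter count gives the trainable fraction; the exact percentage depends on which modules (head, biases) are counted as trainable.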
---
## How to Use
```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="WhiteRoomProdigy/amicus-ner-v2",
    aggregation_strategy="simple",
)

text = "As held in Amusa v. INEC (2023) 14 NWLR (Pt.637) 70, the Supreme Court found no merit."
results = ner(text)
for entity in results:
    print(entity['entity_group'], '|', entity['score'], '|', entity['word'])
```
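Because the base model is uncased, the `word` field in the pipeline output is typically lowercased. The `start`/`end` character offsets (available with a fast tokenizer) index into the original string, so slicing the input text recovers the original casing. The helper below is a hypothetical post-processing step, not part of the model or library; the `min_score=0.5` default is an arbitrary illustrative threshold.

```python
from collections import defaultdict


def group_entities(text, results, min_score=0.5):
    """Group aggregated pipeline results by entity type, keeping the original casing.

    results: dicts with 'entity_group', 'score', 'start', 'end' keys, as returned by a
    token-classification pipeline with aggregation_strategy="simple".
    """
    grouped = defaultdict(list)
    for ent in results:
        if ent["score"] >= min_score:  # drop low-confidence spans
            grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


# Usage with the pipeline above:
# print(group_entities(text, ner(text)))
```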
---
## Training Data
Trained on a combination of:
1. **Distant supervision** from LawPavilion and Legalpedia Nigerian judgment databases,
auto-annotated using a hand-crafted regex engine (NWLR/LPELR citation patterns,
court name patterns, judge designation patterns)
2. **Synthetic augmentation** - 600 Gemini-generated, entity-rich sentences covering all 8 entity types
The distant-supervision data is derived from publicly available Nigerian court judgments.
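The distant-supervision step can be sketched as regex matching followed by BIO tagging. The patterns below are hypothetical illustrations of the three pattern families named above (NWLR/LPELR citations, court names, judge designations), not the project's actual regex engine. Whitespace tokenization also stands in for the WordPiece alignment a real training pipeline would need.

```python
import re

# Hypothetical patterns, NOT the ones used to build the amicus-ner-v2 training set.
PATTERNS = {
    "CITATION": re.compile(
        r"\(\d{4}\)\s+\d+\s+NWLR\s+\(Pt\.?\s*\d+\)\s+\d+"  # e.g. (2023) 14 NWLR (Pt.637) 70
        r"|\(\d{4}\)\s+LPELR-\d+\s*\([A-Z]+\)"           # e.g. (2019) LPELR-12345(SC), illustrative
    ),
    "COURT": re.compile(r"Supreme Court(?: of Nigeria)?|Court of Appeal|Federal High Court"),
    "JUDGE": re.compile(r"(?:Justice\s+)?[A-Z][a-z]+,?\s+(?:JSC|JCA)\b"),
}


def find_spans(text):
    """Return non-overlapping (start, end, label) character spans, earliest match first."""
    spans = sorted(
        (m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    )
    kept, last_end = [], -1
    for start, end, label in spans:
        if start >= last_end:  # skip any span overlapping one already kept
            kept.append((start, end, label))
            last_end = end
    return kept


def bio_tags(text):
    """Whitespace-tokenize text and tag each token B-/I-<LABEL> or O from the regex spans."""
    spans = find_spans(text)
    tagged = []
    for tok in re.finditer(r"\S+", text):
        tag = "O"
        for start, end, label in spans:
            if start <= tok.start() < end:
                tag = ("B-" if tok.start() == start else "I-") + label
                break
        tagged.append((tok.group(), tag))
    return tagged


for token, tag in bio_tags("As held in Amusa v. INEC (2023) 14 NWLR (Pt.637) 70, the Supreme Court found no merit."):
    print(f"{token:10} {tag}")
```

Entities with no reliable surface pattern (such as `CASE_NAME`, `RATIO`, or `HELD`) are left as `O` here, which is one reason such auto-annotation is typically supplemented with other data.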
---
## Citation
```bibtex
@misc{amicus-ner-v2,
title = {amicus-ner-v2: Nigerian Legal Named Entity Recognition},
author = {WhiteRoomProdigy},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/WhiteRoomProdigy/amicus-ner-v2}},
note = {LoRA fine-tune of amicus-ner-v1 for Nigerian legal NER}
}
```
---
## License
Apache 2.0. Built by the [Dockase](https://dockase.com) team for the Nigerian legal technology ecosystem.