---
license: apache-2.0
language:
- en
tags:
- token-classification
- ner
- legal
- legal-bert
- nigerian-law
- lora
- peft
- onnx
base_model: WhiteRoomProdigy/amicus-ner-v1
pipeline_tag: token-classification
library_name: transformers
metrics:
- precision
- recall
- f1
---

# Amicus NER v2 - Nigerian Legal Named Entity Recognition

**amicus-ner-v2** is a production-ready Named Entity Recognition model for **Nigerian legal text**. It is a LoRA fine-tuned version of [WhiteRoomProdigy/amicus-ner-v1](https://huggingface.co/WhiteRoomProdigy/amicus-ner-v1), which is based on `nlpaueb/legal-bert-base-uncased`.

This model identifies **8 legal entity types** in Nigerian court judgements, briefs, and legal documents.

---

## Entity Labels

| Label | Description | Example |
|---|---|---|
| `CASE_NAME` | Party names in litigation | *Amusa v. INEC* |
| `CITATION` | Law report references (NWLR, LPELR, SCNJ, FWLR) | *(2023) 14 NWLR (Pt.637) 70* |
| `STATUTE` | Legislation, sections, constitutional provisions | *Section 137(1)(b) of CFRN 1999* |
| `COURT` | Nigerian courts and tribunals | *Supreme Court of Nigeria* |
| `DATE` | Judgment and filing dates | *15th March 2022* |
| `JUDGE` | Judicial officers with designations | *Justice Bello JSC* |
| `RATIO` | Ratio decidendi passages | - |
| `HELD` | Court holding / decision text | - |

---

## What's New in v2

| Improvement | v1 | v2 |
|---|---|---|
| Training method | Full fine-tune | LoRA (r=16, ~0.8% params trained) |
| Class imbalance | Untreated | Weighted CrossEntropy (O-weight = 0.05) |
| Training data | Base legal-bert weights | Distant supervision + 600 synthetic examples |
| Synthetic data | None | 600 Gemini-generated entity-rich sentences |
| Export | PyTorch only | PyTorch + ONNX INT8 quantized |
| Inference speed | Baseline | ~3-4x faster (ONNX INT8 on CPU) |

---

## Model Details

| Property | Value |
|---|---|
| **Architecture** | BERT-base (nlpaueb/legal-bert-base-uncased) |
| **Fine-tuning method** | PEFT LoRA - rank 16, alpha 32 |
| **Target modules** | `query`, `value` (attention projection layers) |
| **Training epochs** | 8 |
| **Batch size** | 16 |
| **Learning rate** | 3e-4 |
| **Loss function** | Weighted CrossEntropyLoss (entity = 1.0, O = 0.05) |
| **Dataset** | Distant supervision from LawPavilion + Legalpedia + 600 synthetic examples |
| **Labels** | 17 (O + B/I for each of 8 entity types) |
| **Max sequence length** | 512 tokens |

---

## How to Use

```python
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="WhiteRoomProdigy/amicus-ner-v2",
    aggregation_strategy="simple"
)

text = "As held in Amusa v. INEC (2023) 14 NWLR (Pt.637) 70, the Supreme Court found no merit."
results = ner(text)

for entity in results:
    print(entity['entity_group'], '|', entity['score'], '|', entity['word'])
```

---

## Training Data

Trained on a combination of:

1. **Distant supervision** from LawPavilion and Legalpedia Nigerian judgment databases, auto-annotated using a hand-crafted regex engine (NWLR/LPELR citation patterns, court name patterns, judge designation patterns)
2. **Synthetic augmentation** - 600 entity-rich sentences covering all 8 entity types

All training data is derived from publicly available Nigerian court judgements.

---

## Citation

```bibtex
@misc{amicus-ner-v2,
  title        = {amicus-ner-v2: Nigerian Legal Named Entity Recognition},
  author       = {WhiteRoomProdigy},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/WhiteRoomProdigy/amicus-ner-v2}},
  note         = {LoRA fine-tune of amicus-ner-v1 for Nigerian legal NER}
}
```

---

## License

Apache 2.0. Built by the [Dockase](https://dockase.com) team for the Nigerian legal technology ecosystem.
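
---

## Example: Regex-Based Distant Supervision (Illustrative)

The Training Data section describes auto-annotation with a hand-crafted regex engine, and Model Details lists 17 labels (O plus B/I for each of the 8 entity types). The sketch below shows roughly how such a step could work. It is not the actual engine used to train this model: the two patterns, the whitespace tokenizer, the label ordering, and the names `PATTERNS`, `LABELS`, and `annotate` are all assumptions made for illustration.

```python
import re

ENTITY_TYPES = [
    "CASE_NAME", "CITATION", "STATUTE", "COURT",
    "DATE", "JUDGE", "RATIO", "HELD",
]

# 17 labels: "O" plus a B- and I- tag per entity type.
# The ordering here is an assumption; check the model's config for the real id2label.
LABELS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]

# Hypothetical patterns for two entity types only (not the authors' actual rules).
PATTERNS = {
    "CITATION": re.compile(r"\(\d{4}\)\s+\d+\s+NWLR\s+\(Pt\.\s?\d+\)\s+\d+"),
    "COURT": re.compile(r"Supreme Court(?: of Nigeria)?|Court of Appeal|Federal High Court"),
}

TOKEN_RE = re.compile(r"\S+")  # naive whitespace tokenizer, for illustration


def annotate(text):
    """Return (token, BIO tag) pairs by projecting regex matches onto tokens."""
    tokens = [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]
    tags = ["O"] * len(tokens)
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            inside = False
            for i, (_, start, end) in enumerate(tokens):
                overlaps = start < match.end() and end > match.start()
                # Only tag tokens that no earlier pattern has claimed.
                if overlaps and tags[i] == "O":
                    tags[i] = ("I-" if inside else "B-") + label
                    inside = True
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


if __name__ == "__main__":
    sample = "As held in Amusa v. INEC (2023) 14 NWLR (Pt.637) 70, the Supreme Court found no merit."
    for token, tag in annotate(sample):
        print(f"{token}\t{tag}")
```

Because tokens are split on whitespace, trailing punctuation stays attached to the token it follows (for example `70,`), so a real pipeline would likely align character spans to the model's WordPiece tokens instead.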