---
library_name: onnxruntime
tags:
- ner
- pii
- gliner
- onnx
- privacy
- pdpa
- pdpd
- pipl
- multilingual
language:
- en
- ms
- ta
- zh
- id
- vi
- th
- hi
- bn
- ko
- de
- fr
- ru
license: agpl-3.0
pipeline_tag: token-classification
---

# PII-Engineer-Multi-NER-v2.1

A high-accuracy, multilingual PII detection model for privacy compliance (PDPA, PDPD, PDP Law, PIPL).

Fine-tuned with LoRA on the GLiNER2 architecture (mDeBERTa-v3-base encoder, ~280M parameters) and optimized for real-world PII detection across 13 languages (see Supported Languages).

**Built by [PII Engineer](https://pii.engineer)**

## Labels (9 PII types)

| Label | Description |
|-------|-------------|
| `person_name` | Full or partial names |
| `phone_number` | Phone and mobile numbers (international formats) |
| `government_id` | National IDs such as NRIC, SSN, Aadhaar, NIK, CCCD |
| `street_address` | Physical addresses |
| `date_of_birth` | Birth dates in any format |
| `email_address` | Email addresses |
| `passport_number` | Passport numbers (multiple countries) |
| `license_plate` | Vehicle license plates |
| `bank_account_number` | Bank account and routing numbers |

## Performance

| Label | Precision | Recall | F1 |
|-------|-----------|--------|----|
| person_name | 0.808 | 0.838 | 0.823 |
| phone_number | 0.962 | 0.975 | 0.968 |
| government_id | 0.902 | 0.938 | 0.920 |
| street_address | 0.903 | 0.891 | 0.897 |
| date_of_birth | 0.901 | 0.901 | 0.901 |
| email_address | 0.974 | 0.966 | 0.970 |
| passport_number | 0.808 | 0.812 | 0.810 |
| license_plate | 0.837 | 0.847 | 0.842 |
| bank_account_number | 0.879 | 0.906 | 0.892 |
| **Mean** | | | **0.902** |
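
Each F1 value above is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes the per-label F1 from the table and takes an unweighted (macro) average. That average comes out at about 0.891 rather than 0.902, so the reported mean is presumably weighted by per-label support, which this card does not list.

```python
"""Recompute per-label F1 and the unweighted (macro) mean from the Performance table."""
from __future__ import annotations

# Per-label (precision, recall, reported F1), copied from the Performance table.
TABLE = {
    "person_name": (0.808, 0.838, 0.823),
    "phone_number": (0.962, 0.975, 0.968),
    "government_id": (0.902, 0.938, 0.920),
    "street_address": (0.903, 0.891, 0.897),
    "date_of_birth": (0.901, 0.901, 0.901),
    "email_address": (0.974, 0.966, 0.970),
    "passport_number": (0.808, 0.812, 0.810),
    "license_plate": (0.837, 0.847, 0.842),
    "bank_account_number": (0.879, 0.906, 0.892),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(table: dict[str, tuple[float, float, float]]) -> float:
    """Unweighted mean of per-label F1 values recomputed from precision and recall."""
    scores = [f1(p, r) for p, r, _ in table.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for label, (p, r, reported) in TABLE.items():
        print(f"{label:20s} computed={f1(p, r):.3f} reported={reported:.3f}")
    print(f"macro F1 = {macro_f1(TABLE):.3f}")
```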

## Architecture

- **Encoder:** mDeBERTa-v3-base (768 hidden size, 12 layers, 12 attention heads)
- **Framework:** GLiNER2 span-based NER, exported as 5 ONNX components (the encoder is provided in FP32 and INT8 variants)
- **Parameters:** ~280M
- **Inference:** ONNX Runtime (CPU or GPU)
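
Span-based NER in the GLiNER family scores candidate spans of tokens against each label instead of tagging tokens one at a time. As a simplified illustration (not this model's actual code), the sketch below enumerates every candidate span up to a maximum width. `MAX_WIDTH = 12` is an assumed value, because the card does not state the span width this model uses.

```python
"""Simplified candidate-span enumeration for span-based NER (illustrative only)."""
from __future__ import annotations

MAX_WIDTH = 12  # assumed maximum span length in tokens; not stated in this card


def enumerate_spans(num_tokens: int, max_width: int = MAX_WIDTH) -> list[tuple[int, int]]:
    """Return all (start, end) token spans with inclusive end and length <= max_width."""
    spans = []
    for start in range(num_tokens):
        # Stop at the end of the token sequence or after max_width tokens, whichever comes first.
        for end in range(start, min(start + max_width, num_tokens)):
            spans.append((start, end))
    return spans


if __name__ == "__main__":
    tokens = "Contact Tan Wei Ming at +65 9123 4567".split()
    spans = enumerate_spans(len(tokens), max_width=3)
    print(len(spans), "candidate spans")
    # In a GLiNER-style model, each span is embedded and scored against every label.
    print([" ".join(tokens[s : e + 1]) for s, e in spans[:4]])
```

Capping the width keeps the number of candidates linear in sentence length (at most `num_tokens * max_width`) rather than quadratic.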

### ONNX Models

| File | Size | Description |
|------|------|-------------|
| `encoder.onnx` | 1.1 GB | Token encoder (FP32) |
| `encoder_int8.onnx` | 511 MB | Token encoder (INT8 quantized) |
| `span_rep.onnx` | 63 MB | Span representation |
| `count_embed.onnx` | 41 MB | Count embedding |
| `count_pred.onnx` | 4.6 MB | Count prediction |
| `classifier.onnx` | 4.5 MB | Classification head |
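
A minimal sketch for opening a manually downloaded copy with ONNX Runtime, assuming all files sit directly in one directory (such as the `models/ner-v21` target of the `huggingface-cli` command in Quick Start) and that the INT8 encoder should be preferred when present. `resolve_model_files` and `open_sessions` are hypothetical helpers. The card does not document the tensor inputs and outputs or how the five components are chained, so the sketch only opens one session per file and prints each model's declared input names.

```python
"""Locate the ONNX components and open ONNX Runtime sessions (sketch; chaining not shown)."""
from __future__ import annotations

from pathlib import Path

# Non-encoder component files, taken from the ONNX Models table.
HEAD_FILES = ("span_rep.onnx", "count_embed.onnx", "count_pred.onnx", "classifier.onnx")


def resolve_model_files(model_dir: str | Path, prefer_int8: bool = True) -> dict[str, Path]:
    """Map component name to file path, using the INT8 encoder if requested and present."""
    root = Path(model_dir)
    candidates = ["encoder_int8.onnx", "encoder.onnx"] if prefer_int8 else ["encoder.onnx"]
    encoder = next((root / f for f in candidates if (root / f).is_file()), None)
    if encoder is None:
        raise FileNotFoundError(f"no encoder ONNX file found in {root}")
    files = {"encoder": encoder}
    for name in HEAD_FILES:
        path = root / name
        if not path.is_file():
            raise FileNotFoundError(f"missing {path}")
        files[name.removesuffix(".onnx")] = path
    return files


def open_sessions(files: dict[str, Path]):
    """Create one InferenceSession per component, using CUDA first when it is available."""
    import onnxruntime as ort  # imported lazily so resolve_model_files works without it

    providers = ["CPUExecutionProvider"]
    if "CUDAExecutionProvider" in ort.get_available_providers():
        providers.insert(0, "CUDAExecutionProvider")
    return {name: ort.InferenceSession(str(p), providers=providers) for name, p in files.items()}


if __name__ == "__main__":
    model_dir = Path("models/ner-v21")  # --local-dir used in Quick Start
    if model_dir.is_dir():
        for name, sess in open_sessions(resolve_model_files(model_dir)).items():
            print(name, [inp.name for inp in sess.get_inputs()])
    else:
        print(f"{model_dir} not found; download the model first")
```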

## Quick Start

Use with [pii.engineer](https://github.com/gantz-ai/pii.engineer), a Rust server that downloads the models automatically:

```bash
cargo build --release --package pii-engineer-server
cargo run --release --package pii-engineer-server
# Models download automatically on first run
```

Or download the models manually:

```bash
huggingface-cli download pii-engineer/PII-Engineer-Multi-NER-v2.1 --local-dir models/ner-v21
```
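
If you only plan to use the INT8 encoder, a Python alternative is `huggingface_hub.snapshot_download` with `ignore_patterns`, which skips the 1.1 GB FP32 encoder while keeping every other file in the repository. This is a sketch: the pattern assumes the file names from the ONNX Models table, and `ignore_patterns_for` and `download` are hypothetical helpers.

```python
"""Download the model repository, optionally skipping the FP32 encoder (sketch)."""
from __future__ import annotations

REPO_ID = "pii-engineer/PII-Engineer-Multi-NER-v2.1"
LOCAL_DIR = "models/ner-v21"  # same target directory as the huggingface-cli example


def ignore_patterns_for(int8_only: bool) -> list[str]:
    """Glob patterns to skip; '*encoder.onnx' matches encoder.onnx but not encoder_int8.onnx."""
    return ["*encoder.onnx"] if int8_only else []


def download(int8_only: bool = True) -> str:
    """Fetch the repository snapshot into LOCAL_DIR and return the local path."""
    from huggingface_hub import snapshot_download  # requires: pip install huggingface_hub

    return snapshot_download(
        repo_id=REPO_ID,
        local_dir=LOCAL_DIR,
        ignore_patterns=ignore_patterns_for(int8_only) or None,
    )
```

Calling `download()` fetches everything except `encoder.onnx`; `download(int8_only=False)` fetches the full repository.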

## Supported Languages

**Primary:** English, Malay, Tamil, Chinese, Indonesian, Vietnamese

**Secondary:** Thai, Hindi, Bengali, Korean, German, French, Russian

## Use Cases

- PDPA (Singapore) compliance scanning
- PDPD (Vietnam) compliance scanning
- PDP Law (Indonesia) compliance scanning
- PIPL (China) compliance scanning
- PII detection in documents, chat logs, and databases
- Pre-processing for data anonymization pipelines
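
For the anonymization use case, detected spans are typically replaced with placeholders before data leaves a trusted boundary. The sketch below assumes a hypothetical detection format of `(start, end, label)` character offsets with an exclusive `end`, since the card does not document the output schema of the model or server. It rejects overlapping spans and substitutes right to left so earlier offsets stay valid.

```python
"""Replace detected PII spans with [LABEL] placeholders (sketch; span format is assumed)."""
from __future__ import annotations

from typing import NamedTuple

# The nine labels from the Labels table.
LABELS = frozenset({
    "person_name", "phone_number", "government_id", "street_address", "date_of_birth",
    "email_address", "passport_number", "license_plate", "bank_account_number",
})


class Entity(NamedTuple):
    start: int  # character offset, inclusive
    end: int  # character offset, exclusive
    label: str


def redact(text: str, entities: list[Entity]) -> str:
    """Return text with each entity replaced by e.g. [EMAIL_ADDRESS]; overlaps raise ValueError."""
    ordered = sorted(entities, key=lambda e: e.start)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start < prev.end:
            raise ValueError(f"overlapping entities: {prev} and {cur}")
    for ent in reversed(ordered):  # right to left keeps earlier offsets valid
        if ent.label not in LABELS:
            raise ValueError(f"unknown label: {ent.label}")
        text = text[: ent.start] + f"[{ent.label.upper()}]" + text[ent.end :]
    return text


if __name__ == "__main__":
    sample = "Email Tan Wei Ming at tan@example.com."
    found = [Entity(6, 18, "person_name"), Entity(22, 37, "email_address")]
    print(redact(sample, found))
```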

## Limitations

- Optimized for structured and semi-structured text (forms, emails, documents)
- May underperform on highly informal social media text
- Date-of-birth detection requires contextual birth cues (e.g., "born", "DOB", "lahir")

## Citation

```bibtex
@misc{pii-engineer-multi-ner-v2.1,
  title={PII-Engineer-Multi-NER-v2.1: Multilingual PII Detection Model},
  author={{PII Engineer}},
  year={2026},
  url={https://pii.engineer},
  note={Fine-tuned on GLiNER2 architecture with LoRA}
}
```

## License

Licensed under AGPL-3.0, free for open-source use. A commercial license is available at [pii.engineer](https://pii.engineer).

Built on [gliner2-multi-v1](https://huggingface.co/fastino/gliner2-multi-v1) (Apache 2.0) and [mDeBERTa-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base) (MIT).