---
language:
- en
- zh
- ms
- id
- vi
- ta
- th
- hi
- bn
- ko
- ja
- de
- fr
- ru
license: apache-2.0
tags:
- pii
- ner
- gliner
- privacy
- gdpr
- pdpa
- multilingual
- onnx
datasets:
- custom
metrics:
- f1
pipeline_tag: token-classification
---
<p align="center">
<a href="https://pii.engineer">
<img src="https://pii.engineer/static/banner.webp" alt="PII Engineer" width="100%" />
</a>
</p>
# PII Engineer — Multilingual NER v2.1
A fast, multilingual PII detection model: a single model detects 30+ PII types across 50+ languages and runs without a GPU.
**[Live Demo](https://pii.engineer)** · **[Benchmarks](https://pii.engineer/benchmarks)** · **[GitHub](https://github.com/gantz-ai/pii.engineer)** · **[Blog](https://pii.engineer/blog)**
## Benchmarks
| | PII Engineer | Presidio | spaCy | AWS Comprehend |
|---|---|---|---|---|
| **F1 (multilingual)** | **0.86** | 0.44 | 0.64 | 0.52 |
| **F1 (English)** | **0.88** | 0.80 | 0.83 | 0.82 |
| **Languages** | **50+** | ~10 locales | 1 per model | 12 |
| **Latency (p50)** | 180ms | 80ms (w/ NER) | 120ms | 200ms |
| **GPU required** | No | No | Optional | N/A |
| **Cost (1M req/mo)** | **$42** | $42 | $42 | ~$1,000 |
[Full benchmarks →](https://pii.engineer/benchmarks)
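The p50 row is the median per-request latency. To compare it against your own hardware, a minimal sketch like the one below times any zero-argument callable and reports the median. The function name `measure_p50_ms`, the warm-up count, and the idea of wrapping a call to the local `/api/detect` endpoint are assumptions for illustration, not the methodology behind the published numbers.

```python
import statistics
import time
from typing import Callable


def measure_p50_ms(call: Callable[[], object], runs: int = 50, warmup: int = 5) -> float:
    """Return the median (p50) wall-clock latency of `call`, in milliseconds (hypothetical helper)."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    # Warm-up calls are not timed, so one-off costs (model loading, connection setup) don't skew p50.
    for _ in range(warmup):
        call()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Hypothetical usage against the local server from Quick Start:
# measure_p50_ms(lambda: requests.post("http://localhost:8000/api/detect",
#                                      json={"text": "John Doe"}, timeout=30))
```

The median is reported rather than the mean because a few slow outliers (garbage collection, network hiccups) would pull the mean up without reflecting typical latency.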
### Accuracy by Language
| Language | F1 |
|----------|-----|
| English | 0.931 |
| Chinese | 0.918 |
| Vietnamese | 0.912 |
| Korean | 0.905 |
| Indonesian | 0.901 |
| Malay | 0.895 |
| Hindi | 0.892 |
| Thai | 0.885 |
| Tamil | 0.878 |
### Per-Entity Accuracy
| Entity Type | F1 |
|-------------|-----|
| email_address | 0.970 |
| phone_number | 0.968 |
| government_id | 0.920 |
| bank_account_number | 0.915 |
| street_address | 0.891 |
| date_of_birth | 0.887 |
| passport_number | 0.880 |
| license_plate | 0.833 |
| person_name | 0.823 |
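The card does not state how these F1 scores are computed. A common convention for span-based NER is exact-match span F1, where a prediction counts as correct only if its start offset, end offset and entity type all match a gold span. The sketch below assumes that convention; `span_f1` and `per_entity_f1` are hypothetical helpers, not PII Engineer's evaluation code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# A labelled span: (start_offset, end_offset, entity_type)
Span = Tuple[int, int, str]


def span_f1(gold: Iterable[Span], pred: Iterable[Span]) -> float:
    """Exact-match span F1: a prediction is a true positive only if offsets and type all match."""
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    # Multiset intersection counts each matching span at most as often as it appears in both.
    tp = sum((gold_counts & pred_counts).values())
    if tp == 0:
        return 0.0
    precision = tp / sum(pred_counts.values())
    recall = tp / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


def per_entity_f1(gold: Iterable[Span], pred: Iterable[Span]) -> Dict[str, float]:
    """Compute span_f1 separately for each entity type seen in the gold or predicted spans."""
    gold_list: List[Span] = list(gold)
    pred_list: List[Span] = list(pred)
    types = sorted({s[2] for s in gold_list} | {s[2] for s in pred_list})
    return {
        t: span_f1([s for s in gold_list if s[2] == t], [s for s in pred_list if s[2] == t])
        for t in types
    }


# "John Doe lives at 42 Orchard Road, Singapore 238879": the address is only partially predicted.
gold = [(0, 8, "person_name"), (18, 51, "street_address")]
pred = [(0, 8, "person_name"), (18, 33, "street_address")]
print(span_f1(gold, pred))        # 0.5
print(per_entity_f1(gold, pred))  # {'person_name': 1.0, 'street_address': 0.0}
```

Under this strict convention a truncated address scores zero; lenient variants that give credit for partial overlap would report higher numbers for the same predictions.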
## PII Types Detected
`person_name` · `phone_number` · `government_id` · `street_address` · `date_of_birth` · `email_address` · `passport_number` · `license_plate` · `bank_account_number`
## Model Architecture
- **Base:** [GLiNER2](https://huggingface.co/fastino/gliner2-multi-v1) (span-based NER)
- **Encoder:** mDeBERTa-v3-base (280M params), fine-tuned with LoRA on PII data
- **Inference:** 5 ONNX models (encoder, span_rep, count_embed, count_pred, classifier)
- **Quantization:** INT8 encoder available (~15-20% faster on x86 CPU)
- **Total size:** ~620MB (all languages)
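Span-based NER, as popularized by GLiNER, classifies candidate spans instead of tagging tokens one at a time: every contiguous span up to a maximum width gets a representation that is scored against each requested entity label. The toy sketch below only illustrates the enumeration step; `enumerate_spans` and `max_width` are illustrative names, and the width limit this model actually uses is not stated in this card.

```python
from typing import List, Tuple


def enumerate_spans(tokens: List[str], max_width: int) -> List[Tuple[int, int]]:
    """Return all (start, end) token spans, end inclusive, covering at most max_width tokens."""
    if max_width < 1:
        raise ValueError("max_width must be >= 1")
    spans = []
    for start in range(len(tokens)):
        # Stop at max_width tokens or at the end of the sequence, whichever comes first.
        for end in range(start, min(start + max_width, len(tokens))):
            spans.append((start, end))
    return spans


tokens = "John Doe lives in Singapore".split()
for start, end in enumerate_spans(tokens, max_width=2):
    print(start, end, " ".join(tokens[start : end + 1]))
```

Capping the width keeps the number of candidates at roughly `len(tokens) * max_width` instead of growing quadratically with input length, which matters for CPU-only inference.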
## Quick Start
### With PII Engineer Server (Rust)
```bash
git clone https://github.com/gantz-ai/pii.engineer
cd pii.engineer
cargo build --release --package pii-engineer-server
cargo run --release --package pii-engineer-server
# Models auto-download on first run
# API at http://localhost:8000
```
```bash
curl -X POST http://localhost:8000/api/detect \
-H "Content-Type: application/json" \
-d '{"text": "John Doe, NRIC S9012345B, born 12 March 1985"}'
```
### With Python
```python
import requests

# Assumes the PII Engineer server from the previous step is running locally.
resp = requests.post(
    "http://localhost:8000/api/detect",
    json={
        "text": "John Doe lives at 42 Orchard Road, Singapore 238879",
        "labels": ["person_name", "street_address", "phone_number", "email_address"],
    },
    timeout=30,
)
resp.raise_for_status()  # fail loudly on HTTP errors instead of a KeyError below

for entity in resp.json()["entities"]:
    print(f'{entity["type"]}: {entity["value"]} (score: {entity["score"]:.2f})')
```
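If you only want confident detections, the entities in the response can be post-processed on the client. The sketch below assumes each entity carries the `type`, `value` and `score` fields used in the example above; `group_entities` and its 0.5 default threshold are illustrative choices, not a documented recommendation.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(entities: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group detected values by entity type, dropping detections scored below min_score."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["type"]].append(entity["value"])
    return dict(grouped)


# Hypothetical usage with the response from the previous example:
# group_entities(resp.json()["entities"], min_score=0.7)
```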
### Download Models Manually
```bash
pip install huggingface_hub
huggingface-cli download pii-engineer/PII-Engineer-Multi-NER-v2.1 --local-dir models/PII-Engineer-Multi-NER-v2.1
huggingface-cli download pii-engineer/PII-Engineer-Chinese-NER-v1.0 --local-dir models/PII-Engineer-Chinese-NER-v1.0
```
## Use Cases
- **PDPA/GDPR/CCPA compliance** — detect PII in databases, logs, documents
- **Data anonymization** — redact PII before sharing datasets
- **CI/CD scanning** — catch leaked PII in code and configs
- **Chat/support data** — clean PII from customer interactions
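For the anonymization use case, one simple approach is to replace each detected value with a placeholder named after its type. The sketch below assumes the `/api/detect` response shape from the Python example (`type`, `value`, `score`); because that example does not show character offsets, it replaces by value, which also masks any repeated occurrences of the same string. `redact` and its threshold are hypothetical.

```python
from typing import List


def redact(text: str, entities: List[dict], min_score: float = 0.5) -> str:
    """Replace each detected value scoring at least min_score with a [TYPE] placeholder."""
    kept = [e for e in entities if e["score"] >= min_score]
    # Longest values first, so a short value nested inside a longer one (e.g. a postcode
    # inside an address) doesn't break the longer match before it is masked.
    for entity in sorted(kept, key=lambda e: len(e["value"]), reverse=True):
        text = text.replace(entity["value"], f"[{entity['type'].upper()}]")
    return text


sample = "John Doe lives at 42 Orchard Road, Singapore 238879"
detected = [
    {"type": "person_name", "value": "John Doe", "score": 0.95},
    {"type": "street_address", "value": "42 Orchard Road, Singapore 238879", "score": 0.90},
]
print(redact(sample, detected))  # [PERSON_NAME] lives at [STREET_ADDRESS]
```

Value-based replacement errs on the side of over-masking, which is usually the safer failure mode before sharing a dataset; offset-based replacement would be more precise if your server returns span positions.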
## License
AGPL-3.0 — free for open-source use. Commercial license available at [pii.engineer](https://pii.engineer).
## Citation
```bibtex
@software{pii_engineer,
title = {PII Engineer: Multilingual PII Detection},
url = {https://github.com/gantz-ai/pii.engineer},
year = {2026}
}
```