---
language:
- en
- zh
- ms
- id
- vi
- ta
- th
- hi
- bn
- ko
- ja
- de
- fr
- ru
license: apache-2.0
tags:
- pii
- ner
- gliner
- privacy
- gdpr
- pdpa
- multilingual
- onnx
datasets:
- custom
metrics:
- f1
pipeline_tag: token-classification
---
<p align="center">
<a href="https://pii.engineer">
<img src="https://pii.engineer/static/banner.webp" alt="PII Engineer" width="100%" />
</a>
</p>
# PII Engineer — Multilingual NER v2.1
A fast, multilingual PII detection model: a single model detects 30+ PII types across 50+ languages, with no GPU required.
**[Live Demo](https://pii.engineer)** · **[Benchmarks](https://pii.engineer/benchmarks)** · **[GitHub](https://github.com/gantz-ai/pii.engineer)** · **[Blog](https://pii.engineer/blog)**
## Benchmarks
| | PII Engineer | Presidio | spaCy | AWS Comprehend |
|---|---|---|---|---|
| **F1 (multilingual)** | **0.86** | 0.44 | 0.64 | 0.52 |
| **F1 (English)** | **0.88** | 0.80 | 0.83 | 0.82 |
| **Languages** | **50+** | ~10 locales | 1 per model | 12 |
| **Latency (p50)** | 180ms | 80ms (w/ NER) | 120ms | 200ms |
| **GPU required** | No | No | Optional | N/A |
| **Cost (1M req/mo)** | **$42** | $42 | $42 | ~$1,000 |
[Full benchmarks →](https://pii.engineer/benchmarks)
### Accuracy by Language
| Language | F1 |
|----------|-----|
| English | 0.931 |
| Chinese | 0.918 |
| Vietnamese | 0.912 |
| Korean | 0.905 |
| Indonesian | 0.901 |
| Malay | 0.895 |
| Hindi | 0.892 |
| Thai | 0.885 |
| Tamil | 0.878 |
### Per-Entity Accuracy
| Entity Type | F1 |
|-------------|-----|
| email_address | 0.970 |
| phone_number | 0.968 |
| government_id | 0.920 |
| bank_account_number | 0.915 |
| street_address | 0.891 |
| date_of_birth | 0.887 |
| passport_number | 0.880 |
| license_plate | 0.833 |
| person_name | 0.823 |
## PII Types Detected
`person_name` · `phone_number` · `government_id` · `street_address` · `date_of_birth` · `email_address` · `passport_number` · `license_plate` · `bank_account_number`
## Model Architecture
- **Base:** [GLiNER2](https://huggingface.co/fastino/gliner2-multi-v1) (span-based NER)
- **Encoder:** mDeBERTa-v3-base (280M params), fine-tuned with LoRA on PII data
- **Inference:** 5 ONNX models (encoder, span_rep, count_embed, count_pred, classifier)
- **Quantization:** INT8 encoder available (~15-20% faster on x86 CPU)
- **Total size:** ~620MB (all languages)
## Quick Start
### With PII Engineer Server (Rust)
```bash
git clone https://github.com/gantz-ai/pii.engineer
cd pii.engineer
cargo build --release --package pii-engineer-server
cargo run --release --package pii-engineer-server
# Models auto-download on first run
# API at http://localhost:8000
```
```bash
curl -X POST http://localhost:8000/api/detect \
  -H "Content-Type: application/json" \
  -d '{"text": "John Doe, NRIC S9012345B, born 12 March 1985"}'
```
### With Python
```python
import requests

# Ask the local server to look only for the listed labels.
resp = requests.post("http://localhost:8000/api/detect", json={
    "text": "John Doe lives at 42 Orchard Road, Singapore 238879",
    "labels": ["person_name", "street_address", "phone_number", "email_address"],
}, timeout=30)
resp.raise_for_status()  # fail loudly on HTTP errors

for entity in resp.json()["entities"]:
    print(f'{entity["type"]}: {entity["value"]} (score: {entity["score"]:.2f})')
```
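If you would rather not depend on `requests`, the same call can be sketched with the standard library alone. The helper names `detect` and `filter_by_score` and the `0.5` default threshold are illustrative assumptions, not part of the server API; the only response fields relied on are the `type`, `value`, and `score` keys used in the example above.

```python
import json
import urllib.request

# Endpoint from the Quick Start above; adjust if your server runs elsewhere.
API_URL = "http://localhost:8000/api/detect"


def detect(text, labels=None, url=API_URL, timeout=10.0):
    """POST text to the detect endpoint and return its list of entities.

    Hypothetical helper: it assumes the response is a JSON object with an
    "entities" list, as in the example above.
    """
    payload = {"text": text}
    if labels:
        payload["labels"] = labels
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["entities"]


def filter_by_score(entities, min_score=0.5):
    """Keep only entities whose score is at least min_score.

    The 0.5 default is an arbitrary placeholder, not a recommended value.
    """
    return [e for e in entities if e["score"] >= min_score]
```

With the server running, `filter_by_score(detect("John Doe lives at 42 Orchard Road"), 0.8)` would return only the higher-confidence detections. The right threshold depends on your data and on how costly a missed detection is.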
### Download Models Manually
```bash
pip install huggingface_hub
huggingface-cli download pii-engineer/PII-Engineer-Multi-NER-v2.1 --local-dir models/PII-Engineer-Multi-NER-v2.1
huggingface-cli download pii-engineer/PII-Engineer-Chinese-NER-v1.0 --local-dir models/PII-Engineer-Chinese-NER-v1.0
```
## Use Cases
- **PDPA/GDPR/CCPA compliance** — detect PII in databases, logs, documents
- **Data anonymization** — redact PII before sharing datasets
- **CI/CD scanning** — catch leaked PII in code and configs
- **Chat/support data** — clean PII from customer interactions
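As a rough illustration of the data anonymization use case, the sketch below replaces each detected value with a placeholder built from its entity type. It assumes entities shaped like those in the Python example (`type`, `value`, `score`); the `redact` function, the `[TYPE]` placeholder format, and the sample scores are made up for illustration and are not part of PII Engineer.

```python
import re


def redact(text, entities, min_score=0.5):
    """Replace every occurrence of each detected value with a [TYPE] tag.

    Hypothetical helper: matches on the literal `value` string, so it does
    not need character offsets from the server.
    """
    value_to_type = {}
    for e in entities:
        if e["score"] >= min_score and e["value"]:
            # First detection wins if the same value appears twice.
            value_to_type.setdefault(e["value"], e["type"])
    if not value_to_type:
        return text
    # Longest values first, so a short value cannot split a longer one.
    values = sorted(value_to_type, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in values))
    # Single pass over the text, so inserted placeholders are never rescanned.
    return pattern.sub(lambda m: f"[{value_to_type[m.group(0)].upper()}]", text)


# Illustrative entities; the scores here are invented, not model output.
sample_entities = [
    {"type": "person_name", "value": "John Doe", "score": 0.97},
    {"type": "government_id", "value": "S9012345B", "score": 0.93},
]
print(redact("John Doe, NRIC S9012345B, born 12 March 1985", sample_entities))
# prints: [PERSON_NAME], NRIC [GOVERNMENT_ID], born 12 March 1985
```

Matching on the literal value also redacts repeated mentions that the model flagged only once, at the cost of occasionally redacting an identical string that was not PII in context.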
## License
AGPL-3.0 — free for open-source use. Commercial license available at [pii.engineer](https://pii.engineer).
## Citation
```bibtex
@software{pii_engineer,
title = {PII Engineer: Multilingual PII Detection},
url = {https://github.com/gantz-ai/pii.engineer},
year = {2026}
}
```