---
language:
- en
- zh
- ms
- id
- vi
- ta
- th
- hi
- bn
- ko
- ja
- de
- fr
- ru
license: apache-2.0
tags:
- pii
- ner
- gliner
- privacy
- gdpr
- pdpa
- multilingual
- onnx
datasets:
- custom
metrics:
- f1
pipeline_tag: token-classification
---
<p align="center">
  <a href="https://pii.engineer">
    <img src="https://pii.engineer/static/banner.webp" alt="PII Engineer" width="100%" />
  </a>
</p>

# PII Engineer: Multilingual NER v2.1

A fast, multilingual PII detection model. It detects 30+ PII types across 50+ languages with a single model and runs without a GPU.

**[Live Demo](https://pii.engineer)** · **[Benchmarks](https://pii.engineer/benchmarks)** · **[GitHub](https://github.com/gantz-ai/pii.engineer)** · **[Blog](https://pii.engineer/blog)**
## Benchmarks

| | PII Engineer | Presidio | spaCy | AWS Comprehend |
|---|---|---|---|---|
| **F1 (multilingual)** | **0.86** | 0.44 | 0.64 | 0.52 |
| **F1 (English)** | **0.88** | 0.80 | 0.83 | 0.82 |
| **Languages** | **50+** | ~10 locales | 1 per model | 12 |
| **Latency (p50)** | 180ms | 80ms (w/ NER) | 120ms | 200ms |
| **GPU required** | No | No | Optional | N/A |
| **Cost (1M req/mo)** | **$42** | $42 | $42 | ~$1,000 |

[Full benchmarks →](https://pii.engineer/benchmarks)
### Accuracy by Language

| Language | F1 |
|----------|-----|
| English | 0.931 |
| Chinese | 0.918 |
| Vietnamese | 0.912 |
| Korean | 0.905 |
| Indonesian | 0.901 |
| Malay | 0.895 |
| Hindi | 0.892 |
| Thai | 0.885 |
| Tamil | 0.878 |
### Per-Entity Accuracy

| Entity Type | F1 |
|-------------|-----|
| email_address | 0.970 |
| phone_number | 0.968 |
| government_id | 0.920 |
| bank_account_number | 0.915 |
| street_address | 0.891 |
| date_of_birth | 0.887 |
| passport_number | 0.880 |
| license_plate | 0.833 |
| person_name | 0.823 |
## PII Types Detected

`person_name` · `phone_number` · `government_id` · `street_address` · `date_of_birth` · `email_address` · `passport_number` · `license_plate` · `bank_account_number`

## Model Architecture

- **Base:** [GLiNER2](https://huggingface.co/fastino/gliner2-multi-v1) (span-based NER)
- **Encoder:** mDeBERTa-v3-base (280M params), fine-tuned with LoRA on PII data
- **Inference:** 5 ONNX models (encoder, span_rep, count_embed, count_pred, classifier)
- **Quantization:** INT8 encoder available (~15-20% faster on x86 CPU)
- **Total size:** ~620MB (all languages)
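
For readers new to the term, "span-based" NER generally means scoring candidate token spans against entity labels instead of tagging tokens one at a time. The sketch below shows only the candidate-enumeration step in generic form. The names `enumerate_spans` and `max_width` are illustrative, whitespace tokenization is a simplification, and nothing here is taken from the GLiNER2 or PII Engineer implementation.

```python
def enumerate_spans(tokens: list[str], max_width: int = 4) -> list[tuple[int, int, str]]:
    """Return every contiguous token span of at most max_width tokens.

    Each item is (start, end_exclusive, text). A span-based model would
    score each of these candidates against the requested entity labels.
    """
    spans = []
    for start in range(len(tokens)):
        last_end = min(start + max_width, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end, " ".join(tokens[start:end])))
    return spans


# Whitespace tokenization is a simplification for illustration only.
tokens = "John Doe born 1985".split()
for span in enumerate_spans(tokens, max_width=2):
    print(span)
```

The number of candidates grows roughly with the token count times `max_width`, which is why span-based models typically cap the span width.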
## Quick Start

### With PII Engineer Server (Rust)

```bash
git clone https://github.com/gantz-ai/pii.engineer
cd pii.engineer
cargo build --release --package pii-engineer-server
cargo run --release --package pii-engineer-server
# Models auto-download on first run
# API at http://localhost:8000
```

```bash
curl -X POST http://localhost:8000/api/detect \
  -H "Content-Type: application/json" \
  -d '{"text": "John Doe, NRIC S9012345B, born 12 March 1985"}'
```
### With Python

```python
import requests

# Requires the PII Engineer server from the previous step on localhost:8000.
resp = requests.post("http://localhost:8000/api/detect", json={
    "text": "John Doe lives at 42 Orchard Road, Singapore 238879",
    # Entity types to look for.
    "labels": ["person_name", "street_address", "phone_number", "email_address"]
}, timeout=30)
resp.raise_for_status()  # surface HTTP errors instead of parsing an error body

for entity in resp.json()["entities"]:
    print(f'{entity["type"]}: {entity["value"]} (score: {entity["score"]:.2f})')
```
### Download Models Manually

```bash
pip install huggingface_hub
huggingface-cli download pii-engineer/PII-Engineer-Multi-NER-v2.1 --local-dir models/PII-Engineer-Multi-NER-v2.1
huggingface-cli download pii-engineer/PII-Engineer-Chinese-NER-v1.0 --local-dir models/PII-Engineer-Chinese-NER-v1.0
```
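
The same download can be scripted from Python, which may be convenient in setup scripts. This is a minimal sketch that assumes `huggingface_hub` is installed and uses its `snapshot_download` function. The helper names `local_dir_for` and `download_models` are hypothetical, and the directory layout simply mirrors the CLI commands above.

```python
from pathlib import Path

# Repository IDs taken from the CLI commands above.
MODEL_REPOS = [
    "pii-engineer/PII-Engineer-Multi-NER-v2.1",
    "pii-engineer/PII-Engineer-Chinese-NER-v1.0",
]


def local_dir_for(repo_id: str, root: str = "models") -> Path:
    """Mirror the CLI layout: <root>/<repo name without the org prefix>."""
    return Path(root) / repo_id.split("/", 1)[1]


def download_models(root: str = "models") -> list[Path]:
    """Download every repo in MODEL_REPOS and return the local directories."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    paths = []
    for repo_id in MODEL_REPOS:
        target = local_dir_for(repo_id, root)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
        paths.append(target)
    return paths
```

Calling `download_models()` fetches both repositories into `models/` and needs network access, so it is left out of the snippet itself.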
## Use Cases

- **PDPA/GDPR/CCPA compliance:** detect PII in databases, logs, documents
- **Data anonymization:** redact PII before sharing datasets
- **CI/CD scanning:** catch leaked PII in code and configs
- **Chat/support data:** clean PII from customer interactions
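
As a rough illustration of the anonymization use case, detected entities can be swapped for placeholders before text is shared. The sketch below assumes each entity is a dict with the `type`, `value`, and `score` fields used in the Python quick-start example. The `redact` helper, the `[TYPE]` placeholder format, and the `0.5` score threshold are hypothetical choices, not part of the PII Engineer API, and the sample entities are hand-written rather than real model output.

```python
from typing import Iterable


def redact(text: str, entities: Iterable[dict], min_score: float = 0.5) -> str:
    """Replace each detected value in `text` with a [TYPE] placeholder.

    Matching is by value because character offsets are not assumed to be
    available in the response.
    """
    kept = [e for e in entities if e["score"] >= min_score and e["value"]]
    # Replace longer values first so a short value that is a substring of a
    # longer one does not break the longer match.
    for entity in sorted(kept, key=lambda e: len(e["value"]), reverse=True):
        placeholder = f'[{entity["type"].upper()}]'
        text = text.replace(entity["value"], placeholder)
    return text


# Hand-written sample entities for illustration, not real model output.
sample_text = "John Doe lives at 42 Orchard Road, Singapore 238879"
sample_entities = [
    {"type": "person_name", "value": "John Doe", "score": 0.97},
    {"type": "street_address", "value": "42 Orchard Road, Singapore 238879", "score": 0.91},
    {"type": "phone_number", "value": "238879", "score": 0.20},  # below threshold
]
print(redact(sample_text, sample_entities))
# prints: [PERSON_NAME] lives at [STREET_ADDRESS]
```

Value-based replacement redacts every occurrence of a detected string, which is usually what you want for anonymization but can over-redact short values. Tune `min_score` against your own data.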
## License

AGPL-3.0, free for open-source use. A commercial license is available at [pii.engineer](https://pii.engineer).

## Citation

```bibtex
@software{pii_engineer,
  title = {PII Engineer: Multilingual PII Detection},
  url = {https://github.com/gantz-ai/pii.engineer},
  year = {2026}
}
```