---
language:
- en
- zh
- ms
- id
- vi
- ta
- th
- hi
- bn
- ko
- ja
- de
- fr
- ru
license: apache-2.0
tags:
- pii
- ner
- gliner
- privacy
- gdpr
- pdpa
- multilingual
- onnx
datasets:
- custom
metrics:
- f1
pipeline_tag: token-classification
---

<p align="center">
  <a href="https://pii.engineer">
    <img src="https://pii.engineer/static/banner.webp" alt="PII Engineer" width="100%" />
  </a>
</p>

# PII Engineer — Multilingual NER v2.1

A fast, multilingual PII detection model. It detects 30+ PII types across 50+ languages with a single model and runs on CPU, no GPU required.

**[Live Demo](https://pii.engineer)** · **[Benchmarks](https://pii.engineer/benchmarks)** · **[GitHub](https://github.com/gantz-ai/pii.engineer)** · **[Blog](https://pii.engineer/blog)**

## Benchmarks

| | PII Engineer | Presidio | spaCy | AWS Comprehend |
|---|---|---|---|---|
| **F1 (multilingual)** | **0.86** | 0.44 | 0.64 | 0.52 |
| **F1 (English)** | **0.88** | 0.80 | 0.83 | 0.82 |
| **Languages** | **50+** | ~10 locales | 1 per model | 12 |
| **Latency (p50)** | 180ms | 80ms (w/ NER) | 120ms | 200ms |
| **GPU required** | No | No | Optional | N/A |
| **Cost (1M req/mo)** | $42 | $42 | $42 | ~$1,000 |

[Full benchmarks →](https://pii.engineer/benchmarks)

### Accuracy by Language

| Language | F1 |
|----------|-----|
| English | 0.931 |
| Chinese | 0.918 |
| Vietnamese | 0.912 |
| Korean | 0.905 |
| Indonesian | 0.901 |
| Malay | 0.895 |
| Hindi | 0.892 |
| Thai | 0.885 |
| Tamil | 0.878 |

### Per-Entity Accuracy

| Entity Type | F1 |
|-------------|-----|
| email_address | 0.970 |
| phone_number | 0.968 |
| government_id | 0.920 |
| bank_account_number | 0.915 |
| street_address | 0.891 |
| date_of_birth | 0.887 |
| passport_number | 0.880 |
| license_plate | 0.833 |
| person_name | 0.823 |

## PII Types Detected

`person_name` · `phone_number` · `government_id` · `street_address` · `date_of_birth` · `email_address` · `passport_number` · `license_plate` · `bank_account_number`

## Model Architecture

- **Base:** [GLiNER2](https://huggingface.co/fastino/gliner2-multi-v1) (span-based NER)
- **Encoder:** mDeBERTa-v3-base (280M params), fine-tuned with LoRA on PII data
- **Inference:** 5 ONNX models (encoder, span_rep, count_embed, count_pred, classifier)
- **Quantization:** INT8 encoder available (~15-20% faster on x86 CPU)
- **Total size:** ~620MB (all languages)

## Quick Start

### With PII Engineer Server (Rust)

```bash
git clone https://github.com/gantz-ai/pii.engineer
cd pii.engineer
cargo build --release --package pii-engineer-server
cargo run --release --package pii-engineer-server
# Models auto-download on first run
# API at http://localhost:8000
```

```bash
curl -X POST http://localhost:8000/api/detect \
  -H "Content-Type: application/json" \
  -d '{"text": "John Doe, NRIC S9012345B, born 12 March 1985"}'
```

### With Python

```python
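import requests

# Assumes the PII Engineer server from the previous step is running locally.
resp = requests.post(
    "http://localhost:8000/api/detect",
    json={
        "text": "John Doe lives at 42 Orchard Road, Singapore 238879",
        "labels": ["person_name", "street_address", "phone_number", "email_address"],
    },
    timeout=30,  # Avoid hanging forever if the server is not reachable.
)
resp.raise_for_status()  # Fail on HTTP errors instead of parsing an error body.

# Each entity carries its type, the matched text, and a confidence score.
for entity in resp.json()["entities"]:
    print(f'{entity["type"]}: {entity["value"]} (score: {entity["score"]:.2f})')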
```

### Download Models Manually

```bash
pip install huggingface_hub
huggingface-cli download pii-engineer/PII-Engineer-Multi-NER-v2.1 --local-dir models/PII-Engineer-Multi-NER-v2.1
huggingface-cli download pii-engineer/PII-Engineer-Chinese-NER-v1.0 --local-dir models/PII-Engineer-Chinese-NER-v1.0
```

## Use Cases

- **PDPA/GDPR/CCPA compliance** — detect PII in databases, logs, documents
- **Data anonymization** — redact PII before sharing datasets
- **CI/CD scanning** — catch leaked PII in code and configs
- **Chat/support data** — clean PII from customer interactions
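
As an illustration of the data anonymization use case, here is a minimal sketch that redacts detected values from a text. It relies only on the `type`, `value`, and `score` fields used in the Python example above. The `redact` helper, the `min_score` threshold, and the sample entities (including their scores) are hypothetical and not part of the PII Engineer API.

```python
def redact(text, entities, min_score=0.5):
    """Replace each detected PII value in `text` with a [type] placeholder.

    `entities` is a list of dicts with "type", "value" and "score" keys,
    in the shape returned under "entities" by /api/detect.
    `min_score` is an assumed cut-off, not a documented default.
    """
    kept = [e for e in entities if e["value"] and e["score"] >= min_score]
    # Replace longer values first so that a short value contained in a
    # longer one does not break the longer match.
    for e in sorted(kept, key=lambda e: len(e["value"]), reverse=True):
        text = text.replace(e["value"], f'[{e["type"]}]')
    return text


# Hypothetical sample entities; in practice these come from resp.json()["entities"].
sample_entities = [
    {"type": "person_name", "value": "John Doe", "score": 0.93},
    {"type": "street_address", "value": "42 Orchard Road, Singapore 238879", "score": 0.88},
]

print(redact("John Doe lives at 42 Orchard Road, Singapore 238879", sample_entities))
# prints: [person_name] lives at [street_address]
```

String replacement redacts every occurrence of a detected value, which is usually what you want for anonymization. It can over-redact when a short value (for example a two-letter name) also appears inside unrelated words.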

## License

AGPL-3.0 — free for open-source use. Commercial license available at [pii.engineer](https://pii.engineer).

## Citation

```bibtex
@software{pii_engineer,
  title = {PII Engineer: Multilingual PII Detection},
  url = {https://github.com/gantz-ai/pii.engineer},
  year = {2026}
}
```