---
license: apache-2.0
language: en
tags:
- ner
- pii
- privacy
- token-classification
- deberta
- onnx
library_name: onnxruntime
pipeline_tag: token-classification
---
# Shade V5 — On-Device PII Detection
Shade V5 is a fast, accurate PII (Personally Identifiable Information) detection model for privacy-preserving AI pipelines. It detects 12 entity types with a 97.6% F1 score on in-distribution data.
## Quick Start
```bash
pip install veil-phantom
```
```python
from veil_phantom import VeilClient

veil = VeilClient()  # auto-downloads this model
result = veil.redact("John Smith sent $5M to john@acme.com")
print(result.sanitized)  # [PERSON_1] sent [AMOUNT_1] to [EMAIL_1]
```
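The `sanitized` string above replaces each detected span with a numbered placeholder. The sketch below illustrates that kind of substitution, assuming spans arrive as character offsets with a type label. The `redact_spans` helper and its numbering rule (repeated mentions share one placeholder) are hypothetical and are not taken from VeilPhantom's implementation.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type), end exclusive


def redact_spans(text: str, spans: List[Span]) -> Tuple[str, Dict[str, str]]:
    """Replace each span with a numbered placeholder such as [PERSON_1].

    Hypothetical helper: repeated mentions of the same text and type reuse
    one placeholder, so the returned mapping can restore the original.
    """
    counters: Dict[str, int] = {}
    seen: Dict[Tuple[str, str], str] = {}  # (type, surface text) -> placeholder
    mapping: Dict[str, str] = {}           # placeholder -> surface text
    out: List[str] = []
    cursor = 0
    for start, end, etype in sorted(spans):
        if start < cursor:
            continue  # skip overlapping spans; the earlier one wins
        surface = text[start:end]
        key = (etype, surface)
        if key not in seen:
            counters[etype] = counters.get(etype, 0) + 1
            seen[key] = f"[{etype}_{counters[etype]}]"
            mapping[seen[key]] = surface
        out.append(text[cursor:start])
        out.append(seen[key])
        cursor = end
    out.append(text[cursor:])
    return "".join(out), mapping


text = "John Smith sent $5M to john@acme.com"
# Offsets written by hand for illustration; a detector would supply them.
spans = [(0, 10, "PERSON"), (16, 19, "AMOUNT"), (23, 36, "EMAIL")]
sanitized, mapping = redact_spans(text, spans)
print(sanitized)  # [PERSON_1] sent [AMOUNT_1] to [EMAIL_1]
```

Keeping the placeholder-to-text mapping on the caller's side means the original values never need to leave the device, while a downstream response can still be re-identified locally.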
## Model Details
| Property | Value |
|----------|-------|
| Architecture | DeBERTa-v3-xsmall |
| Parameters | 22M (backbone, excluding the embedding layer) |
| Format | ONNX |
| Size | 270 MB |
| Inference | <50ms on CPU |
| F1 Score (in-distribution) | 97.6% |
| F1 Score (out-of-distribution) | 97.3% |
| Task | BIO Token Classification |
| Labels | 25 (12 entity types × B/I + O) |
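The label count in the table follows from the BIO scheme: each of the 12 entity types gets a `B-` (begin) and an `I-` (inside) tag, plus a single `O` tag for non-entity tokens. The sketch below reproduces that arithmetic. The `B-TYPE`/`I-TYPE` string format and the ordering are assumptions; the authoritative mapping is whatever `shade_label_map.json` contains.

```python
# Entity types as listed in this model card.
ENTITY_TYPES = [
    "PERSON", "ORG", "EMAIL", "PHONE", "MONEY", "DATE",
    "ADDRESS", "GOVID", "BANKACCT", "CARD", "IPADDR", "CASE",
]


def build_bio_labels(entity_types):
    """Return ["O", "B-X", "I-X", ...] for the given entity types."""
    labels = ["O"]
    for etype in entity_types:
        labels.append(f"B-{etype}")  # first token of an entity
        labels.append(f"I-{etype}")  # any following token of the same entity
    return labels


labels = build_bio_labels(ENTITY_TYPES)
print(len(labels))  # 25 = 12 * 2 + 1
```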
## Entity Types
| Type | F1 | Examples |
|------|-----|----------|
| PERSON | 96.3% | Names (Western, African, Asian, South African) |
| ORG | 97.6% | Companies, institutions |
| EMAIL | 100% | Email addresses |
| PHONE | 98.4% | Phone numbers (international formats) |
| MONEY | 99.6% | Monetary amounts |
| DATE | 97.8% | Dates, times, schedules |
| ADDRESS | 99.4% | Street addresses |
| GOVID | 97.7% | SSN, SA ID, passport |
| BANKACCT | 92.9% | Bank account numbers, IBAN |
| CARD | 100% | Credit/debit card numbers |
| IPADDR | 100% | IP addresses |
| CASE | 97.8% | Legal case numbers |
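A token classifier emits one BIO tag per token, so recovering entities of the types listed above means merging each `B-`/`I-` run into a single span. The following is a generic sketch of that decoding step, not the SDK's own code; `bio_to_spans` is a hypothetical helper, and treating a stray `I-` tag as the start of a span is one possible design choice.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Merge BIO tags into (start_token, end_token_exclusive, type) spans."""
    spans = []
    start = None
    current = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        prefix, _, etype = tag.partition("-")
        continues = prefix == "I" and etype == current
        if current is not None and not continues:
            spans.append((start, i, current))
            start, current = None, None
        if prefix == "B" or (prefix == "I" and current is None):
            # A stray I- tag with no open span is treated as a span start.
            start, current = i, etype
    return spans


tags = ["B-PERSON", "I-PERSON", "O", "O", "B-MONEY", "O", "B-EMAIL"]
print(bio_to_spans(tags))  # [(0, 2, 'PERSON'), (4, 5, 'MONEY'), (6, 7, 'EMAIL')]
```

The spans are in token indices; combining them with the tokenizer's character offsets gives the text ranges to redact.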
## Training
- **Base model**: microsoft/deberta-v3-xsmall
- **Training data**: 116K examples from business meetings, legal proceedings, financial transactions
- **Tokenizer**: Unigram (128K vocab)
- **OOD gap**: 0.3 percentage points (97.6% → 97.3% F1)
## Files
- `ShadeV5.onnx` — ONNX model (270 MB)
- `tokenizer.json` — HuggingFace fast tokenizer
- `tokenizer_config.json` — Tokenizer configuration
- `shade_label_map.json` — BIO label → entity type mapping
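
The files above can also be loaded without the SDK. The sketch below runs `ShadeV5.onnx` with `onnxruntime` and the `tokenizers` package and returns one predicted label id per token. It rests on several assumptions not confirmed by this card: that the graph's inputs are named `input_ids`, `attention_mask`, and/or `token_type_ids`, and that its first output holds per-token logits. The schema of `shade_label_map.json` is not documented here, so mapping ids to labels is left to the reader.

```python
import os
from typing import Dict, Sequence


def select_feeds(input_names: Sequence[str], arrays: Dict[str, object]) -> Dict[str, object]:
    """Keep only the tensors the exported graph declares as inputs."""
    missing = [name for name in input_names if name not in arrays]
    if missing:
        raise KeyError(f"model expects inputs not produced here: {missing}")
    return {name: arrays[name] for name in input_names}


def predict_label_ids(text: str, model_path: str = "ShadeV5.onnx",
                      tokenizer_path: str = "tokenizer.json"):
    """Return (label_ids, offsets): one label id and one character range per token."""
    # Imported lazily so select_feeds stays usable without these packages.
    import numpy as np
    import onnxruntime as ort
    from tokenizers import Tokenizer

    tokenizer = Tokenizer.from_file(tokenizer_path)
    encoding = tokenizer.encode(text)

    # Assumed input names; the export may declare only a subset of them.
    arrays = {
        "input_ids": np.array([encoding.ids], dtype=np.int64),
        "attention_mask": np.array([encoding.attention_mask], dtype=np.int64),
        "token_type_ids": np.array([encoding.type_ids], dtype=np.int64),
    }
    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    feeds = select_feeds([inp.name for inp in session.get_inputs()], arrays)

    # Assumes the first output is logits shaped (batch, tokens, labels).
    logits = session.run(None, feeds)[0]
    return logits[0].argmax(axis=-1).tolist(), encoding.offsets


# Runs only when the model files have been downloaded next to this script.
if os.path.exists("ShadeV5.onnx") and os.path.exists("tokenizer.json"):
    ids, offsets = predict_label_ids("John Smith sent $5M to john@acme.com")
    print(list(zip(ids, offsets)))
```

Querying `session.get_inputs()` instead of hard-coding the feed dictionary keeps the sketch working whether or not the export includes `token_type_ids`.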
## License
Apache 2.0
## Part of VeilPhantom
This model powers [VeilPhantom](https://github.com/veil-privacy/veil-phantom), an open-source PII redaction SDK for agentic AI pipelines.