---
license: apache-2.0
language: en
tags:
- ner
- pii
- privacy
- token-classification
- deberta
- onnx
library_name: onnxruntime
pipeline_tag: token-classification
---

# Shade V5: On-Device PII Detection

Fast, accurate PII (Personally Identifiable Information) detection model for privacy-preserving AI pipelines. Detects 12 entity types with a 97.6% F1 score.

## Quick Start

```bash
pip install veil-phantom
```

```python
from veil_phantom import VeilClient

veil = VeilClient()  # auto-downloads this model
result = veil.redact("John Smith sent $5M to john@acme.com")
result.sanitized  # "[PERSON_1] sent [AMOUNT_1] to [EMAIL_1]"
```

## Model Details

| Property | Value |
|----------|-------|
| Architecture | DeBERTa-v3-xsmall |
| Parameters | 22M |
| Format | ONNX |
| Size | 270 MB |
| Inference | < 50 ms on CPU |
| F1 Score (in-distribution) | 97.6% |
| F1 Score (out-of-distribution) | 97.3% |
| Task | BIO Token Classification |
| Labels | 25 (12 entity types × B/I + O) |

## Entity Types

| Type | F1 | Examples |
|------|-----|----------|
| PERSON | 96.3% | Names (Western, African, Asian, South African) |
| ORG | 97.6% | Companies, institutions |
| EMAIL | 100% | Email addresses |
| PHONE | 98.4% | Phone numbers (international formats) |
| MONEY | 99.6% | Monetary amounts |
| DATE | 97.8% | Dates, times, schedules |
| ADDRESS | 99.4% | Street addresses |
| GOVID | 97.7% | SSN, SA ID, passport |
| BANKACCT | 92.9% | Bank account numbers, IBAN |
| CARD | 100% | Credit/debit card numbers |
| IPADDR | 100% | IP addresses |
| CASE | 97.8% | Legal case numbers |

## Training

- **Base model**: microsoft/deberta-v3-xsmall
- **Training data**: 116K examples from business meetings, legal proceedings, financial transactions
- **Tokenizer**: Unigram (128K vocab)
- **OOD gap**: 0.3% (97.6% → 97.3%)

## Files

- `ShadeV5.onnx`: ONNX model (270 MB)
- `tokenizer.json`: HuggingFace fast tokenizer
- `tokenizer_config.json`: Tokenizer configuration
- `shade_label_map.json`: BIO label → entity type mapping

## License

Apache 2.0

## Part of VeilPhantom

This model powers [VeilPhantom](https://github.com/veil-privacy/veil-phantom), an open-source PII redaction SDK for agentic AI pipelines.
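
## Example: Decoding BIO Labels

The model is a BIO token classifier with 25 labels (a B and an I tag for each of the 12 entity types, plus O). The sketch below shows one way per-token tags could be merged into entity spans. It is an illustration, not the VeilPhantom implementation: the `decode_bio` helper is hypothetical, the label strings (`B-PERSON`, `I-PERSON`, `O`) are an assumed naming scheme that may differ from the contents of `shade_label_map.json`, and the tokens are whole words rather than real tokenizer output.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token BIO tags into (entity_type, text) spans.

    Hypothetical helper: assumes tags look like "B-PERSON", "I-PERSON", "O".
    """
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span being built, if any, and reset the state.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, entity_type = tag.partition("-")
        # A B- tag always starts a new span; an I- tag whose type does not
        # match the open span is treated as the start of a new span too.
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_tokens.append(token)

    flush()
    return spans


# Word-level tokens and assumed tags for the Quick Start sentence.
tokens = ["John", "Smith", "sent", "$5M", "to", "john@acme.com"]
tags = ["B-PERSON", "I-PERSON", "O", "B-MONEY", "O", "B-EMAIL"]
entities = decode_bio(tokens, tags)
print(entities)
```

Treating a stray `I-` tag as the start of a span is a lenient choice that avoids dropping entities when the model misses a `B-` tag; a stricter decoder could discard such tokens instead.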