---
license: apache-2.0
language: en
tags:
- ner
- pii
- privacy
- token-classification
- deberta
- onnx
library_name: onnxruntime
pipeline_tag: token-classification
---

# Shade V5: On-Device PII Detection
|
A fast, accurate PII (Personally Identifiable Information) detection model for privacy-preserving AI pipelines. It detects 12 entity types with an F1 score of 97.6% on in-distribution data (97.3% out-of-distribution).
|
## Quick Start
|
```bash
pip install veil-phantom
```
|
```python
from veil_phantom import VeilClient

veil = VeilClient()  # auto-downloads this model
result = veil.redact("John Smith sent $5M to john@acme.com")
print(result.sanitized)  # "[PERSON_1] sent [AMOUNT_1] to [EMAIL_1]"
```
|
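The sanitized output in the Quick Start replaces each detected entity with a numbered `[TYPE_n]` placeholder. The following is a hypothetical sketch of how such substitution could work once entity spans are known; it is not VeilPhantom's code. The function name `redact`, the per-type numbering, and the rule that a repeated value of the same type reuses its placeholder are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A detected entity as (start_char, end_char, entity_type), end exclusive.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> Tuple[str, Dict[str, str]]:
    """Replace each span with a numbered placeholder such as [PERSON_1].

    Hypothetical rules: numbering restarts at 1 for every entity type, and
    an identical value of the same type reuses its earlier placeholder.
    Spans are assumed not to overlap.
    """
    counters: Dict[str, int] = {}              # last number used per type
    assigned: Dict[Tuple[str, str], str] = {}  # (type, value) -> placeholder
    mapping: Dict[str, str] = {}               # placeholder -> original value
    pieces: List[str] = []
    cursor = 0
    for start, end, etype in sorted(spans):   # process left to right
        value = text[start:end]
        key = (etype, value)
        if key not in assigned:
            counters[etype] = counters.get(etype, 0) + 1
            placeholder = f"[{etype}_{counters[etype]}]"
            assigned[key] = placeholder
            mapping[placeholder] = value
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(assigned[key])
        cursor = end
    pieces.append(text[cursor:])           # untouched tail
    return "".join(pieces), mapping


if __name__ == "__main__":
    text = "John Smith emailed john@acme.com; John Smith will follow up."
    first = text.index("John Smith")
    second = text.index("John Smith", first + 1)
    email = text.index("john@acme.com")
    spans = [
        (first, first + len("John Smith"), "PERSON"),
        (email, email + len("john@acme.com"), "EMAIL"),
        (second, second + len("John Smith"), "PERSON"),
    ]
    sanitized, mapping = redact(text, spans)
    print(sanitized)  # [PERSON_1] emailed [EMAIL_1]; [PERSON_1] will follow up.
    print(mapping)
```

Returning the placeholder-to-value mapping alongside the sanitized text keeps the substitution reversible, so original values could be restored later if a pipeline needs them.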
## Model Details

| Property | Value |
|----------|-------|
| Architecture | DeBERTa-v3-xsmall |
| Parameters | 22M backbone + 48M embedding (128K-token vocabulary) |
| Format | ONNX |
| Size | 270 MB |
| Inference | < 50 ms on CPU |
| F1 (in-distribution) | 97.6% |
| F1 (out-of-distribution) | 97.3% |
| Task | BIO token classification |
| Labels | 25 (12 entity types × B/I, plus O) |
|
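The label count follows from the BIO scheme: each entity type gets a `B-` (beginning) and an `I-` (inside) tag, and a single shared `O` tag marks tokens outside any entity, giving 12 × 2 + 1 = 25. The sketch below builds such a label set. The index order it produces is an assumption for illustration only and may not match the ids the ONNX model actually emits; the names `build_bio_labels`, `ID2LABEL`, and `LABEL2ID` are hypothetical.

```python
# Entity types in the order of the Entity Types table on this card. The
# resulting id order is illustrative only; the model's real label ids come
# from its exported label mapping, not from this code.
ENTITY_TYPES = [
    "PERSON", "ORG", "EMAIL", "PHONE", "MONEY", "DATE",
    "ADDRESS", "GOVID", "BANKACCT", "CARD", "IPADDR", "CASE",
]


def build_bio_labels(entity_types):
    """Return the BIO label set: 'O' plus a B- and an I- tag per type."""
    labels = ["O"]  # token outside any entity
    for etype in entity_types:
        labels.append(f"B-{etype}")  # first token of an entity
        labels.append(f"I-{etype}")  # following tokens of the same entity
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)
ID2LABEL = dict(enumerate(LABELS))
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}

if __name__ == "__main__":
    print(len(LABELS))  # 25 == 12 * 2 + 1
```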
## Entity Types

| Type | F1 | Examples |
|------|-----|----------|
| PERSON | 96.3% | Names (Western, African, Asian, South African) |
| ORG | 97.6% | Companies, institutions |
| EMAIL | 100% | Email addresses |
| PHONE | 98.4% | Phone numbers (international formats) |
| MONEY | 99.6% | Monetary amounts |
| DATE | 97.8% | Dates, times, schedules |
| ADDRESS | 99.4% | Street addresses |
| GOVID | 97.7% | SSN, SA ID, passport |
| BANKACCT | 92.9% | Bank account numbers, IBAN |
| CARD | 100% | Credit/debit card numbers |
| IPADDR | 100% | IP addresses |
| CASE | 97.8% | Legal case numbers |
|
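Because the model tags individual tokens, its output has to be merged into whole entities of the types above. One way to do that is sketched below, assuming the predicted ids have already been converted to BIO label strings and that each token's character offsets are available (fast tokenizers can report these). Special tokens are assumed to be dropped or tagged `O` beforehand, and the tokens in the usage example are hand-written for illustration rather than produced by this model's tokenizer.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, entity_type)


def bio_to_spans(tags: List[str], offsets: List[Tuple[int, int]]) -> List[Span]:
    """Merge per-token BIO tags into character-level entity spans.

    `offsets[i]` is the (start, end) character range of token i in the
    original text. An I- tag that does not continue an open entity of the
    same type starts a new entity, a common lenient convention.
    """
    spans: List[Span] = []
    current: Optional[List] = None  # [start, end, type] of the open entity

    for tag, (start, end) in zip(tags, offsets):
        if tag == "O":
            if current is not None:
                spans.append((current[0], current[1], current[2]))
                current = None
            continue
        prefix, etype = tag.split("-", 1)  # "B-PERSON" -> ("B", "PERSON")
        if prefix == "I" and current is not None and current[2] == etype:
            current[1] = end  # extend the open entity to this token
        else:
            if current is not None:
                spans.append((current[0], current[1], current[2]))
            current = [start, end, etype]

    if current is not None:
        spans.append((current[0], current[1], current[2]))
    return spans


if __name__ == "__main__":
    text = "Call Jane Doe at 555-0100"
    # Hand-written whitespace tokens with character offsets, for illustration.
    offsets = [(0, 4), (5, 9), (10, 13), (14, 16), (17, 25)]
    tags = ["O", "B-PERSON", "I-PERSON", "O", "B-PHONE"]
    for start, end, etype in bio_to_spans(tags, offsets):
        print(etype, repr(text[start:end]))
    # PERSON 'Jane Doe'
    # PHONE '555-0100'
```

Working in character offsets rather than token indices lets the resulting spans be applied directly to the original string, regardless of how the tokenizer split words into subwords.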
## Training

- **Base model**: microsoft/deberta-v3-xsmall
- **Training data**: 116K examples from business meetings, legal proceedings, and financial transactions
- **Tokenizer**: Unigram (128K vocabulary)
- **OOD gap**: 0.3 percentage points (97.6% in-distribution → 97.3% out-of-distribution F1)
|
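The card does not say which matching scheme produced its F1 scores. Assuming entity-level exact matching (a predicted entity counts as correct only if its boundaries and type both match the annotation, as in CoNLL-style NER evaluation), micro-averaged precision, recall, and F1 could be computed as sketched below. The `entity_f1` function and the document-id field in each span are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (doc_id, start_char, end_char, entity_type); doc_id keeps spans from
# different documents distinct when scores are pooled across a test set.
Span = Tuple[str, int, int, str]


def entity_f1(gold: Iterable[Span], pred: Iterable[Span]) -> Dict[str, float]:
    """Micro-averaged precision, recall and F1 under exact span+type match."""
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    tp = sum((gold_counts & pred_counts).values())  # in both gold and pred
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    gold = [("d1", 0, 10, "PERSON"), ("d1", 19, 32, "EMAIL"), ("d2", 5, 9, "MONEY")]
    # The MONEY prediction is one character short, so it counts as a miss.
    pred = [("d1", 0, 10, "PERSON"), ("d1", 19, 32, "EMAIL"), ("d2", 5, 8, "MONEY")]
    print(entity_f1(gold, pred))
```

Under this scheme a per-type score, like those in the Entity Types table, would presumably filter both lists to a single type before scoring, and the OOD gap is simply the difference between the scores on the two evaluation sets.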
## Files

- `ShadeV5.onnx`: ONNX model (270 MB)
- `tokenizer.json`: Hugging Face fast tokenizer
- `tokenizer_config.json`: tokenizer configuration
- `shade_label_map.json`: mapping from BIO labels to entity types
|
## License

Apache 2.0
|
## Part of VeilPhantom

This model powers [VeilPhantom](https://github.com/veil-privacy/veil-phantom), an open-source PII redaction SDK for agentic AI pipelines.
|