	---
	license: apache-2.0
	language: en
	tags:
	- ner
	- pii
	- privacy
	- token-classification
	- deberta
	- onnx
	library_name: onnxruntime
	pipeline_tag: token-classification
	---

	# Shade V5 — On-Device PII Detection

	Shade V5 is a fast, accurate PII (Personally Identifiable Information) detection model for privacy-preserving AI pipelines. It detects 12 entity types with an overall F1 score of 97.6% on in-distribution data.

	## Quick Start

	```bash
	pip install veil-phantom
	```

	```python
	from veil_phantom import VeilClient

	veil = VeilClient()  # auto-downloads this model
	result = veil.redact("John Smith sent $5M to john@acme.com")
	print(result.sanitized)  # "[PERSON_1] sent [AMOUNT_1] to [EMAIL_1]"
	```
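The placeholder format in the output above can be illustrated with a minimal sketch. This is not VeilPhantom's implementation: the `redact` function, the `span` helper, and the character spans are hypothetical, the spans are assumed to be non-overlapping, and the `AMOUNT` placeholder name is taken from the example output rather than from the label set.

```python
from collections import defaultdict


def redact(text, spans):
    """Replace each (start, end, entity_type) span with a numbered placeholder.

    Identical surface strings of the same type reuse the same placeholder.
    Spans are assumed not to overlap.
    """
    counters = defaultdict(int)  # next index per entity type
    assigned = {}                # (entity_type, surface text) -> placeholder
    pieces = []
    cursor = 0
    for start, end, entity_type in sorted(spans):
        key = (entity_type, text[start:end])
        if key not in assigned:
            counters[entity_type] += 1
            assigned[key] = f"[{entity_type}_{counters[entity_type]}]"
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append(assigned[key])
        cursor = end
    pieces.append(text[cursor:])           # untouched tail
    return "".join(pieces), assigned


text = "John Smith sent $5M to john@acme.com"


def span(substring, entity_type):
    """Hypothetical helper: locate a substring and return its character span."""
    start = text.index(substring)
    return (start, start + len(substring), entity_type)


spans = [span("John Smith", "PERSON"), span("$5M", "AMOUNT"), span("john@acme.com", "EMAIL")]
sanitized, mapping = redact(text, spans)
print(sanitized)
```

Keeping the `mapping` dictionary around is what would make the redaction reversible after a downstream model has processed the sanitized text.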

	## Model Details

	| Property | Value |
	|----------|-------|
	| Architecture | DeBERTa-v3-xsmall |
	| Parameters | 22M (backbone) |
	| Format | ONNX |
	| Size | 270 MB |
	| Inference latency | < 50 ms on CPU |
	| F1 (in-distribution) | 97.6% |
	| F1 (out-of-distribution) | 97.3% |
	| Task | BIO token classification |
	| Labels | 25 (12 entity types × B/I + O) |
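The label count follows from the BIO scheme: each of the 12 entity types gets a `B-` (begin) and an `I-` (inside) label, plus a single `O` (outside) label, giving 12 × 2 + 1 = 25. A minimal sketch of that arithmetic is below. The `B-TYPE`/`I-TYPE` string format and the label order are assumptions for illustration; the actual names and order are defined in `shade_label_map.json`.

```python
# Entity types as listed in this model card.
ENTITY_TYPES = [
    "PERSON", "ORG", "EMAIL", "PHONE", "MONEY", "DATE",
    "ADDRESS", "GOVID", "BANKACCT", "CARD", "IPADDR", "CASE",
]


def build_bio_labels(entity_types):
    """Return the BIO label list: 'O' plus a B-/I- pair per entity type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels.append(f"B-{entity_type}")
        labels.append(f"I-{entity_type}")
    return labels


labels = build_bio_labels(ENTITY_TYPES)

# Assumed shape of a BIO label -> entity type mapping (illustrative only).
label_to_entity = {label: label.split("-", 1)[1] for label in labels if label != "O"}

print(len(labels))
```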

	## Entity Types

	| Type | F1 | Examples |
	|------|-----|----------|
	| PERSON | 96.3% | Names (Western, African, Asian, South African) |
	| ORG | 97.6% | Companies, institutions |
	| EMAIL | 100% | Email addresses |
	| PHONE | 98.4% | Phone numbers (international formats) |
	| MONEY | 99.6% | Monetary amounts |
	| DATE | 97.8% | Dates, times, schedules |
	| ADDRESS | 99.4% | Street addresses |
	| GOVID | 97.7% | SSN, SA ID, passport |
	| BANKACCT | 92.9% | Bank account numbers, IBAN |
	| CARD | 100% | Credit/debit card numbers |
	| IPADDR | 100% | IP addresses |
	| CASE | 97.8% | Legal case numbers |
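Because the model emits one BIO tag per token, the entity types above only appear once consecutive tags are grouped into spans. The sketch below shows one common way to do that grouping. It assumes whitespace-level tokens and `B-TYPE`/`I-TYPE` tag strings for readability; the real model works on subword tokens, and `decode_bio` is a hypothetical helper, not part of VeilPhantom.

```python
def decode_bio(tokens, tags):
    """Group token-level BIO tags into (entity_type, text) pairs."""
    entities = []
    current_type = None
    current_tokens = []

    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        continues = prefix == "I" and entity_type == current_type
        if not continues:
            # Close the open entity, if any, before handling this token.
            if current_type is not None:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
            if prefix in ("B", "I"):
                # A stray I- tag with no matching B- starts a new entity.
                current_type = entity_type
        if current_type is not None:
            current_tokens.append(token)

    # Flush an entity that runs to the end of the sequence.
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


tokens = ["John", "Smith", "emailed", "john@acme.com"]
tags = ["B-PERSON", "I-PERSON", "O", "B-EMAIL"]
entities = decode_bio(tokens, tags)
print(entities)
```

Treating a stray `I-` tag as the start of a new entity is a lenient design choice; a stricter decoder could discard such tokens instead.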

	## Training

	- Base model: microsoft/deberta-v3-xsmall
	- Training data: 116K examples from business meetings, legal proceedings, and financial transactions
	- Tokenizer: Unigram (128K vocab)
	- OOD gap: 0.3 percentage points of F1 (97.6% in-distribution → 97.3% out-of-distribution)

	## Files

	- `ShadeV5.onnx` — ONNX model (270 MB)
	- `tokenizer.json` — HuggingFace fast tokenizer
	- `tokenizer_config.json` — Tokenizer configuration
	- `shade_label_map.json` — BIO label → entity type mapping

	## License

	Apache 2.0

	## Part of VeilPhantom

	This model powers [VeilPhantom](https://github.com/veil-privacy/veil-phantom), an open-source PII redaction SDK for agentic AI pipelines.