mmBERT PII Detector (LoRA Adapter)
A multilingual token classification model for Personally Identifiable Information (PII) detection, built on mmBERT with LoRA adapters.
Model Description
This model detects and classifies PII entities at the token level using BIO tagging:
- PERSON: Names of individuals
- EMAIL_ADDRESS: Email addresses
- PHONE_NUMBER: Phone numbers
- STREET_ADDRESS: Physical addresses
- CREDIT_CARD: Credit card numbers
- US_SSN: US Social Security Numbers
- And more: 17 entity types in total (the Supported PII Types table lists ten of them)
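To make the BIO scheme concrete, here is a minimal sketch of how word-level tags can be grouped into entity spans. It assumes tag strings of the form `B-PERSON` / `I-PERSON` / `O`; the exact label strings the model emits are not stated in this card, and `bio_to_spans` is a hypothetical helper, not part of the repository.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group word-level BIO tags into (entity_type, start, end) spans; end is exclusive."""
    spans = []
    current_type, start = None, 0
    for i, tag in enumerate(tags):
        prefix, _, entity = tag.partition("-")
        # An I- tag that matches the open entity simply extends it.
        if prefix == "I" and entity == current_type:
            continue
        # Any other tag closes the open entity, if there is one.
        if current_type is not None:
            spans.append((current_type, start, i))
            current_type = None
        # B- opens a new entity; a stray I- is treated like B- (a lenient choice).
        if prefix in ("B", "I"):
            current_type, start = entity, i
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


# Hand-written tags for illustration only; these are not real model output.
words = "Contact John Smith at john.smith@email.com".split()
tags = ["O", "B-PERSON", "I-PERSON", "O", "B-EMAIL_ADDRESS"]
spans = bio_to_spans(tags)
entities = [(etype, " ".join(words[s:e])) for etype, s, e in spans]
print(entities)
```

The `B-` prefix marks the first word of an entity and `I-` marks a continuation, which is what lets two adjacent entities of the same type be told apart.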
Performance
| Metric | Value |
|---|---|
| Accuracy | 92.0% |
| F1 (weighted) | 91.0% |
| Training Time | 43 seconds (MI300X GPU) |
Training Details
- Base Model: jhu-clsp/mmBERT-base
- LoRA Rank: 32
- LoRA Alpha: 64
- Task: Token Classification (BIO tagging)
- Dataset: Microsoft Presidio Research Dataset
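As a rough illustration of what rank 32 and alpha 64 imply, the sketch below works through the standard LoRA arithmetic: the low-rank update is scaled by alpha / rank, and adapting one weight matrix of shape d_out x d_in adds rank * (d_in + d_out) trainable parameters. The 768 x 768 matrix size is an assumption chosen for illustration; this card does not say which modules are adapted or what their shapes are.

```python
def lora_scaling(rank: int, alpha: int) -> float:
    """Scaling factor applied to the low-rank update in the standard LoRA formulation."""
    return alpha / rank


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one adapted matrix: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


RANK, ALPHA = 32, 64   # values from the Training Details list
HIDDEN = 768           # assumed projection size, for illustration only

scale = lora_scaling(RANK, ALPHA)
adapter_params = lora_param_count(HIDDEN, HIDDEN, RANK)
full_params = HIDDEN * HIDDEN
ratio = adapter_params / full_params

print(f"scale={scale}, adapter={adapter_params}, full={full_params}, ratio={ratio:.3f}")
```

Under these assumptions each adapted matrix trains about one twelfth of the parameters that full fine-tuning of that matrix would.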
Usage
import torch
from peft import PeftModel
from transformers import AutoModelForTokenClassification, AutoTokenizer
# Load the base model with a token-classification head, then attach the LoRA adapter
base_model = AutoModelForTokenClassification.from_pretrained(
    "jhu-clsp/mmBERT-base",
    num_labels=35,  # O + 17 entity types × 2 (B/I)
)
model = PeftModel.from_pretrained(base_model, "llm-semantic-router/mmbert-pii-detector-lora")
model.eval()  # inference mode: disables dropout
tokenizer = AutoTokenizer.from_pretrained("jhu-clsp/mmBERT-base")
# Detect PII: tokenize the pre-split words and run a forward pass without gradients
text = "Contact John Smith at john.smith@email.com or call 555-123-4567"
inputs = tokenizer(text.split(), is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
predictions = outputs.logits.argmax(-1)  # one label id per sub-token, shape (1, seq_len)
# Map predictions to labels
# ... (see full example in repository)
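The predictions above contain one label id per sub-token, while the input was supplied as whole words. One common way to get back to one label per word is to keep the prediction of each word's first sub-token. The sketch below shows that alignment on hand-written values. `word_level_labels` and the three-entry `id2label` are hypothetical; the real id-to-label mapping is whatever the repository defines.

```python
from typing import Dict, List, Optional


def word_level_labels(
    word_ids: List[Optional[int]],
    token_preds: List[int],
    id2label: Dict[int, str],
) -> List[str]:
    """Keep the predicted label of the first sub-token of each word."""
    labels = []
    previous = None
    for word_id, pred in zip(word_ids, token_preds):
        if word_id is None:
            continue  # special tokens such as [CLS] / [SEP] belong to no word
        if word_id != previous:
            labels.append(id2label[pred])
        previous = word_id
    return labels


# With the real objects from the usage snippet, the inputs would come from:
#   word_ids = inputs.word_ids(batch_index=0)   # available on fast tokenizers
#   token_preds = predictions[0].tolist()
# The values below are hand-written stand-ins, not model output.
id2label = {0: "O", 1: "B-PERSON", 2: "I-PERSON"}  # hypothetical mapping
word_ids = [None, 0, 1, 2, 2, 3, None]             # word 2 was split into two sub-tokens
token_preds = [0, 0, 1, 2, 2, 0, 0]

labels = word_level_labels(word_ids, token_preds, id2label)
print(labels)
```

Taking the first sub-token is only one option; majority voting over a word's sub-tokens is an equally reasonable alternative.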
Supported PII Types
| Entity Type | Description |
|---|---|
| PERSON | Full names |
| EMAIL_ADDRESS | Email addresses |
| PHONE_NUMBER | Phone numbers (various formats) |
| STREET_ADDRESS | Physical addresses |
| CREDIT_CARD | Credit card numbers |
| US_SSN | US Social Security Numbers |
| IP_ADDRESS | IP addresses |
| DATE_TIME | Dates and times |
| URL | Web URLs |
| ORGANIZATION | Company/org names |
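The entity types in the table translate into a BIO label set by adding one `O` label plus a `B-` and an `I-` label per type. The sketch below builds such a set from the ten types listed here, which gives 21 labels; the same construction over 17 types gives the 35 labels used in the usage snippet. The label ordering is an assumption, and `build_bio_labels` is a hypothetical helper; the adapter's actual ordering should be taken from the repository.

```python
from typing import List


def build_bio_labels(entity_types: List[str]) -> List[str]:
    """Return ["O", "B-X", "I-X", ...] for the given entity types."""
    labels = ["O"]
    for entity in entity_types:
        labels.extend([f"B-{entity}", f"I-{entity}"])
    return labels


# Only the ten types shown in the table; the remaining types are not listed in this card.
LISTED_TYPES = [
    "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "STREET_ADDRESS", "CREDIT_CARD",
    "US_SSN", "IP_ADDRESS", "DATE_TIME", "URL", "ORGANIZATION",
]

labels = build_bio_labels(LISTED_TYPES)
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}

print(len(labels), labels[:3])
```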
Use Cases
- Privacy Compliance: Locate PII subject to GDPR and CCPA requirements
- Data Masking: Redact PII before processing
- LLM Safety: Prevent PII leakage in LLM outputs
- Document Processing: Automated PII detection in documents
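For the data masking use case, a minimal sketch of redaction might look like the following. It assumes the detector's output has already been converted to non-overlapping character spans of the form (start, end, entity_type). The `redact` helper and the hand-written spans are illustrative and not part of this model's API.

```python
from typing import List, Tuple


def redact(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace each (start, end, entity_type) character span with [ENTITY_TYPE]."""
    # Work from right to left so earlier offsets stay valid after each replacement.
    for start, end, entity in sorted(spans, reverse=True):
        text = text[:start] + f"[{entity}]" + text[end:]
    return text


text = "Contact John Smith at john.smith@email.com or call 555-123-4567"

# Stand-in for detector output: spans located by string search, for illustration only.
spans = []
for value, entity in [
    ("John Smith", "PERSON"),
    ("john.smith@email.com", "EMAIL_ADDRESS"),
    ("555-123-4567", "PHONE_NUMBER"),
]:
    start = text.index(value)
    spans.append((start, start + len(value), entity))

masked = redact(text, spans)
print(masked)
```

Replacing a span with its entity type rather than a fixed mask keeps the redacted text readable for downstream processing.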
Part of vLLM Semantic Router
This model is part of the vLLM Semantic Router project.
License
Apache 2.0