mmBERT PII Detector (LoRA Adapter)

A multilingual token classification model for Personally Identifiable Information (PII) detection, built on mmBERT with LoRA adapters.

Model Description

This model detects and classifies PII entities at the token level using BIO tagging:

  • PERSON: Names of individuals
  • EMAIL_ADDRESS: Email addresses
  • PHONE_NUMBER: Phone numbers
  • STREET_ADDRESS: Physical addresses
  • CREDIT_CARD: Credit card numbers
  • US_SSN: US Social Security Numbers
  • Additional types (17 entity types in total)
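
To make the BIO scheme concrete: each token is tagged `B-<TYPE>` at the start of an entity, `I-<TYPE>` inside it, and `O` outside any entity. The sketch below groups a tag sequence into entity spans. The function name `bio_to_spans` and the example tags are illustrative assumptions written by hand, not output from this model.

```python
def bio_to_spans(tags):
    """Group BIO tags into (label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag that matches the open entity simply extends it.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Anything else closes the currently open entity, if there is one.
        if label is not None:
            spans.append((label, start, i))
            start, label = None, None
        # B- opens a new entity; a stray I- is leniently treated as a start.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


# Hand-written example tags (hypothetical, one per whitespace-separated word)
words = "Contact John Smith at john.smith@email.com or call 555-123-4567".split()
tags = ["O", "B-PERSON", "I-PERSON", "O", "B-EMAIL_ADDRESS", "O", "O", "B-PHONE_NUMBER"]
spans = bio_to_spans(tags)
entities = [(label, " ".join(words[s:e])) for label, s, e in spans]
```

Here `entities` pairs each detected type with the words it covers, for example `("PERSON", "John Smith")`.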

Performance

Metric | Score
Accuracy | 92.0%
F1 (weighted) | 91.0%
Training Time | 43 seconds (MI300X GPU)

Training Details

  • Base Model: jhu-clsp/mmBERT-base
  • LoRA Rank: 32
  • LoRA Alpha: 64
  • Task: Token Classification (BIO tagging)
  • Dataset: Microsoft Presidio Research Dataset
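
As a rough illustration of what the rank and alpha values imply, the sketch below assumes the standard LoRA formulation, in which the low-rank update to a frozen weight matrix is scaled by alpha / rank. The 768 × 768 matrix size is an illustrative assumption, not a figure taken from this model card, and `lora_stats` is a hypothetical helper.

```python
def lora_stats(d_in, d_out, rank, alpha):
    """Return (scaling, adapter_params, full_params) for one adapted matrix."""
    scaling = alpha / rank                  # multiplier applied to the low-rank update
    adapter_params = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    full_params = d_in * d_out              # size of the frozen weight matrix
    return scaling, adapter_params, full_params


# Rank 32 and alpha 64 come from the list above; 768 is an assumed dimension.
scaling, adapter_params, full_params = lora_stats(768, 768, rank=32, alpha=64)
fraction_trained = adapter_params / full_params
```

Under these assumptions the update is scaled by 2.0, and the adapter trains about one twelfth as many parameters as the matrix it adapts.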

Usage

import torch
from peft import PeftModel
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load the base model with a token-classification head, then attach the LoRA adapter
base_model = AutoModelForTokenClassification.from_pretrained(
    "jhu-clsp/mmBERT-base", num_labels=35  # O + 17 entity types × 2 (B/I)
)
model = PeftModel.from_pretrained(base_model, "llm-semantic-router/mmbert-pii-detector-lora")
model.eval()  # inference mode
tokenizer = AutoTokenizer.from_pretrained("jhu-clsp/mmBERT-base")

# Detect PII
text = "Contact John Smith at john.smith@email.com or call 555-123-4567"
inputs = tokenizer(text.split(), is_split_into_words=True, return_tensors="pt")
with torch.no_grad():  # gradients are not needed for inference
    outputs = model(**inputs)
predictions = outputs.logits.argmax(-1)  # one label ID per subword token

# Map predictions to labels
# ... (see full example in repository)
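
Because the tokenizer splits words into subword tokens, the per-token predictions have to be aligned back to words before they can be read as labels. The sketch below shows one common convention (keep the prediction of each word's first subword). The helper `word_level_labels`, the toy `id2label` mapping, and the sample IDs are hypothetical; the real label mapping has to come from the repository.

```python
def word_level_labels(word_ids, token_label_ids, id2label):
    """Map per-token label IDs to one label per word (first subword wins)."""
    labels = {}
    for word_id, label_id in zip(word_ids, token_label_ids):
        if word_id is None:        # special tokens have no source word
            continue
        if word_id not in labels:  # later subwords of the same word are ignored
            labels[word_id] = id2label[label_id]
    return [labels[i] for i in sorted(labels)]


# Toy inputs (hypothetical): three words, the second split into two subwords
toy_word_ids = [None, 0, 1, 1, 2, None]
toy_label_ids = [0, 0, 1, 2, 0, 0]
toy_id2label = {0: "O", 1: "B-PERSON", 2: "I-PERSON"}
toy_labels = word_level_labels(toy_word_ids, toy_label_ids, toy_id2label)
```

With the snippet above, `word_ids` could be obtained from `inputs.word_ids(batch_index=0)` (available with fast tokenizers) and the label IDs from `predictions[0].tolist()`.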

Supported PII Types

Entity Type | Description
PERSON | Full names
EMAIL_ADDRESS | Email addresses
PHONE_NUMBER | Phone numbers (various formats)
STREET_ADDRESS | Physical addresses
CREDIT_CARD | Credit card numbers
US_SSN | US Social Security Numbers
IP_ADDRESS | IP addresses
DATE_TIME | Dates and times
URL | Web URLs
ORGANIZATION | Company/org names

Use Cases

  • Privacy Compliance: Support GDPR and CCPA compliance workflows
  • Data Masking: Redact PII before processing
  • LLM Safety: Prevent PII leakage in LLM outputs
  • Document Processing: Automated PII detection in documents
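
For the data-masking use case, word-level BIO tags can drive a simple redaction step. The following is a minimal sketch, assuming one tag per whitespace-separated word; the `redact` helper, the placeholder format, and the example tags are illustrative assumptions rather than part of this model's tooling.

```python
def redact(words, tags):
    """Replace each tagged entity with a single [LABEL] placeholder."""
    out = []
    prev_label = None
    for word, tag in zip(words, tags):
        if tag == "O":
            out.append(word)
            prev_label = None
            continue
        prefix, label = tag.split("-", 1)
        # Emit one placeholder per entity: on B-, or when the label changes.
        if prefix == "B" or label != prev_label:
            out.append(f"[{label}]")
        prev_label = label
    return " ".join(out)


# Hand-written example tags (hypothetical, not model output)
sample_words = "Contact John Smith at john.smith@email.com or call 555-123-4567".split()
sample_tags = ["O", "B-PERSON", "I-PERSON", "O", "B-EMAIL_ADDRESS", "O", "O", "B-PHONE_NUMBER"]
masked = redact(sample_words, sample_tags)
```

A multi-word entity such as a full name collapses into one placeholder, so the masked text does not reveal how many words the original value contained.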

Part of vLLM Semantic Router

This model is part of the vLLM Semantic Router project.

License

Apache 2.0
