PII Detection Model (DeBERTa-v3-xsmall, Fine-tuned)

This model is a fine-tuned version of microsoft/deberta-v3-xsmall for Named Entity Recognition (NER) focused on detecting Personally Identifiable Information (PII).

Model Details

  • Base Model: microsoft/deberta-v3-xsmall
  • Dataset: ai4privacy/pii-masking-200k (English subset, ~43k samples)
  • Training: 5 epochs, batch size 32 (effective), learning rate 3e-5
  • Best F1: 0.59
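
The effective batch size of 32 can be reached in more than one way, and the card does not say which. As a rough sketch of the training budget these numbers imply, the arithmetic below assumes a hypothetical per-device batch of 8 with 4 gradient-accumulation steps, and treats the ~43k figure as the whole training set with no held-out split. Both are assumptions for illustration, not values taken from this card.

```python
import math

# Values stated in the model card.
NUM_SAMPLES = 43_000      # "~43k samples" (approximate)
EFFECTIVE_BATCH = 32
EPOCHS = 5

# Hypothetical split of the effective batch; the card does not specify it.
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 4
assert PER_DEVICE_BATCH * GRAD_ACCUM_STEPS == EFFECTIVE_BATCH

# One optimizer step consumes one effective batch.
steps_per_epoch = math.ceil(NUM_SAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
print(steps_per_epoch, total_steps)
```

Under these assumptions the run is on the order of a few thousand optimizer steps; the exact count depends on the real split sizes.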

Intended Use

This model is designed for:

  • Detecting PII entities in text (Names, Emails, Phone Numbers, Addresses, SSNs, etc.)
  • Privacy compliance and data anonymization pipelines
  • Research and development in privacy-preserving NLP

Limitations

  • Trained primarily on English text.
  • Performance may vary on domain-specific or out-of-distribution data.
  • Not a replacement for comprehensive privacy review.

How to Use

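from transformers import pipeline

# Load the fine-tuned checkpoint; "simple" aggregation merges sub-word tokens into whole entity spans.
pii_detector = pipeline("ner", model="mukuls9971/pii-deberta-v3-xsmall", aggregation_strategy="simple")

text = "Contact me at jane.smith@example.org or call (555) 123-4567."
results = pii_detector(text)

# Each result is a dict with the keys entity_group, score, word, start, and end.
for entity in results:
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))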
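
For the anonymization use case listed under Intended Use, the detected spans can be turned into redacted text. The following is a minimal sketch, not part of the model's own tooling: `redact` and `span` are hypothetical helpers, the `min_score` threshold of 0.5 is an arbitrary choice, and the entity list is a hand-written stand-in for pipeline output so that the example runs without downloading the model. The labels `EMAIL` and `PHONENUMBER` are illustrative and may not match the labels this checkpoint emits.

```python
def redact(text, entities, min_score=0.5):
    """Replace each detected span with a [LABEL] placeholder.

    `entities` is expected to look like the output of a token-classification
    pipeline with aggregation enabled: dicts with entity_group, score,
    start, and end. Overlapping spans are not handled.
    """
    kept = [e for e in entities if e["score"] >= min_score]
    # Replace from the end of the string so earlier offsets stay valid.
    for e in sorted(kept, key=lambda e: e["start"], reverse=True):
        text = text[:e["start"]] + f"[{e['entity_group']}]" + text[e["end"]:]
    return text


def span(text, substring, label, score):
    """Hypothetical helper: build one entity dict for a known substring."""
    start = text.index(substring)
    return {"entity_group": label, "score": score,
            "start": start, "end": start + len(substring)}


text = "Contact me at jane.smith@example.org or call (555) 123-4567."

# Hand-written stand-in for pipeline output (scores are made up).
entities = [
    span(text, "jane.smith@example.org", "EMAIL", 0.98),
    span(text, "(555) 123-4567", "PHONENUMBER", 0.91),
]

redacted = redact(text, entities)
print(redacted)
```

In a real pipeline, `entities` would be the list returned by the detector. Given the reported F1 of 0.59, some PII is likely to be missed, so redacted output should still be reviewed.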