PIIBench Direct Fine-Tuned DeBERTa

This is the final selected PIIBench model: a standard DeBERTa-v3-base token classifier trained directly on the prepared multi-source PII benchmark splits. It outperformed the source-conditioned hierarchical comparison model on the complete held-out experiment test split.

Paper

This model is released with the paper:

Fine-Tuning Over Architectural Complexity: Broad-Coverage PII Detection on PIIBench with DeBERTa
arXiv: https://arxiv.org/abs/2605.25816
Hugging Face Papers: https://huggingface.co/papers/2605.25816

This repository corresponds to the direct fine-tuned DeBERTa model reported as the final selected model in the paper.

Results

The reported evaluation uses the later prepared PIIBench experiment variant, which retains 82 entity types and has a held-out test split of 100,002 records. It is not the earlier 48-type dataset release on the Hub.

Held-Out Evaluation              Records   F1       Precision   Recall
Corrected heldout subset         5,000     0.6476   0.6300      0.6662
Complete experiment test split   100,002   0.6455   0.6277      0.6645
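
As a quick sanity check on the figures above, the sketch below recomputes F1 from the reported precision and recall. It assumes F1 here is the usual harmonic mean of the two reported values; the averaging scheme is not stated in this card. Because precision and recall are rounded to four decimals, the recomputed value can differ from the reported F1 in the last digit. The helper name `f1_score` is illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the results table.
reported = {
    "Corrected heldout subset": (0.6300, 0.6662, 0.6476),
    "Complete experiment test split": (0.6277, 0.6645, 0.6455),
}

for name, (p, r, f1) in reported.items():
    recomputed = f1_score(p, r)
    # Differences in the fourth decimal are expected from rounding of p and r.
    print(f"{name}: recomputed F1 = {recomputed:.4f}, reported F1 = {f1:.4f}")
```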

Full-test SHA-256: 65f8edc86399ba3f9e4ba44591d4583f9271f5d1df20e30a913305049559df77
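
If you have a local copy of the full test split, a checksum comparison along the following lines may help confirm it is the same file. This is a minimal sketch: it assumes the digest above was computed over the raw bytes of a single test-split file, which this card does not spell out, and the helper names and any file path you pass in are hypothetical.

```python
import hashlib

# Digest copied from the "Full-test SHA-256" line of this card.
EXPECTED_SHA256 = "65f8edc86399ba3f9e4ba44591d4583f9271f5d1df20e30a913305049559df77"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's digest equals the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected` with the path to your local test file returns `True` only when the bytes match exactly; a different serialization of the same records would produce a different digest.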

Usage

This is a standard Transformers token-classification model:

from transformers import pipeline

pipe = pipeline(
    "token-classification",
    model="Pritesh-2711/piibench-deberta-base",
    aggregation_strategy="simple",
)
print(pipe("Contact me at jane@example.com."))
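
With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries that include `entity_group`, `score`, `start`, and `end` keys. One common follow-up is to redact the detected spans. The sketch below does this with a hypothetical `redact` helper and a hand-written stand-in for the pipeline output, so it runs without downloading the model. The `EMAIL` label and the 0.5 score threshold are assumptions for illustration; check the model's `id2label` mapping for the actual label names.

```python
def redact(text, entities, min_score=0.5):
    """Replace each sufficiently confident span with a [LABEL] placeholder."""
    kept = [ent for ent in entities if ent["score"] >= min_score]
    # Work right to left so earlier character offsets stay valid after each edit.
    for ent in sorted(kept, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"] :]
    return text


# Stand-in for pipe(sample_text); the label name "EMAIL" is hypothetical.
sample_text = "Contact me at jane@example.com."
sample_entities = [
    {
        "entity_group": "EMAIL",
        "score": 0.99,
        "word": "jane@example.com",
        "start": 14,
        "end": 30,
    },
]

print(redact(sample_text, sample_entities))
```

In real use, pass `pipe(text)` as the `entities` argument. Given the reported precision and recall, redaction based on this model alone will likely miss some PII and flag some non-PII, so downstream review may still be needed.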
