PII Intent Classifier - XLM-RoBERTa Large (V11)

A multilingual binary classifier that detects PII (Personally Identifiable Information) sharing intent in text messages. Built for content moderation on creator-brand collaboration platforms.

What's New in V11

V11 is the 15th iteration of this model, trained on 41,427 samples (up from 24,012 in V7c). Key improvements:

Conversation test: 97.0% accuracy (838/864) - up from 95.0% in V10
Targeted training data: ~200 new samples addressing specific failure patterns (scam+real phone, asking about numbers, humorous sharing, room/postal/tracking numbers as NOT-PII)
6 new CoT categories: room_number, postal_code, time_digits, tracking_number, username_digits, goodnight_casual
Strict policy: Any real phone/email/IBAN/handle = PII regardless of context (humor, scam warning, inquiry)

Version History

Version	Stress Test	Conversation	Training Data	Key Change
V7c	173/177 (98%)	-	24,012	First production model
V8	174/177 (98%)	-	24,012	+name_intro, greeting_slang
V9	-	-	39,303	+"real contact = always PII" policy
V10	172/177 (97%)	821/864 (95%)	39,789	+entity×label balance fix
V11	168/177 (95%)	838/864 (97%)	41,427	+targeted error fixes

Model Description

This model classifies whether a message expresses an intent to share personal contact information (phone numbers, emails, social media handles, IBANs, etc.). Unlike simple regex-based PII detection, it takes context and intent into account:

"my number is 05321234567" → PII (sharing intent)
"05321234567 is a scammer, block them" → PII under the V11 strict policy (scam warning, but the real number is still visible)
"I'll send you my WhatsApp tomorrow" → PII (future sharing intent)
"what is an IBAN and how do I get one?" → NOT PII (information question)
"numaram pizza siparişi gibi 05321234567 haha 😂" (Turkish: "my number is like a pizza order 05321234567 haha") → PII (humor + real number)
"oda numaram 532 otelde buluşalım" (Turkish: "my room number is 532, let's meet at the hotel") → NOT PII (room number, not phone)

Key Features

Trilingual: Turkish, Arabic, English
Context-aware: Understands sarcasm, negation, hypotheticals, quoting, reporting, humor
Evasion-resistant: Detects coded sharing, profile redirects, voice note evasion, partial number sharing, spaced text evasion
High recall: 97% PII recall across 177 stress test cases
Conversation-ready: 97% accuracy on 864 real-world conversation scenarios

Training

Base model: xlm-roberta-large (550M parameters)
Dataset: gorkem371/pii-intent-detection-multilingual - 41,427 samples across 9 entity types, balanced PII/NOT-PII
Loss: Focal loss (gamma=2) with inverse class frequency weights
Training: bf16 mixed precision, lr=1.5e-5, gradient accumulation=4 (effective batch=64), 15 epochs
Hardware: NVIDIA H100 80GB HBM3
Best epoch: 14
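
The loss configuration listed above can be illustrated with a minimal, framework-free sketch. It assumes the standard focal loss form, -w_t * (1 - p_t)^gamma * log(p_t), and class weights proportional to inverse class frequency. The training code is not part of this card, so the weight normalization and the class counts below are assumptions made only for illustration.

```python
import math

def inverse_frequency_weights(class_counts):
    """Weights proportional to 1/count; a perfectly balanced set gets 1.0 per class."""
    total = sum(class_counts)
    n_classes = len(class_counts)
    return [total / (n_classes * count) for count in class_counts]

def focal_loss(probs, target, weights, gamma=2.0):
    """Focal loss for one sample: -w_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = probs[target]
    return -weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)

# Hypothetical (NOT_PII, PII) counts; the card only states that the data is balanced.
weights = inverse_frequency_weights([20000, 21427])

easy = focal_loss([0.02, 0.98], target=1, weights=weights)  # confident and correct
hard = focal_loss([0.70, 0.30], target=1, weights=weights)  # leaning the wrong way
print(f"easy={easy:.6f} hard={hard:.4f}")
```

With gamma=2 the confidently correct sample contributes almost nothing, so training gradient concentrates on hard or ambiguous messages.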

Validation Metrics (epoch 14)

Metric	Score
F1	99.33%
Accuracy	99.24%
Precision	99.20%
Recall	99.46%
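
As a quick consistency check, the reported F1 can be recomputed from the reported precision and recall. This sketch uses only the rounded figures from the table above, so it confirms agreement to two decimal places rather than reproducing the exact validation run.

```python
precision = 0.9920
recall = 0.9946

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1 * 100:.2f}%")  # F1 = 99.33%
```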

Test Results

Stress Test (177 cases)

Test Suite	Score	Description
Standard (72)	97%	Basic PII sharing and non-PII scenarios
Hardcore (72)	92%	Evasion, coded sharing, sarcasm, context tricks
Edge Cases (33)	97%	Business numbers, sarcasm+real numbers, hypotheticals
Total (177)	95%	Combined across all suites

Conversation Test (864 cases)

Metric	Score
Total Accuracy	838/864 (97.0%)
Errors	26
False Positives	2
False Negatives	24
Design Limit (entity=NONE)	19 of 24 FN
Actual Model Errors (FN not caused by entity=NONE)	5
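
The error breakdown above can be reproduced with simple arithmetic. The sketch below uses only the counts from the table. The figure of 5 corresponds to the 24 false negatives minus the 19 design-limit cases. Treating the 2 false positives as model-attributable as well, which gives 7, is an assumption about how the errors should be attributed.

```python
total_cases = 864
correct = 838
false_positives = 2
false_negatives = 24
design_limit_fn = 19  # false negatives where entity extraction returned NONE

errors = total_cases - correct
assert errors == false_positives + false_negatives  # 26 errors in total

accuracy = correct / total_cases
model_fn = false_negatives - design_limit_fn
# Assumption: both false positives are attributed to the model itself.
model_errors = model_fn + false_positives
# Accuracy if the extraction-limited cases are left out of the denominator.
accuracy_excl_design = correct / (total_cases - design_limit_fn)

print(f"accuracy={accuracy:.1%} model_fn={model_fn} model_errors={model_errors}")
print(f"accuracy excluding design-limit cases={accuracy_excl_design:.1%}")
```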

Per-Language Breakdown (Stress Test)

Language	Score	Notes
Turkish	96%	Weak: masked numbers, order numbers
Arabic	96%	Weak: scam warnings, math expressions
English	100%	All standard+hardcore+edge passed

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import torch.nn.functional as F

model_name = "gorkem371/pii-intent-classifier-xlmr-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def classify_pii(context: str, entity: str, entity_type: str) -> dict:
    """
    Classify whether a message contains PII sharing intent.

    Args:
        context: The full message text
        entity: The specific entity string to classify (e.g., a phone number), or "NONE" if the intent is implicit
        entity_type: Type of entity (PHONE, EMAIL, SOCIAL_MEDIA, IBAN, ADDRESS, URL, etc.)

    Returns:
        dict with 'is_pii' (bool), 'label' ("PII" or "NOT_PII"), and 'confidence' (float, probability of the predicted label)
    """
    text = f"{context} </s> {entity} | {entity_type}"
    inputs = tokenizer(text, max_length=256, padding="max_length", truncation=True, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        probs = F.softmax(outputs.logits, dim=-1)
        pred = torch.argmax(probs, dim=-1).item()
        confidence = probs[0][pred].item()

    return {
        "is_pii": pred == 1,
        "label": "PII" if pred == 1 else "NOT_PII",
        "confidence": round(confidence, 4)
    }

# Examples
print(classify_pii("my number is 05321234567 call me", "05321234567", "PHONE"))
# {'is_pii': True, 'label': 'PII', 'confidence': 0.9987}

print(classify_pii("order number is ORD-784321", "ORD-784321", "PHONE"))
# {'is_pii': False, 'label': 'NOT_PII', 'confidence': 0.9954}

print(classify_pii("i will send you my whatsapp tomorrow", "NONE", "PHONE"))
# {'is_pii': True, 'label': 'PII', 'confidence': 0.9821}

print(classify_pii("oda numaram 532 otelde buluşalım", "NONE", "PHONE"))
# {'is_pii': False, 'label': 'NOT_PII', 'confidence': 0.9876}
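
In a moderation pipeline, the dictionary returned by classify_pii usually has to be mapped to an action. The sketch below shows one possible mapping. The function name route_message, the action names, and the 0.90 threshold are all hypothetical and would need tuning against real traffic.

```python
def route_message(result: dict, block_threshold: float = 0.90) -> str:
    """Map a classify_pii-style result to a moderation action.

    The threshold and action names are illustrative assumptions,
    not part of the model or its training.
    """
    if not result["is_pii"]:
        return "allow"
    if result["confidence"] >= block_threshold:
        return "block"
    # Low-confidence PII predictions go to a human reviewer.
    return "review"

print(route_message({"is_pii": True, "label": "PII", "confidence": 0.9987}))      # block
print(route_message({"is_pii": True, "label": "PII", "confidence": 0.62}))        # review
print(route_message({"is_pii": False, "label": "NOT_PII", "confidence": 0.9954})) # allow
```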

Input Format

The model expects input in the following format:

{context} </s> {entity} | {entity_type}

context: The full message text (any language)
entity: The specific entity string, or "NONE" for implicit PII intent
entity_type: One of: PHONE, EMAIL, SOCIAL_MEDIA, IBAN, CREDIT_CARD, ADDRESS, URL, CRYPTO_ADDRESS, OFF_PLATFORM_ATTEMPT
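
The format above can be wrapped in a small helper that validates the entity type and falls back to "NONE" when no entity string is available. This is a minimal sketch. The helper name build_model_input and the whitespace stripping are assumptions, not part of the released code.

```python
from typing import Optional

ENTITY_TYPES = {
    "PHONE", "EMAIL", "SOCIAL_MEDIA", "IBAN", "CREDIT_CARD",
    "ADDRESS", "URL", "CRYPTO_ADDRESS", "OFF_PLATFORM_ATTEMPT",
}

def build_model_input(context: str, entity: Optional[str], entity_type: str) -> str:
    """Build the '{context} </s> {entity} | {entity_type}' string the model expects."""
    entity_type = entity_type.strip().upper()
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unsupported entity_type: {entity_type!r}")
    # Use the literal string "NONE" when there is no explicit entity.
    entity_text = entity.strip() if entity and entity.strip() else "NONE"
    return f"{context.strip()} </s> {entity_text} | {entity_type}"

print(build_model_input("my number is 05321234567 call me", "05321234567", "PHONE"))
# my number is 05321234567 call me </s> 05321234567 | PHONE
```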

Supported Entity Types

Type	Description	Example
PHONE	Phone numbers	05321234567, +966501234567
EMAIL	Email addresses	user@gmail.com
SOCIAL_MEDIA	Social media handles	@username, Instagram/TikTok/Telegram
IBAN	Bank account numbers	TR33000610...
ADDRESS	Physical addresses	123 Oxford Street London
URL	Websites	my-site.com
CREDIT_CARD	Credit card numbers	4532...
CRYPTO_ADDRESS	Cryptocurrency addresses	0x...
OFF_PLATFORM_ATTEMPT	Attempts to move off-platform	"let's talk on WhatsApp"
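
Because the classifier scores one (entity, entity_type) pair at a time, an upstream step has to propose candidate entities. The sketch below shows a deliberately naive regex-based proposer for three of the types above. The patterns, the function name extract_candidates, and the choice of PHONE as the fallback type are assumptions for illustration, not the extractor used with this model. Over-matching (for example on tracking numbers) is acceptable here, since the classifier makes the final decision.

```python
import re

# Simple illustrative patterns; real extraction needs locale-aware rules.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("PHONE", re.compile(r"\+?\d{10,13}")),
    # The lookbehind skips the "@" inside email addresses.
    ("SOCIAL_MEDIA", re.compile(r"(?<![\w.])@\w{2,30}")),
]

def extract_candidates(message, fallback_type="PHONE"):
    """Return (entity, entity_type) pairs to pass to the classifier."""
    found = []
    for entity_type, pattern in PATTERNS:
        for match in pattern.finditer(message):
            found.append((match.group(0), entity_type))
    # With no explicit entity, fall back to "NONE" so implicit intent is still scored.
    return found or [("NONE", fallback_type)]

print(extract_candidates("my number is 05321234567 call me"))
print(extract_candidates("mail me at user@gmail.com or @username"))
print(extract_candidates("i will send you my whatsapp tomorrow"))
```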

What This Model Understands

PII = True (sharing intent detected)

Direct sharing: "my number is 05321234567"
Coded/evasion: "find me on the gram @secret_handle"
Future intent: "I'll send you my number tomorrow"
Conditional: "if we agree, I'll share my contact"
Requesting: "what's your number? send it"
Profile redirect: "check my bio, my number is there"
Reluctant sharing: "I don't want to but here's my number..."
Third-party: "my friend said to contact him at..."
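Humor + real number: "numaram pizza siparişi gibi 05321234567 haha 😂" (Turkish: "my number is like a pizza order 05321234567 haha")
Scam warning + real number: "05321234567 dolandırıcı sakın aramayın" (Turkish: "05321234567 is a scammer, do not call"; the number is still visible)
Business/restaurant numbers: "restoran telefonu 02125551234" (Turkish: "restaurant phone 02125551234")
Asking about a number: "bu numara tanıdık mı 05321234567" (Turkish: "is this number familiar 05321234567")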

PII = False (no sharing intent)

Order/tracking numbers: "your order ORD-784321"
Scam warnings (no real number): "dolandırıcılara dikkat edin" (Turkish: "beware of scammers")
Reporting violations: "someone sent me their number, reporting"
Hypothetical (no number): "I wish I had a number to share"
Sarcasm with fake numbers: "my number is 00000000000 lol"
Statistics: "my follower count hit 532000"
Non-contact numbers: "bake at 180 degrees for 45 minutes"
Price/product codes: "SKU-TR-78431-B", "1250 TL"
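Room numbers: "oda numaram 532 otelde buluşalım" (Turkish: "my room number is 532, let's meet at the hotel")
Postal codes: "posta kodu 34720 Kadıköy İstanbul" (Turkish: "postal code 34720 Kadıköy İstanbul")
Time digits: "saat 05:32 de buluşalım" (Turkish: "let's meet at 05:32")
Tracking numbers: "kargo takip: 1Z999AA10123456784" (Turkish: "shipment tracking: 1Z999AA10123456784")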

Limitations

Masked numbers (0532***4567): Model classifies as NOT-PII (partially masked = not fully usable)
Scam warnings with real numbers: V11 tends to flag these as PII (the number is still visible/reachable)
Math expressions containing phone-like numbers: Sometimes flagged as PII
Entity extraction dependency: 19 of 26 conversation errors are from entity extraction returning NONE, not model failures
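
Since masked numbers are classified as NOT-PII, a platform that wants to review them anyway could add a lightweight pre-check before or alongside the model. The sketch below is one hypothetical way to do that. The pattern, the set of mask characters, and the function name has_masked_number are assumptions and will miss other masking styles.

```python
import re

# Digits, then a run of mask characters, then digits: 0532***4567, 0532xxx4567.
MASKED_NUMBER = re.compile(r"\+?\d{2,}[*xX•#]{2,}\d{2,}")

def has_masked_number(message: str) -> bool:
    """Return True if the message contains a partially masked digit sequence."""
    return MASKED_NUMBER.search(message) is not None

print(has_masked_number("numaram 0532***4567"))     # True
print(has_masked_number("your order ORD-784321"))   # False
```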

Model Architecture

Base: XLM-RoBERTa Large (24 layers, 16 heads, 1024 hidden, 550M params)
Head: Linear(1024→1024) + Tanh + Dropout + Linear(1024→2)
Total size: ~2.1GB (safetensors)
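
The size figures above can be sanity-checked with a short calculation. The sketch below counts the parameters of the classification head as described (Tanh and Dropout have none) and estimates float32 storage from the rounded 550M total quoted in this card, so the result is approximate.

```python
hidden = 1024
num_labels = 2

# A linear layer has in_features * out_features weights plus out_features biases.
dense_params = hidden * hidden + hidden        # Linear(1024 -> 1024)
out_params = hidden * num_labels + num_labels  # Linear(1024 -> 2)
head_params = dense_params + out_params

total_params = 550_000_000   # rounded total quoted in this card
fp32_bytes = total_params * 4  # 4 bytes per float32 parameter

print(head_params)                   # 1051650
print(round(fp32_bytes / 2**30, 2))  # 2.05 (GiB)
```

The head adds only about one million parameters, so the checkpoint size is dominated by the XLM-RoBERTa encoder.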

Citation

Author: Gorkem Yildiz

If you use this model, please cite:

@misc{pii-intent-classifier-2026,
  title={PII Intent Classifier: Multilingual Context-Aware PII Detection},
  author={Gorkem Yildiz},
  year={2026},
  url={https://huggingface.co/gorkem371/pii-intent-classifier-xlmr-large},
  howpublished={\url{https://gorkemyildiz.com}}
}
