HerBERT NER – RODO / GDPR Anonymization (PL)

Fine-tuned HerBERT model for Named Entity Recognition (NER) focused on automatic anonymization of RODO / GDPR-sensitive data in Polish-language texts, with emphasis on medical and administrative documents.

The model is designed to detect personal data, sensitive attributes, identifiers, and contact information to support downstream anonymization pipelines.
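
As a rough illustration of how the detected entities could feed an anonymization step, the sketch below replaces each detected span with a placeholder such as `[NAME]`. It assumes the checkpoint is available as `OskarBartoszyk/PLAnonimizer` and loads through the generic 🤗 Transformers `pipeline` API. The helper names `load_ner` and `mask_entities` are hypothetical and not part of the model's distribution.

```python
from typing import Dict, List


def load_ner(model_id: str = "OskarBartoszyk/PLAnonimizer"):
    """Load a token-classification pipeline (downloads the model on first use).

    Assumption: the checkpoint works with the generic Transformers pipeline.
    """
    from transformers import pipeline  # lazy import; needs `transformers` and `torch`

    # aggregation_strategy="simple" merges B-/I- word pieces into whole entity spans
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def mask_entities(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    `entities` follows the pipeline's aggregated output format:
    dicts with "entity_group", "start" and "end" (character offsets).
    """
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text
```

With the dependencies installed, `mask_entities(text, load_ner()(text))` would run the model end to end. Keeping the masking step separate from inference makes it easy to swap placeholders for pseudonyms or to mask only selected classes.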

✨ Use case

Anonymization of medical documentation
GDPR / RODO compliance
Text preprocessing before analytics or ML
Research on privacy-preserving NLP for Polish

🧠 Model details

Base model: HerBERT (Polish BERT)
Task: Token classification (NER)
Labeling scheme: BIO
Training: Supervised fine-tuning
Framework: 🤗 Transformers + PyTorch
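
Under the BIO scheme, a `B-X` tag opens an entity of class `X`, `I-X` continues it, and `O` marks tokens outside any entity. A minimal sketch of turning a tag sequence into entity spans follows. `bio_to_spans` is a hypothetical helper, and the tag strings assume the common `B-`/`I-` prefix convention, which this card does not spell out.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (label, start, end) token spans, `end` exclusive."""
    spans = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        if tag.startswith("I-") and label == tag[2:]:
            continue  # current entity keeps going
        if label is not None:
            spans.append((label, start, i))  # close the open entity
            label = None
        if tag != "O":
            # "B-X" opens a span; a stray "I-X" is leniently treated as an opener
            label, start = tag[2:], i
    return spans
```

For example, the tags `["B-NAME", "B-SURNAME", "O", "O", "B-CITY"]` for the tokens "Jan Kowalski mieszka w Krakowie" yield three single-token spans, one per entity.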

🏷️ Supported entity classes

Personal identifiers

NAME – first names
SURNAME – last names
AGE – age
DATE_OF_BIRTH – date of birth
DATE – other dates identifying events
SEX – sex / gender (explicit)
RELIGION – religion
POLITICAL_VIEW – political views
ETHNICITY – ethnicity / nationality
SEXUAL_ORIENTATION – sexual orientation
HEALTH – health-related information
RELATIVE – family relations revealing identity

Contact & location

CITY – city (general context)
ADDRESS – full address
EMAIL – email addresses
PHONE – phone numbers

Identifiers

PESEL – Polish national ID number
DOCUMENT_NUMBER – ID / passport / license numbers

Professional & education

COMPANY – employer
SCHOOL_NAME – school name
JOB_TITLE – job position

Financial

BANK_ACCOUNT – bank account numbers
CREDIT_CARD_NUMBER – payment cards

Digital identifiers

USERNAME – usernames / logins
SECRET – passwords, API keys
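
Combining the 25 classes above with a BIO scheme gives 51 token labels: a `B-` and an `I-` tag per class, plus `O`. The sketch below builds such a label map. The ordering and the `B-`/`I-` naming are assumptions for illustration; the authoritative mapping is whatever the released checkpoint stores in its config (`id2label`).

```python
ENTITY_CLASSES = [
    # Personal identifiers
    "NAME", "SURNAME", "AGE", "DATE_OF_BIRTH", "DATE", "SEX", "RELIGION",
    "POLITICAL_VIEW", "ETHNICITY", "SEXUAL_ORIENTATION", "HEALTH", "RELATIVE",
    # Contact & location
    "CITY", "ADDRESS", "EMAIL", "PHONE",
    # Identifiers
    "PESEL", "DOCUMENT_NUMBER",
    # Professional & education
    "COMPANY", "SCHOOL_NAME", "JOB_TITLE",
    # Financial
    "BANK_ACCOUNT", "CREDIT_CARD_NUMBER",
    # Digital identifiers
    "USERNAME", "SECRET",
]

# "O" first, then a B-/I- pair per class (hypothetical ordering)
LABELS = ["O"] + [f"{prefix}-{name}" for name in ENTITY_CLASSES for prefix in ("B", "I")]
id2label = dict(enumerate(LABELS))
label2id = {label: idx for idx, label in id2label.items()}
```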

📊 Evaluation

Validation F1-score: ~0.80
Recall-oriented configuration (privacy-first)
Evaluation performed on a held-out validation set

Note: For anonymization tasks, recall is prioritized to minimize data leakage.
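
To make the precision/recall distinction concrete, here is a minimal sketch of entity-level scoring with exact span matching. The card does not say how the reported F1 was computed, so this is not necessarily the same procedure; `span_prf` and the example spans are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (label, start, end)


def span_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), counting only exact span+label matches."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = {("NAME", 0, 3), ("SURNAME", 4, 12), ("PESEL", 20, 31), ("CITY", 40, 46)}
pred = {("NAME", 0, 3), ("SURNAME", 4, 12), ("PESEL", 20, 31)}
precision, recall, f1 = span_prf(gold, pred)
# precision is 1.0 but recall is 0.75: the missed CITY span would stay in the text
```

A missed entity (lower recall) leaves personal data in the output, whereas a false positive only over-masks, which is why recall is the more critical figure for anonymization.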

Safetensors

Model size: 0.1B params
Tensor type: F32

Model tree for OskarBartoszyk/PLAnonimizer

Quantizations: 1 model