HerBERT NER - RODO / GDPR Anonymization (PL)
Fine-tuned HerBERT model for Named Entity Recognition (NER) focused on automatic anonymization of RODO / GDPR-sensitive data in Polish-language texts, with emphasis on medical and administrative documents.
The model is designed to detect personal data, sensitive attributes, identifiers, and contact information to support downstream anonymization pipelines.
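A minimal sketch of how predictions might feed such a pipeline, assuming the standard 🤗 Transformers `token-classification` pipeline with `aggregation_strategy="simple"`, whose results carry `entity_group`, `start`, and `end` keys. The helper names `load_ner` and `anonymize` are hypothetical, the model ID must be supplied by the caller, and the spans below are hand-written for illustration rather than real model output.

```python
from typing import Dict, List


def load_ner(model_id: str):
    """Build a token-classification pipeline (requires `transformers` and `torch`)."""
    from transformers import pipeline  # imported lazily so `anonymize` works without it

    # aggregation_strategy="simple" merges word pieces into whole entity spans
    return pipeline("token-classification", model=model_id, aggregation_strategy="simple")


def anonymize(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a placeholder such as [NAME]."""
    # Work right to left so earlier character offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text


# Hand-written spans in the pipeline's output format (illustrative only).
sample = "Pacjent Jan Kowalski, tel. 600 100 200."
spans = [
    {"entity_group": "NAME", "start": 8, "end": 11},
    {"entity_group": "SURNAME", "start": 12, "end": 20},
    {"entity_group": "PHONE", "start": 27, "end": 38},
]
print(anonymize(sample, spans))
```

With a loaded model, the same helper could be called as `anonymize(text, load_ner(model_id)(text))`, where `model_id` is this repository's ID.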
✨ Use case
- Anonymization of medical documentation
- GDPR / RODO compliance
- Text preprocessing before analytics or ML
- Research on privacy-preserving NLP for Polish
🧠 Model details
- Base model: HerBERT (Polish BERT)
- Task: Token classification (NER)
- Labeling scheme: BIO
- Training: Supervised fine-tuning
- Framework: 🤗 Transformers + PyTorch
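To illustrate the BIO scheme, here is a small sketch that groups per-token tags into entity spans. The function name `bio_to_spans` and the example sentence are hypothetical, and treating a stray `I-` tag as a span start is a lenient choice made for this sketch, not something the model card specifies.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Group BIO tags into (label, start_token, end_token_exclusive) spans."""
    spans = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current span
        if label is not None:
            spans.append((label, start, i))
            label = None
        if tag.startswith(("B-", "I-")):
            # A stray I- tag without a matching B- opens a new span (lenient choice).
            label, start = tag[2:], i
    return spans


tokens = ["Jan", "Kowalski", "mieszka", "w", "Nowym", "Sączu"]
tags = ["B-NAME", "B-SURNAME", "O", "O", "B-CITY", "I-CITY"]
print(bio_to_spans(tags))
```

Here `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity, so the two-token city name comes out as a single `CITY` span.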
🏷️ Supported entity classes
Personal identifiers
- NAME: first names
- SURNAME: last names
- AGE: age
- DATE_OF_BIRTH: date of birth
- DATE: other dates identifying events
- SEX: sex / gender (explicit)
- RELIGION: religion
- POLITICAL_VIEW: political views
- ETHNICITY: ethnicity / nationality
- SEXUAL_ORIENTATION: sexual orientation
- HEALTH: health-related information
- RELATIVE: family relations revealing identity
Contact & location
- CITY: city (general context)
- ADDRESS: full address
- EMAIL: email addresses
- PHONE: phone numbers
Identifiers
- PESEL: Polish national ID number
- DOCUMENT_NUMBER: ID / passport / license numbers
Professional & education
- COMPANY: employer
- SCHOOL_NAME: school name
- JOB_TITLE: job position
Financial
- BANK_ACCOUNT: bank account numbers
- CREDIT_CARD_NUMBER: payment cards
Digital identifiers
- USERNAME: usernames / logins
- SECRET: passwords, API keys
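As a rough sketch, the classes above can be expanded into a BIO label set. The group keys and the ordering of labels here are assumptions made for illustration; the authoritative mapping is whatever `id2label` the model's own configuration ships with.

```python
# Entity classes as listed above, grouped by category (group keys are hypothetical).
ENTITY_GROUPS = {
    "personal": ["NAME", "SURNAME", "AGE", "DATE_OF_BIRTH", "DATE", "SEX", "RELIGION",
                 "POLITICAL_VIEW", "ETHNICITY", "SEXUAL_ORIENTATION", "HEALTH", "RELATIVE"],
    "contact_location": ["CITY", "ADDRESS", "EMAIL", "PHONE"],
    "identifiers": ["PESEL", "DOCUMENT_NUMBER"],
    "professional_education": ["COMPANY", "SCHOOL_NAME", "JOB_TITLE"],
    "financial": ["BANK_ACCOUNT", "CREDIT_CARD_NUMBER"],
    "digital": ["USERNAME", "SECRET"],
}

# Flatten the groups, then create one B- and one I- tag per class plus the "O" tag.
classes = [name for group in ENTITY_GROUPS.values() for name in group]
labels = ["O"] + [f"{prefix}-{name}" for name in classes for prefix in ("B", "I")]

# Assumed ordering; the real model config may number the labels differently.
id2label = dict(enumerate(labels))
label2id = {label: idx for idx, label in id2label.items()}
print(len(classes), len(labels))
```

Grouping the classes this way also makes it easy to anonymize selectively, for example masking only the `identifiers` and `financial` groups.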
📊 Evaluation
- Validation F1-score: ~0.80
- Recall-oriented configuration (privacy-first)
- Evaluation performed on held-out validation set
Note: For anonymization tasks, recall is prioritized to minimize data leakage.
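The card does not state how the F1-score is computed (entity-level or token-level, micro or macro). The sketch below assumes exact-match, entity-level scoring and adds F-beta, which weights recall more heavily when beta > 1. The function name `prf` and the example spans are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (label, start, end)


def prf(gold: Set[Span], pred: Set[Span], beta: float = 1.0):
    """Exact-match entity-level precision, recall, and F-beta."""
    tp = len(gold & pred)  # a prediction counts only if label and both offsets match
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Illustrative spans: the predictions are all correct but miss half of the gold entities.
gold = {("NAME", 0, 3), ("SURNAME", 4, 12), ("PESEL", 20, 31), ("PHONE", 40, 51)}
pred = {("NAME", 0, 3), ("SURNAME", 4, 12)}

print(prf(gold, pred, beta=1.0))
print(prf(gold, pred, beta=2.0))
```

In this example precision is perfect while recall is 0.5, and the F2 score comes out lower than F1. A recall-weighted metric therefore penalizes missed entities more strongly, which matches the leakage concern in the note above.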