language:
- hi
license: apache-2.0
base_model: distilbert/distilroberta-base
tags:
- token-classification
- ner
- pii
- pii-detection
- de-identification
- privacy
- healthcare
- medical
- clinical
- phi
- hindi
- pytorch
- transformers
- openmed
pipeline_tag: token-classification
library_name: transformers
metrics:
- f1
- precision
- recall
model-index:
- name: OpenMed-PII-Hindi-FastClinical-Base-82M-v1
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: AI4Privacy (Hindi subset)
      type: ai4privacy/pii-masking-400k
      split: test
    metrics:
    - type: f1
      value: 0.9369
      name: F1 (micro)
    - type: precision
      value: 0.9333
      name: Precision
    - type: recall
      value: 0.9406
      name: Recall
widget:
- text: >-
    डॉ. राजेश शर्मा (आधार: 1234 5678 9012) से rajesh.sharma@hospital.in या +91 98765 43210 पर संपर्क किया जा सकता है। पता: 42 महात्मा गांधी रोड, 110001 नई दिल्ली।
  example_title: Clinical Note with PII (Hindi)
OpenMed-PII-Hindi-FastClinical-Base-82M-v1
Hindi PII Detection Model | 82M Parameters | Open Source
Model Description
OpenMed-PII-Hindi-FastClinical-Base-82M-v1 is a transformer-based token classification model fine-tuned for Personally Identifiable Information (PII) detection in Hindi text. This model identifies and classifies 54 types of sensitive information including names, addresses, social security numbers, medical record numbers, and more.
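Since the model is a standard token classifier, it can most likely be loaded through the `transformers` token-classification pipeline. The following is a minimal sketch, not an official usage snippet: the Hub repository id `OpenMed/OpenMed-PII-Hindi-FastClinical-Base-82M-v1` is an assumption inferred from the model name, and `build_detector`, `summarize`, and `demo` are hypothetical helper names introduced here for illustration.

```python
# Assumption: this Hub repository id is inferred from the model name and
# may differ from the id under which the model is actually published.
MODEL_ID = "OpenMed/OpenMed-PII-Hindi-FastClinical-Base-82M-v1"


def build_detector(model_id=MODEL_ID):
    """Load a token-classification pipeline that merges sub-word pieces into spans."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def summarize(entities):
    """Reduce aggregated pipeline output to (label, text, start, end) tuples."""
    return [(e["entity_group"], e["word"], e["start"], e["end"]) for e in entities]


def demo():
    """Run the detector on a short Hindi sentence and print each detected span."""
    detector = build_detector()
    text = "डॉ. राजेश शर्मा से rajesh.sharma@hospital.in पर संपर्क किया जा सकता है।"
    for label, word, start, end in summarize(detector(text)):
        print(f"{label:<24} {word!r} [{start}:{end}]")
```

Calling `demo()` requires the `transformers` library, a backend such as PyTorch, and network access to fetch the weights. The label names it prints come from the model's own configuration and are not assumed here.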
Key Features
Hindi-Optimized: Fine-tuned on Hindi text rather than adapted from a general multilingual PII model
High Accuracy: Reaches a micro-averaged F1 of 0.9369 (precision 0.9333, recall 0.9406) on the Hindi test split of AI4Privacy
Comprehensive Coverage: Detects more than 50 entity types spanning personal, financial, medical, and contact information
Privacy-Focused: Designed for de-identification and compliance with GDPR and other privacy regulations
Production-Ready: Optimized for real-world text processing pipelines
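For the de-identification use case mentioned above, detected spans are typically replaced with placeholders. The sketch below shows one simple way to do that, assuming entities arrive as dicts with `start`, `end`, and `entity_group` keys (the shape an aggregated `transformers` token-classification pipeline returns). The function name `mask_pii` and the `EMAIL` label are hypothetical; the model's real label names may differ.

```python
def mask_pii(text, entities, placeholder="[{label}]"):
    """Replace each detected span in `text` with a label placeholder.

    Assumes spans do not overlap; overlapping spans would need to be
    merged or filtered before calling this function.
    """
    masked = text
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        tag = placeholder.format(label=ent["entity_group"])
        masked = masked[: ent["start"]] + tag + masked[ent["end"]:]
    return masked


# Hand-written span standing in for real model output.
email = "rajesh.sharma@hospital.in"
text = "संपर्क: " + email
start = text.index(email)
entities = [{"entity_group": "EMAIL", "start": start, "end": start + len(email)}]

masked_text = mask_pii(text, entities)
print(masked_text)
```

Replacing spans from the end of the string backwards avoids recomputing offsets after each substitution, which keeps the function short at the cost of not handling overlapping spans.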
Performance
Evaluated on the Hindi subset of the AI4Privacy dataset: