---
license: apache-2.0
language:
- ko
base_model:
- beomi/kcbert-base
pipeline_tag: token-classification
tags:
- Korean
- PII
- KoreanPII
- PIIMasking
- Anonymization
- Privacy
---

# Korean-PII-Masking-BERT

**GitHub Repository**: [alphagyuu/Korean-PII-Masking-BERT](https://github.com/alphagyuu/Korean-PII-Masking-BERT)

Korean-PII-Masking-BERT is a token classification model for tagging personally identifiable information (PII) in Korean text. It was fine-tuned from KcBERT (`beomi/kcbert-base`) with a token classification head (**TokenClassifier**), using a processed version of the "Korean SNS" dataset from **AI-Hub**.

## 🖥️ Python Implementation

- **Tokenizer**:
  ```python
  from transformers import BertTokenizer
  tokenizer = BertTokenizer.from_pretrained('beomi/kcbert-base', do_lower_case=False)
  ```

- **Model**:
  ```python
  from transformers import TFBertForTokenClassification
  model = TFBertForTokenClassification.from_pretrained('alphagyuu/Korean-PII-Masking-BertForTokenClassification')
  ```

- **LabelMap**:
  ```python
  # BIO tags: B- marks the first token of an entity, I- a continuation, O a non-PII token.
  LabelMAP = {
      'O': 'LABEL0',
      'B-URL': 'LABEL1', 'I-URL': 'LABEL2',
      'B-계정': 'LABEL3', 'I-계정': 'LABEL4',    # 계정: account
      'B-금융': 'LABEL5', 'I-금융': 'LABEL6',    # 금융: financial
      'B-번호': 'LABEL7', 'I-번호': 'LABEL8',    # 번호: number
      'B-소속': 'LABEL9', 'I-소속': 'LABEL10',   # 소속: affiliation
      'B-신원': 'LABEL11', 'I-신원': 'LABEL12',  # 신원: identity
      'B-이름': 'LABEL13', 'I-이름': 'LABEL14',  # 이름: name
      'B-주소': 'LABEL15', 'I-주소': 'LABEL16'   # 주소: address
  }
  ```
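
The label map above can be turned into a masking step once per-token predictions are available. The following is a minimal sketch, not the repository's own post-processing: `TAGS` and `mask_tokens` are hypothetical names, the code assumes that class index `i` corresponds to the entry `LABEL<i>` in the label map, and it works on whole tokens without merging WordPiece sub-tokens. The class indices are hard-coded here; in practice they would presumably come from an argmax over the model's logits.

```python
# Tag names in class-index order, copied from the label map in this card.
# Assumption: class index i corresponds to the entry 'LABEL<i>'.
TAGS = [
    "O",
    "B-URL", "I-URL",
    "B-계정", "I-계정",
    "B-금융", "I-금융",
    "B-번호", "I-번호",
    "B-소속", "I-소속",
    "B-신원", "I-신원",
    "B-이름", "I-이름",
    "B-주소", "I-주소",
]


def mask_tokens(tokens, class_ids, mask_format="[{}]"):
    """Collapse each BIO-tagged PII span into one placeholder such as [이름].

    Hypothetical helper: `tokens` and `class_ids` are parallel lists,
    one predicted class index per token.
    """
    masked = []
    current = None  # entity type of the span currently being masked
    for token, class_id in zip(tokens, class_ids):
        tag = TAGS[class_id]
        if tag == "O":
            current = None
            masked.append(token)
            continue
        prefix, _, entity = tag.partition("-")
        # A B- tag, or an I- tag of a different type, opens a new span.
        if prefix == "B" or entity != current:
            masked.append(mask_format.format(entity))
            current = entity
        # Otherwise the token continues the current span and is dropped.
    return masked


# Illustrative predictions, not real model output.
name_example = mask_tokens(["제", "이름은", "홍길동", "입니다"], [0, 0, 13, 0])
phone_example = mask_tokens(["010", "-", "1234", "로", "연락"], [7, 8, 8, 0, 0])
print(name_example)
print(phone_example)
```

Emitting one placeholder per span, rather than one per token, keeps the masked text from revealing how long the original entity was.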