---
license: apache-2.0
language:
- ko
base_model:
- beomi/kcbert-base
pipeline_tag: token-classification
tags:
- Korean
- PII
- KoreanPII
- PIIMasking
- Anonymization
- Privacy
---

# Korean-PII-Masking-BERT

**GitHub Repository**: [alphagyuu/Korean-PII-Masking-BERT](https://github.com/alphagyuu/Korean-PII-Masking-BERT)

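Korean-PII-Masking-BERT is a token classification model that tags personally identifiable information (PII) in Korean text so that it can be masked. It was built by fine-tuning KcBERT (`beomi/kcbert-base`) as a **token classifier** on a processed version of the "Korean SNS" dataset from **AI-Hub**.
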
## 🖥️ Python Implementation

- **Tokenizer**:

```python
from transformers import BertTokenizer

# Load the KcBERT tokenizer without lower-casing the input
tokenizer = BertTokenizer.from_pretrained('beomi/kcbert-base', do_lower_case=False)
```

- **Model**:

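```python
from transformers import TFBertForTokenClassification

# TensorFlow BERT with a token-classification head, fine-tuned for Korean PII tagging
model = TFBertForTokenClassification.from_pretrained('alphagyuu/Korean-PII-Masking-BertForTokenClassification')
```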

- **LabelMap**:

```python
# BIO tag -> model label name ('O' plus B-/I- tags for 8 PII types, 17 labels in total)
LabelMAP = {
    'O': 'LABEL0',
    'B-URL': 'LABEL1',
    'I-URL': 'LABEL2',
    'B-계정': 'LABEL3',   # account
    'I-계정': 'LABEL4',
    'B-금융': 'LABEL5',   # financial information
    'I-금융': 'LABEL6',
    'B-번호': 'LABEL7',   # number
    'I-번호': 'LABEL8',
    'B-소속': 'LABEL9',   # affiliation
    'I-소속': 'LABEL10',
    'B-신원': 'LABEL11',  # identity
    'I-신원': 'LABEL12',
    'B-이름': 'LABEL13',  # name
    'I-이름': 'LABEL14',
    'B-주소': 'LABEL15',  # address
    'I-주소': 'LABEL16'
}
```
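
The LabelMap only lists the tag names. To turn model output into masked text, each predicted class has to be mapped back to its BIO tag, and consecutive `B-`/`I-` tokens have to be merged into one span. The plain-Python sketch below shows one way to do this. It rests on several assumptions that are not confirmed by this card: that class index `i` corresponds to the label name `LABEL{i}`, that tokens are KcBERT WordPiece pieces whose continuations start with `##`, and that each PII span should be replaced by a bracketed placeholder such as `[이름]`. The helpers `argmax_tags` and `mask_tokens` and the sample sentence are hypothetical illustrations, not part of the model's published code.

```python
from typing import Dict, List, Optional, Sequence

# LabelMap as listed in this model card (BIO tag -> model label name).
LABEL_MAP: Dict[str, str] = {
    "O": "LABEL0",
    "B-URL": "LABEL1", "I-URL": "LABEL2",
    "B-계정": "LABEL3", "I-계정": "LABEL4",
    "B-금융": "LABEL5", "I-금융": "LABEL6",
    "B-번호": "LABEL7", "I-번호": "LABEL8",
    "B-소속": "LABEL9", "I-소속": "LABEL10",
    "B-신원": "LABEL11", "I-신원": "LABEL12",
    "B-이름": "LABEL13", "I-이름": "LABEL14",
    "B-주소": "LABEL15", "I-주소": "LABEL16",
}

# Inverted map: model label name -> BIO tag.
ID_TO_TAG: Dict[str, str] = {v: k for k, v in LABEL_MAP.items()}

# BERT special tokens that should not appear in the masked output.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def argmax_tags(logits: Sequence[Sequence[float]]) -> List[str]:
    """Map per-token logits (seq_len x 17) to BIO tags.

    Hypothetical helper; assumes class index i corresponds to f"LABEL{i}".
    """
    tags = []
    for scores in logits:
        best = max(range(len(scores)), key=lambda i: scores[i])
        tags.append(ID_TO_TAG[f"LABEL{best}"])
    return tags


def mask_tokens(tokens: Sequence[str], tags: Sequence[str]) -> str:
    """Collapse each B-/I- span into one placeholder such as "[이름]".

    Hypothetical helper; assumes WordPiece tokens where a leading "##"
    continues the previous word.
    """
    words: List[str] = []
    current: Optional[str] = None  # entity type of the span being masked
    for token, tag in zip(tokens, tags):
        if token in SPECIAL_TOKENS:
            continue
        is_subword = token.startswith("##")
        if tag == "O":
            current = None
            piece = token[2:] if is_subword else token
        else:
            prefix, entity = tag.split("-", 1)
            if prefix == "I" and entity == current:
                continue  # still inside the same masked span
            current = entity
            piece = f"[{entity}]"
        # Glue "##" pieces onto the previous word; otherwise start a new word.
        if is_subword and words:
            words[-1] += piece
        else:
            words.append(piece)
    return " ".join(words)


if __name__ == "__main__":
    # Illustrative tokens and tags, not real model output.
    tokens = ["[CLS]", "홍", "##길", "##동", "##은", "서울", "##에", "살", "##아요", "[SEP]"]
    tags = ["O", "B-이름", "I-이름", "I-이름", "O", "B-주소", "O", "O", "O", "O"]
    print(mask_tokens(tokens, tags))  # [이름]은 [주소]에 살아요
```

With the real model, the logits for one sentence would presumably come from `model(**inputs).logits[0].numpy().tolist()`, and the tokens from `tokenizer.convert_ids_to_tokens(...)` applied to the same input IDs. Keeping the particle (`은`, `에`) attached to the placeholder preserves the sentence structure, which tends to matter for Korean text.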