---
license: apache-2.0
language:
- ko
base_model:
- beomi/kcbert-base
pipeline_tag: token-classification
tags:
- Korean
- PII
- KoreanPII
- PIIMasking
- Anonymization
- Privacy
---
# Korean-PII-Masking-BERT
**GitHub Repository**: [alphagyuu/Korean-PII-Masking-BERT](https://github.com/alphagyuu/Korean-PII-Masking-BERT)
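Korean-PII-Masking-BERT is a token classification model that tags personally identifiable information (PII) in Korean text. It was fine-tuned from KcBERT's **TokenClassifier** (`beomi/kcbert-base`) on a processed version of the "Korean SNS" dataset from **AI-Hub**.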
## 🖥️ Python Implementation
- **Tokenizer**:
```python
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('beomi/kcbert-base', do_lower_case=False)
```
- **Model**:
```python
from transformers import TFBertForTokenClassification
model = TFBertForTokenClassification.from_pretrained('alphagyuu/Korean-PII-Masking-BertForTokenClassification')
```
- **LabelMap**:
```python
LabelMAP = {
    'O': 'LABEL0',        # outside any PII span
    'B-URL': 'LABEL1',
    'I-URL': 'LABEL2',
    'B-계정': 'LABEL3',    # account
    'I-계정': 'LABEL4',
    'B-금융': 'LABEL5',    # financial information
    'I-금융': 'LABEL6',
    'B-번호': 'LABEL7',    # number
    'I-번호': 'LABEL8',
    'B-소속': 'LABEL9',    # affiliation
    'I-소속': 'LABEL10',
    'B-신원': 'LABEL11',   # identity
    'I-신원': 'LABEL12',
    'B-이름': 'LABEL13',   # name
    'I-이름': 'LABEL14',
    'B-주소': 'LABEL15',   # address
    'I-주소': 'LABEL16'
}
```
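The label map above runs from BIO tag to label string, while masking needs the opposite direction: from a predicted label index back to a tag. The following is a minimal sketch of that step, not the repository's own implementation. It assumes that label string `LABEL{n}` corresponds to class index `n`, that the eight entity types are URL, 계정 (account), 금융 (financial), 번호 (number), 소속 (affiliation), 신원 (identity), 이름 (name), and 주소 (address), and that predictions are already aligned one-to-one with tokens. The names `ID2TAG` and `mask_tokens` and the `[TAG]` placeholder format are hypothetical.

```python
# Assumed entity types, in the same order as the label map above.
ENTITY_TYPES = ['URL', '계정', '금융', '번호', '소속', '신원', '이름', '주소']

# Index 0 is 'O'; each entity type then contributes a B- tag followed by an I- tag.
TAGS = ['O'] + [f'{prefix}-{ent}' for ent in ENTITY_TYPES for prefix in ('B', 'I')]
ID2TAG = dict(enumerate(TAGS))


def mask_tokens(tokens, label_ids, mask_format='[{}]'):
    """Collapse each tagged entity span into one placeholder (hypothetical helper).

    tokens    -- list of token strings
    label_ids -- predicted class index per token, same length as tokens
    """
    masked = []
    prev_type = None
    for token, label_id in zip(tokens, label_ids):
        tag = ID2TAG[label_id]
        if tag == 'O':
            masked.append(token)
            prev_type = None
            continue
        prefix, ent_type = tag.split('-', 1)
        # Emit a placeholder on a B- tag, or on an I- tag that does not
        # continue the entity type of the previous token.
        if prefix == 'B' or ent_type != prev_type:
            masked.append(mask_format.format(ent_type))
        prev_type = ent_type
    return masked


tokens = ['홍', '##길동', '이고', '010', '-', '1234']
label_ids = [13, 14, 0, 7, 8, 8]
print(mask_tokens(tokens, label_ids))  # ['[이름]', '이고', '[번호]']
```

In practice `label_ids` would come from an argmax over the model's logits, and special tokens such as `[CLS]` and `[SEP]` would need to be dropped before masking.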