---
license: apache-2.0
language:
- ko
base_model:
- beomi/kcbert-base
pipeline_tag: token-classification
tags:
- Korean
- PII
- KoreanPII
- PIIMasking
- Anonymization
- Privacy
---

# Korean-PII-Masking-BERT  

**GitHub Repository**: [alphagyuu/Korean-PII-Masking-BERT](https://github.com/alphagyuu/Korean-PII-Masking-BERT)  

Korean-PII-Masking-BERT is a token classification model for detecting personally identifiable information (PII) in Korean text. It was fine-tuned from KcBERT with a **TokenClassifier** head on a processed version of the "Korean SNS" dataset from **AI-Hub**.

## 🖥️ Python Implementation  
- **Tokenizer**:  
  ```python
  from transformers import BertTokenizer
  tokenizer = BertTokenizer.from_pretrained('beomi/kcbert-base', do_lower_case=False)
  ```
- **Model**:  
  ```python
  from transformers import TFBertForTokenClassification
  model = TFBertForTokenClassification.from_pretrained('alphagyuu/Korean-PII-Masking-BertForTokenClassification')
  ```



- **LabelMap**:
  ```python
  LabelMAP = {
    'O': 'LABEL0',
    'B-URL': 'LABEL1',
    'I-URL': 'LABEL2',
    'B-계정': 'LABEL3',   # 계정 = account
    'I-계정': 'LABEL4',
    'B-금융': 'LABEL5',   # 금융 = financial
    'I-금융': 'LABEL6',
    'B-번호': 'LABEL7',   # 번호 = number
    'I-번호': 'LABEL8',
    'B-소속': 'LABEL9',   # 소속 = affiliation
    'I-소속': 'LABEL10',
    'B-신원': 'LABEL11',  # 신원 = identity
    'I-신원': 'LABEL12',
    'B-이름': 'LABEL13',  # 이름 = name
    'I-이름': 'LABEL14',
    'B-주소': 'LABEL15',  # 주소 = address
    'I-주소': 'LABEL16'
  }
  ```
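
The label map above can be turned into a masking step once per-token class indices are available (for example, from an argmax over the model's logits). The following is a minimal sketch rather than the repository's actual post-processing: the names `TAGS` and `mask_pii` are hypothetical, and it assumes that class index `i` corresponds to `LABELi` in the map. It also treats every token as a separate word and does not merge WordPiece `##` pieces back together.

```python
# Tag names ordered so that TAGS[i] is the tag mapped to 'LABELi' above.
# Assumption: the model's class index i corresponds to 'LABELi'.
TAGS = [
    'O',
    'B-URL', 'I-URL',
    'B-계정', 'I-계정',
    'B-금융', 'I-금융',
    'B-번호', 'I-번호',
    'B-소속', 'I-소속',
    'B-신원', 'I-신원',
    'B-이름', 'I-이름',
    'B-주소', 'I-주소',
]


def mask_pii(tokens, label_ids):
    """Replace each contiguous PII span with one placeholder such as '[이름]'."""
    masked = []
    prev_entity = None  # entity type of the previous token, or None if it was 'O'
    for token, label_id in zip(tokens, label_ids):
        tag = TAGS[label_id]
        if tag == 'O':
            masked.append(token)
            prev_entity = None
            continue
        prefix, entity = tag.split('-', 1)
        # Emit a placeholder on 'B-', or on an 'I-' that does not continue
        # a span of the same entity type.
        if prefix == 'B' or entity != prev_entity:
            masked.append(f'[{entity}]')
        prev_entity = entity
    return masked


# Hand-written example labels: 13 is 'B-이름' (name), 0 is 'O'.
print(mask_pii(['제', '이름은', '홍길동', '입니다'], [0, 0, 13, 0]))
```

Emitting one placeholder per span, rather than one per token, avoids revealing how long the masked value was.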