tgool committed on
Commit f5f558a · verified · 1 Parent(s): ab7e8c9

Upload 4 files

Files changed (4)
  1. README.md +62 -0
  2. config.json +46 -0
  3. model.safetensors +3 -0
  4. training_args.bin +3 -0
README.md ADDED
@@ -0,0 +1,62 @@
+ # KoBERT-NER-Diet
+
+ This is a guide to Korean Named Entity Recognition (NER) in the diet domain using KoBERT. With the 🤗 `Huggingface Transformers` library, KoBERT is easy to use.
+
+ ## How to use KoBERT on Huggingface Transformers Library
+
+ - The original KoBERT has been optimized so that it can be used directly from the transformers library.
+ - Since transformers v2.2.2, models created by individual users can be uploaded and downloaded directly through transformers.
+ - To use the tokenizer, `KoBERTTokenizer` must be imported in `utils.py`.
+
+ ```python
+ from transformers import BertModel
+ from kobert_tokenizer import KoBERTTokenizer
+
+ def load_tokenizer(args):
+     # Load the KoBERT tokenizer for the skt/kobert-base-v1 checkpoint.
+     bert_tokenizer = KoBERTTokenizer.from_pretrained(pretrained_model_name_or_path="skt/kobert-base-v1")
+     return bert_tokenizer
+ ```
+
+ ## Usage
+
+ ```bash
+ $ python3 main.py --model_type kobert --do_train --do_eval
+ ```
+
+ - Passing the `--write_pred` option saves the **evaluation prediction results** to the `preds` folder.
+
+ ## Prediction
+
+ ```bash
+ $ python3 predict.py --input_file {INPUT_FILE_PATH} --output_file {OUTPUT_FILE_PATH} --model_dir {SAVED_CKPT_PATH}
+ ```
+
+ ## Results
+
+ | Model             | Slot F1 (%) |
+ |-------------------|-------------|
+ | KoBERT            | 99.00       |
+ | DistilKoBERT      | 90.00       |
+ | Bert-Multilingual | 99.00       |
+
+ ## Data description
+ - **FOOD-B**: tag marking the beginning of a food entity
+ - **FOOD-I**: tag for a token inside a food entity
+ - **QTY-B**: tag marking the beginning of a quantity
+ - **QTY-I**: tag for a token inside a quantity
+ - **UNIT-B**: tag marking the beginning of a unit
+
+ ### NER input example
+ Korean input sentence, roughly "I drank one glass of iced Americano and ate 3 macarons for dessert":
+ ```
+ 나는 한잔은 아이스 아메리카노를 마시고 디저트는 마카롱 3개를 먹음.
+ ```
+
+ ### NER output example
+ ```
+ 나는 [한:QTY-B] [잔:UNIT-B] 은 [아이스:FOOD-B] [아메리카노:FOOD-I] 마시고 디저트는 [마카롱:FOOD-B] [3:QTY-B] [개:UNIT-B] 를 먹음.
+ ```
+
+ ## References
+
+ - [Naver NLP Challenge](https://github.com/naver/nlp-challenge)
+ - [Huggingface Transformers](https://github.com/huggingface/transformers)
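The bracketed format in the README's NER output example, and the suffix-style tags (`FOOD-B`, `FOOD-I`) it uses, can be illustrated with a small sketch. This is not the repository's `predict.py`: `format_tagged` and `group_entities` are hypothetical helpers, the token and label lists are written by hand to mirror the example rather than produced by the model, and joining multi-token entities with a space is an assumption.

```python
from typing import List, Tuple


def format_tagged(tokens: List[str], labels: List[str]) -> str:
    """Render entity tokens as `[token:LABEL]` and "O" tokens as plain text."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    pieces = []
    for token, label in zip(tokens, labels):
        # "O" marks a token outside any entity, so it is emitted unchanged.
        pieces.append(token if label == "O" else f"[{token}:{label}]")
    return " ".join(pieces)


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge a `X-B` token and the `X-I` tokens that follow it into one span."""
    entities = []
    current = None  # (type, [tokens]) of the entity being built, if any
    for token, label in zip(tokens, labels):
        kind, sep, position = label.rpartition("-")
        if sep and position == "B":
            current = (kind, [token])
            entities.append(current)
        elif sep and position == "I" and current is not None and current[0] == kind:
            current[1].append(token)
        else:
            # "O", "UNK", or a stray I tag closes the current entity.
            current = None
    return [(kind, " ".join(parts)) for kind, parts in entities]


# Hand-written tokens and labels mirroring the README example (not model output).
tokens = ["나는", "한", "잔", "은", "아이스", "아메리카노", "마시고",
          "디저트는", "마카롱", "3", "개", "를", "먹음."]
labels = ["O", "QTY-B", "UNIT-B", "O", "FOOD-B", "FOOD-I", "O",
          "O", "FOOD-B", "QTY-B", "UNIT-B", "O", "O"]

print(format_tagged(tokens, labels))
print(group_entities(tokens, labels))
```

With these inputs, `format_tagged` reproduces the README's output line, and `group_entities` merges `아이스` and `아메리카노` into a single `FOOD` span.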
config.json ADDED
@@ -0,0 +1,46 @@
+ {
+   "_name_or_path": "tgool/kobert",
+   "architectures": [
+     "BertForTokenClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "finetuning_task": "fancy-ner",
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "UNK",
+     "1": "O",
+     "2": "FOOD-B",
+     "3": "FOOD-I",
+     "4": "QTY-B",
+     "5": "QTY-I",
+     "6": "UNIT-B",
+     "7": "UNIT-I"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "FOOD-B": 2,
+     "FOOD-I": 3,
+     "O": 1,
+     "QTY-B": 4,
+     "QTY-I": 5,
+     "UNIT-B": 6,
+     "UNIT-I": 7,
+     "UNK": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.42.4",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 8002
+ }
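The `id2label` and `label2id` entries in this config should be inverse mappings of each other. The following is a minimal sketch that checks this on the label maps copied from the hunk above. `check_label_maps` is a hypothetical helper, not part of the commit or of the transformers library.

```python
import json

# Label maps copied from the config.json hunk; all other keys are omitted.
CONFIG_FRAGMENT = """
{
  "id2label": {"0": "UNK", "1": "O", "2": "FOOD-B", "3": "FOOD-I",
               "4": "QTY-B", "5": "QTY-I", "6": "UNIT-B", "7": "UNIT-I"},
  "label2id": {"FOOD-B": 2, "FOOD-I": 3, "O": 1, "QTY-B": 4,
               "QTY-I": 5, "UNIT-B": 6, "UNIT-I": 7, "UNK": 0}
}
"""


def check_label_maps(config: dict) -> int:
    """Return the number of labels if id2label and label2id are inverse maps."""
    # JSON object keys are always strings, so convert the ids back to int.
    id2label = {int(i): label for i, label in config["id2label"].items()}
    label2id = config["label2id"]
    if {label: i for i, label in id2label.items()} != label2id:
        raise ValueError("id2label and label2id are not inverses of each other")
    if sorted(id2label) != list(range(len(id2label))):
        raise ValueError("label ids are not contiguous from 0")
    return len(id2label)


num_labels = check_label_maps(json.loads(CONFIG_FRAGMENT))
print(num_labels)  # 8
```

The config defines eight labels, while the README's data description lists five; `UNIT-I`, `O`, and `UNK` appear only in the config.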
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:466c7801d82d0817c22014e2100226bc4985096dd920289b9947796c53054e72
+ size 366433024
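The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. The following is a minimal sketch of reading such a pointer. `parse_lfs_pointer` is a hypothetical helper, and the parameter estimate assumes every tensor is stored as float32 (as `torch_dtype` in config.json suggests).

```python
# Pointer text copied from the model.safetensors hunk.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:466c7801d82d0817c22014e2100226bc4985096dd920289b9947796c53054e72
size 366433024
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
size_mib = pointer["size_bytes"] / 2**20
# Upper bound only: assumes 4 bytes per float32 weight and ignores the
# safetensors header, so the true parameter count is slightly lower.
approx_params = pointer["size_bytes"] // 4
print(f"{size_mib:.1f} MiB, at most ~{approx_params / 1e6:.1f}M float32 parameters")
```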
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:816ac7ddd860b4d33e575d52d896d66fac98c6a5706c04a26590b0756b82023b
+ size 1592