# KoBERT-NER-Diet

A guide to Korean Named Entity Recognition (NER) in the diet domain using KoBERT. The 🤗 `Huggingface Transformers` library makes KoBERT easy to use.

## How to use KoBERT on Huggingface Transformers Library

- The original KoBERT has been adapted so that it can be used directly from the transformers library.
- As of transformers v2.2.2, user-created models can be uploaded and downloaded directly through transformers.
- The tokenizer requires importing `KoBERTTokenizer` in `utils.py`:

```python
from transformers import BertModel
from kobert_tokenizer import KoBERTTokenizer


def load_tokenizer(args):
    # Load the pretrained KoBERT tokenizer identified by "skt/kobert-base-v1"
    bert_tokenizer = KoBERTTokenizer.from_pretrained(pretrained_model_name_or_path="skt/kobert-base-v1")
    return bert_tokenizer
```

## Usage

```bash
$ python3 main.py --model_type kobert --do_train --do_eval
```

- With the `--write_pred` option, the **prediction results from evaluation** are saved to the `preds` folder.

## Prediction

```bash
$ python3 predict.py --input_file {INPUT_FILE_PATH} --output_file {OUTPUT_FILE_PATH} --model_dir {SAVED_CKPT_PATH}
```

## Results

| Model             | Slot F1 (%) |
|-------------------|-------------|
| KoBERT            | 99.00       |
| DistilKoBERT      | 90.00       |
| Bert-Multilingual | 99.00       |
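
The table reports Slot F1 without defining it. The sketch below assumes it is the usual entity-level F1, where a predicted entity counts as correct only if its type and both boundaries match the gold entity exactly. The function name `slot_f1` and the `(type, start, end)` span representation are illustrative choices and are not taken from this repository.

```python
def slot_f1(gold_spans, pred_spans):
    """Entity-level precision, recall and F1 over (type, start, end) spans."""
    gold, pred = set(gold_spans), set(pred_spans)
    # A span is a true positive only if type and both boundaries match.
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: the predicted FOOD entity stops one token too early.
gold = [("QTY", 1, 2), ("UNIT", 2, 3), ("FOOD", 4, 6)]
pred = [("QTY", 1, 2), ("UNIT", 2, 3), ("FOOD", 4, 5)]
precision, recall, f1 = slot_f1(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Under this definition a partially correct boundary earns no credit, so the truncated FOOD span lowers precision and recall alike.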
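
## Data Description

- **FOOD-B**: tag for the first token of a food name
- **FOOD-I**: tag for a non-initial token of a food name
- **QTY-B**: tag for the first token of a quantity
- **QTY-I**: tag for a non-initial token of a quantity
- **UNIT-B**: tag for the first token of a unit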
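
To show how these begin/inside tags combine into entities, here is a minimal sketch that groups tagged tokens into `(type, text)` pairs. It assumes that untagged tokens carry an `O` label and that tokens of one entity are joined with a space. Neither assumption is stated in this README, and `extract_entities` is a hypothetical helper, not part of the repository.

```python
def extract_entities(tokens, tags, outside="O"):
    """Group tokens into (type, text) entities from tags such as FOOD-B / FOOD-I."""
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Close the entity being built, if any, and reset the state.
        nonlocal current_type, current_tokens
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == outside:
            flush()
            continue
        entity_type, _, position = tag.rpartition("-")
        # A -B tag, or an -I tag of a different type, starts a new entity.
        if position == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_tokens.append(token)
    flush()
    return entities


tokens = ["아이스", "아메리카노", "3", "잔"]
tags = ["FOOD-B", "FOOD-I", "QTY-B", "UNIT-B"]
entities = extract_entities(tokens, tags)
print(entities)
```

A stray `-I` tag with no preceding `-B` is treated as the start of a new entity rather than dropped, which is one common lenient convention.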

### NER Input Example

```
나는 한잔의 아이스 아메리카노를 마시고 디저트는 마카롱 3개를 먹음.
```

English gloss: "I drank a glass of iced Americano and ate 3 macarons for dessert."

### NER Output Example

```
나는 [한:QTY-B] [잔:UNIT-B] 의 [아이스:FOOD-B] [아메리카노:FOOD-I] 마시고 디저트는 [마카롱:FOOD-B] [3:QTY-B] [개:UNIT-B] 를 먹음.
```
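
For downstream processing, the bracketed output can be turned back into `(token, tag)` pairs. The sketch below assumes every tagged token is written as `[token:TAG]` with a `-B` or `-I` suffix, as in the example above. The regular expression and the `parse_tagged` helper are illustrative and do not come from `predict.py`.

```python
import re

# Matches one [token:TAG] group; the tag is an upper-case type plus -B or -I.
BRACKET = re.compile(r"\[([^\[\]:]+):([A-Z]+-[BI])\]")


def parse_tagged(line):
    """Return (token, tag) pairs for every [token:TAG] group in a line."""
    return BRACKET.findall(line)


line = (
    "나는 [한:QTY-B] [잔:UNIT-B] 의 [아이스:FOOD-B] [아메리카노:FOOD-I] "
    "마시고 디저트는 [마카롱:FOOD-B] [3:QTY-B] [개:UNIT-B] 를 먹음."
)
pairs = parse_tagged(line)
for token, tag in pairs:
    print(token, tag)
```

Untagged words are skipped here. Keeping them would require a separate pass over the text between brackets.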

## References

- [Naver NLP Challenge](https://github.com/naver/nlp-challenge)
- [Huggingface Transformers](https://github.com/huggingface/transformers)