Create README.md
README.md
ADDED
@@ -0,0 +1,99 @@
---
license: mit
language:
- ko
base_model:
- klue/bert-base
pipeline_tag: feature-extraction
tags:
- medical
---

# Korean Medical DPR (Dense Passage Retrieval)

## 1. Intro
A Bi-Encoder retrieval model for the **medical domain**.
It uses **SapBERT-KO-EN** as its base model so that it can handle medical records written in a mix of Korean and English.
Questions are encoded with the Question Encoder, and texts with the Context Encoder.

- Question Encoder : [https://huggingface.co/snumin44/medical-biencoder-ko-bert-question](https://huggingface.co/snumin44/medical-biencoder-ko-bert-question)

(※ This model was trained on AI Hub's [초거대 AI 헬스케어 질의 응답 데이터 (hyperscale AI healthcare question-answering data)](https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=71762).)


## 2. Model

**(1) Self Alignment Pretraining (SAP)**

Korean medical records are written in a **mix of Korean and English**, so the model must also recognize English terms.
Using Multi Similarity Loss, the model was trained so that **terms sharing the same code** have high similarity to one another.
```
e.g.) C3843080 || 고혈압 질환
      C3843080 || Hypertension
      C3843080 || High Blood Pressure
      C3843080 || HTN
      C3843080 || HBP
```


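The `code || term` layout above suggests how positives can be formed: terms filed under one concept code are treated as synonyms of each other. Below is a minimal sketch of that grouping, assuming each training line has the form `CODE || term`. The helper name `positive_pairs` is hypothetical and is not taken from the SapBERT-KO-EN repository.

```python
from collections import defaultdict
from itertools import combinations


def positive_pairs(lines):
    """Group `CODE || term` lines by code and return all same-code term pairs."""
    terms_by_code = defaultdict(list)
    for line in lines:
        code, term = (part.strip() for part in line.split("||", 1))
        terms_by_code[code].append(term)
    # Every unordered pair of terms under one code counts as a positive pair.
    return [
        pair
        for terms in terms_by_code.values()
        for pair in combinations(terms, 2)
    ]


lines = [
    "C3843080 || 고혈압 질환",
    "C3843080 || Hypertension",
    "C3843080 || High Blood Pressure",
    "C3843080 || HTN",
    "C3843080 || HBP",
]
pairs = positive_pairs(lines)
```

With the five synonyms above, every Korean or English surface form is paired with every other one, which is what lets the encoder place `고혈압 질환` and `HTN` close together.
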
- SapBERT-KO-EN : [https://huggingface.co/snumin44/sap-bert-ko-en](https://huggingface.co/snumin44/sap-bert-ko-en)
- Github : [https://github.com/snumin44/SapBERT-KO-EN](https://github.com/snumin44/SapBERT-KO-EN)

**(2) Dense Passage Retrieval (DPR)**

Turning SapBERT-KO-EN into a retrieval model requires additional fine-tuning.
It was fine-tuned with the DPR approach, in which a Bi-Encoder computes the similarity between a query and a text.
As shown below, the training set is the original dataset **augmented with mixed Korean-English samples**.
```
e.g.) Korean disease name: 고혈압
      English disease name: Hypertension
      Query (original): 아버지가 고혈압인데 그게 뭔지 모르겠어. 고혈압이 뭔지 설명좀 해줘.
                        (My father has high blood pressure, but I don't know what that is. Please explain what high blood pressure is.)
      Query (augmented): 아버지가 Hypertension 인데 그게 뭔지 모르겠어. Hypertension 이 뭔지 설명좀 해줘.
```

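The augmentation in the example above can be sketched as a plain string substitution: the Korean disease name in a query is swapped for its English counterpart. This is only an illustration under that assumption. The function name `augment_query` is hypothetical, and the actual DPR-KO preprocessing may differ.

```python
def augment_query(query, ko_name, en_name):
    """Return a code-mixed copy of `query`, or None if `ko_name` is absent."""
    if ko_name not in query:
        return None
    # A trailing space separates the English term from the Korean particle,
    # matching the "Hypertension 인데" form in the example above.
    return query.replace(ko_name, en_name + " ")


original = "아버지가 고혈압인데 그게 뭔지 모르겠어. 고혈압이 뭔지 설명좀 해줘."
augmented = augment_query(original, "고혈압", "Hypertension")
```

Queries that do not mention the disease name return `None`, so they can simply be skipped when building the augmented set.
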
- Github : [https://github.com/snumin44/DPR-KO](https://github.com/snumin44/DPR-KO)


## 3. Training

**(1) Self Alignment Pretraining (SAP)**

The base model and hyperparameters used to train SapBERT-KO-EN are listed below.
**KOSTOM**, a medical terminology dictionary containing Korean and English medical terms, was used as the training data.

- Model : klue/bert-base
- Dataset : **KOSTOM**
- Epochs : 1
- Batch Size : 64
- Max Length : 64
- Dropout : 0.1
- Pooler : 'cls'
- Eval Step : 100
- Threshold : 0.8
- Scale Positive Sample : 1
- Scale Negative Sample : 60

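To show where the last three hyperparameters could enter, here is a dependency-free sketch of the Multi Similarity Loss. It assumes that `Threshold` is the similarity offset and that `Scale Positive Sample` and `Scale Negative Sample` are the positive and negative scaling factors. That mapping is an assumption, not something the README states. Pair mining is omitted, and the actual training code presumably works on tensors rather than Python lists.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_similarity_loss(embeddings, codes, alpha=1.0, beta=60.0, threshold=0.8):
    """Mean Multi Similarity Loss over all anchors (no pair mining)."""
    total = 0.0
    for i, anchor in enumerate(embeddings):
        pos, neg = [], []
        for k, other in enumerate(embeddings):
            if k == i:
                continue
            sim = cosine(anchor, other)
            # Same code -> positive pair, different code -> negative pair.
            (pos if codes[k] == codes[i] else neg).append(sim)
        if pos:  # penalize positives whose similarity is low
            total += math.log1p(sum(math.exp(-alpha * (s - threshold)) for s in pos)) / alpha
        if neg:  # penalize negatives whose similarity is high
            total += math.log1p(sum(math.exp(beta * (s - threshold)) for s in neg)) / beta
    return total / len(embeddings)


codes = ["A", "A", "B", "B"]
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
misaligned = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```

The loss is lower for `aligned`, where same-code vectors coincide, than for `misaligned`, where each vector sits next to a term with a different code.
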
**(2) Dense Passage Retrieval (DPR)**

The base model and hyperparameters used for fine-tuning are listed below.

- Model : SapBERT-KO-EN(klue/bert-base)
- Dataset : **초거대 AI 헬스케어 질의 응답 데이터 (AI Hub)**
- Epochs : 10
- Batch Size : 64
- Dropout : 0.1
- Pooler : 'cls'


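DPR as originally proposed trains the two encoders with in-batch negatives: each question is scored against every context in the batch by dot product, and its own context is the correct class. The sketch below illustrates that objective in plain Python. Whether DPR-KO uses exactly this form, or adds hard negatives, is not stated here, so treat it as an assumption.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_negative_loss(question_vecs, context_vecs):
    """Mean negative log-likelihood of each question's own context,
    with the other contexts in the batch acting as negatives."""
    losses = []
    for i, question in enumerate(question_vecs):
        scores = [dot(question, context) for context in context_vecs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        losses.append(log_norm - scores[i])
    return sum(losses) / len(losses)


questions = [[2.0, 0.0], [0.0, 2.0]]
matched = [[2.0, 0.0], [0.0, 2.0]]   # context i belongs to question i
swapped = [[0.0, 2.0], [2.0, 0.0]]   # contexts paired with the wrong question
```

With a batch size of 64, as listed above, each question would see one positive and 63 in-batch negatives under this scheme.
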
## 4. Example
This model encodes questions and must be used together with the Context model.
You can confirm that a question and a text about the same disease show high similarity.

```python
```


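A minimal usage sketch follows. It assumes that both encoders load with the Hugging Face `AutoModel` and `AutoTokenizer` classes and that the `[CLS]` vector is the sentence embedding, in line with the `Pooler : 'cls'` setting. The helper names, the `max_length=512` default, and the context-encoder repository id placeholder are assumptions, not values confirmed by this README.

```python
def encode_cls(model_name, texts, max_length=512):
    """Encode texts with a BERT checkpoint and return [CLS] vectors as lists."""
    # Imported lazily so the ranking helper below works without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    batch = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,  # assumed limit, not taken from the README
        return_tensors="pt",
    )
    with torch.no_grad():
        output = model(**batch)
    return output.last_hidden_state[:, 0, :].tolist()


def rank_contexts(question_vec, context_vecs):
    """Return (index, score) pairs sorted by dot-product score, best first."""
    scores = [sum(q * c for q, c in zip(question_vec, vec)) for vec in context_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Hypothetical usage (downloads the checkpoints on first run):
# question_vec = encode_cls(
#     "snumin44/medical-biencoder-ko-bert-question", ["고혈압이 뭔지 설명좀 해줘."]
# )[0]
# context_vecs = encode_cls("<context-encoder-repo-id>", passages)
# ranking = rank_contexts(question_vec, context_vecs)
```

Passages about the same disease as the question should appear at the top of `ranking` if the encoders behave as described above.
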
## Citing
```

```