Commit ·
7b949b9
1
Parent(s): 429dbb9
Create README.md
README.md
ADDED
@@ -0,0 +1,14 @@
# scibert-wechsel-korean

SciBERT (🇺🇸) converted to Korean (🇰🇷) using the WECHSEL technique.

### Description
- SciBERT is trained on papers from the semanticscholar.org corpus. The corpus contains 1.14M papers and 3.1B tokens.
- WECHSEL converts the embedding layer's subword token embeddings from the source language to the target language.
- SciBERT, which was trained on English text, is converted to Korean using the WECHSEL technique.
- The Korean tokenizer is taken from the KLUE PLMs because of its similar vocabulary size (32,000) and its performance.
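
The embedding conversion described above can be illustrated with a toy NumPy sketch. This is not the `wechsel` library API and not necessarily how this model was produced. It only shows the general idea of initializing each target-language token embedding as a similarity-weighted average of source-language token embeddings. The function name `init_target_embeddings`, the parameters `k` and `temperature`, and all array values are hypothetical, and the sketch assumes static embeddings for both vocabularies are already aligned in one shared space.

```python
import numpy as np


def init_target_embeddings(source_emb, source_static, target_static,
                           k=2, temperature=0.1):
    """Toy WECHSEL-style initialisation (illustrative only).

    source_emb:    (n_src, d_model) trained subword embeddings of the source model
    source_static: (n_src, d_static) static embeddings of the source tokens
    target_static: (n_tgt, d_static) static embeddings of the target tokens,
                   assumed to live in the same aligned space as source_static
    """
    # Cosine similarity between every target token and every source token.
    src = source_static / np.linalg.norm(source_static, axis=1, keepdims=True)
    tgt = target_static / np.linalg.norm(target_static, axis=1, keepdims=True)
    sim = tgt @ src.T  # shape (n_tgt, n_src)

    # Keep the k most similar source tokens for each target token.
    nearest = np.argsort(-sim, axis=1)[:, :k]
    top_sim = np.take_along_axis(sim, nearest, axis=1)

    # Softmax over the k similarities, sharpened by the temperature.
    shifted = top_sim - top_sim.max(axis=1, keepdims=True)
    weights = np.exp(shifted / temperature)
    weights /= weights.sum(axis=1, keepdims=True)

    # Weighted average of the selected source embeddings.
    return np.einsum("tk,tkd->td", weights, source_emb[nearest])


# Hypothetical toy data: 3 source tokens, 2 target tokens.
source_static = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
source_emb = np.eye(3)
target_static = np.array([[1.0, 0.0], [0.0, 2.0]])

# With k=1 each target token simply copies its closest source token.
demo = init_target_embeddings(source_emb, source_static, target_static, k=1)
print(demo)
```

In a real conversion the resulting matrix would replace the model's input embeddings and the Korean tokenizer would replace the English one; the rest of the SciBERT weights stay unchanged.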
### Reference
- [SciBERT](https://github.com/allenai/scibert)
- [WECHSEL](https://github.com/CPJKU/wechsel)
- [Korean Language Understanding Evaluation (KLUE)](https://github.com/KLUE-benchmark/KLUE)