LeverageX/scibert-wechsel-korean

snoop2head committed on Jan 8, 2022

Commit 7b949b9 · 1 Parent(s): 429dbb9

Create README.md

Files changed (1)

README.md +14 -0

README.md ADDED

	@@ -0,0 +1,14 @@

+# scibert-wechsel-korean
+SciBERT (🇺🇸) converted to Korean (🇰🇷) using the WECHSEL technique.
+### Description
+- SciBERT is trained on papers from the semanticscholar.org corpus: 1.14M papers, 3.1B tokens.
+- WECHSEL transfers the embedding layer from the source language to the target language by re-initializing the subword embeddings for the target tokenizer.
+- SciBERT, trained on English text, is converted to Korean using the WECHSEL technique.
+- The Korean tokenizer is the one used by the KLUE PLMs, chosen for its similar vocab size (32000) and its performance.
+### Reference
+- [SciBERT](https://github.com/allenai/scibert)
+- [WECHSEL](https://github.com/CPJKU/wechsel)
+- [Korean Language Understanding Evaluation](https://github.com/KLUE-benchmark/KLUE)
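
The embedding transfer described in the README can be illustrated with a toy sketch. It is not the `wechsel` library's API and not the code used for this model. It only shows the general idea as I understand it: each target-language subword embedding is initialized as a similarity-weighted average of the source-language subword embeddings nearest to it in a shared auxiliary vector space. The function name `init_target_embeddings`, the values of `k` and `temperature`, and all vectors below are hypothetical, and the auxiliary vectors are assumed to be already aligned across languages.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def init_target_embeddings(src_model_emb, src_aux, tgt_aux, k=2, temperature=0.1):
    """Initialize target-token embeddings from source-token embeddings.

    src_model_emb: source token -> embedding in the pretrained model
    src_aux, tgt_aux: token -> vector in a shared auxiliary space
    (assumed to be aligned across languages beforehand).
    """
    dim = len(next(iter(src_model_emb.values())))
    tgt_emb = {}
    for tgt_tok, tgt_vec in tgt_aux.items():
        # Keep the k source tokens most similar to this target token.
        neighbours = sorted(
            ((cosine(tgt_vec, src_vec), src_tok) for src_tok, src_vec in src_aux.items()),
            reverse=True,
        )[:k]
        # Softmax over similarities turns them into averaging weights.
        weights = [math.exp(sim / temperature) for sim, _ in neighbours]
        total = sum(weights)
        vec = [0.0] * dim
        for w, (_, src_tok) in zip(weights, neighbours):
            for i, x in enumerate(src_model_emb[src_tok]):
                vec[i] += (w / total) * x
        tgt_emb[tgt_tok] = vec
    return tgt_emb


# Hypothetical toy data: three English tokens, two Korean tokens.
src_aux = {"cell": [1.0, 0.0], "protein": [0.0, 1.0], "the": [-1.0, 0.0]}
src_model_emb = {
    "cell": [1.0, 0.0, 0.0],
    "protein": [0.0, 1.0, 0.0],
    "the": [0.0, 0.0, 1.0],
}
tgt_aux = {"세포": [0.9, 0.1], "단백질": [0.1, 0.9]}

target_emb = init_target_embeddings(src_model_emb, src_aux, tgt_aux)
for token, vector in target_emb.items():
    print(token, [round(x, 4) for x in vector])
```

In this toy setup, "세포" (cell) ends up close to the model embedding of "cell" and "단백질" (protein) close to that of "protein", so the rest of the pretrained network sees inputs resembling those it was trained on. For a real conversion, the WECHSEL repository linked above is the reference.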