Update README.md
README.md
CHANGED
|
@@ -10,16 +10,35 @@ widget:
|
|
| 10 |
---
|
| 11 |
|
| 12 |
# LASSL roberta-ko-small
|
| 13 |
The `roberta-ko-small` model was pretrained on Korean text with the [LASSL](https://github.com/lassl/lassl) framework. The scores below were measured on 2021/12/15.
|
| 14 |
|
| 15 |
| nsmc | klue_nli | klue_sts | korquadv1 | klue_mrc | avg |
|
| 16 |
| ---- | -------- | -------- | --------- | ---- | -------- |
|
| 17 |
| 87.8846 | 66.3086 | 83.8353 | 83.1780 | 42.4585 | 72.7330 |
|
| 18 |
|
| 19 |
-
##
|
| 20 |
|
| 21 |
-
```
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
```
|
| 10 |
---
|
| 11 |
|
| 12 |
# LASSL roberta-ko-small
|
| 13 |
+
## How to use
|
| 14 |
+
|
| 15 |
+
```python
|
| 16 |
+
from transformers import AutoModel, AutoTokenizer
|
| 17 |
+
model = AutoModel.from_pretrained("lassl/roberta-ko-small")
|
| 18 |
+
tokenizer = AutoTokenizer.from_pretrained("lassl/roberta-ko-small")
|
| 19 |
+
```
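The snippet above only loads the model and tokenizer. As a minimal sketch of a next step, the helper below turns one sentence into a single vector by averaging the final hidden states. The name `embed_sentence` and the mean-pooling choice are assumptions made for illustration, not something the README prescribes, and the code assumes `torch` and `transformers` are installed.

```python
MODEL_ID = "lassl/roberta-ko-small"  # model id taken from the README snippet


def embed_sentence(text: str):
    """Return one mean-pooled vector for `text` (hypothetical helper)."""
    # Imported inside the function so that defining the helper stays cheap;
    # the heavy libraries are only loaded when it is actually called.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModel.from_pretrained(MODEL_ID)
    model.eval()  # disable dropout for inference

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # no gradients needed for feature extraction
        outputs = model(**inputs)

    # last_hidden_state has shape (batch, tokens, hidden); average over tokens.
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)
```

Calling `embed_sentence("안녕하세요")` downloads the weights on first use and should return a 1-D tensor whose length equals the model's hidden size. For repeated calls, loading the model and tokenizer once outside the function would be the more efficient design.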
|
| 20 |
+
|
| 21 |
+
## Evaluation
|
| 22 |
The `roberta-ko-small` model was pretrained on Korean text with the [LASSL](https://github.com/lassl/lassl) framework. The scores below were measured on 2021/12/15.
|
| 23 |
|
| 24 |
| nsmc | klue_nli | klue_sts | korquadv1 | klue_mrc | avg |
|
| 25 |
| ---- | -------- | -------- | --------- | ---- | -------- |
|
| 26 |
| 87.8846 | 66.3086 | 83.8353 | 83.1780 | 42.4585 | 72.7330 |
|
| 27 |
|
| 28 |
+
## Corpora
|
| 29 |
+
This model was trained on 6,860,062 examples (3,512,351,744 tokens in total), extracted from the corpora listed below. For details of the training setup, see `config.json`.
|
| 30 |
|
| 31 |
+
```bash
|
| 32 |
+
corpora/
|
| 33 |
+
├── [707M] kowiki_latest.txt
|
| 34 |
+
├── [ 26M] modu_dialogue_v1.2.txt
|
| 35 |
+
├── [1.3G] modu_news_v1.1.txt
|
| 36 |
+
├── [9.7G] modu_news_v2.0.txt
|
| 37 |
+
├── [ 15M] modu_np_v1.1.txt
|
| 38 |
+
├── [1008M] modu_spoken_v1.2.txt
|
| 39 |
+
├── [6.5G] modu_written_v1.0.txt
|
| 40 |
+
└── [413M] petition.txt
|
| 41 |
```
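As a quick sanity check on the figures in this section, the sketch below divides the token count by the example count and adds up the listed file sizes. The numbers are copied from the README. Reading `M` and `G` as binary units (1G = 1024M) is an assumption about how the listing was produced.

```python
# Figures quoted in the Corpora section.
NUM_EXAMPLES = 6_860_062
NUM_TOKENS = 3_512_351_744

tokens_per_example, remainder = divmod(NUM_TOKENS, NUM_EXAMPLES)
print(tokens_per_example, remainder)  # 512 0

# File sizes exactly as printed in the corpora listing.
CORPUS_SIZES = {
    "kowiki_latest.txt": "707M",
    "modu_dialogue_v1.2.txt": "26M",
    "modu_news_v1.1.txt": "1.3G",
    "modu_news_v2.0.txt": "9.7G",
    "modu_np_v1.1.txt": "15M",
    "modu_spoken_v1.2.txt": "1008M",
    "modu_written_v1.0.txt": "6.5G",
    "petition.txt": "413M",
}

# Assumption: suffixes are binary units, so 1G = 1024M.
UNIT_IN_MIB = {"M": 1.0, "G": 1024.0}


def size_in_mib(label: str) -> float:
    """Convert a label such as '1.3G' into mebibytes."""
    return float(label[:-1]) * UNIT_IN_MIB[label[-1]]


total_gib = sum(size_in_mib(s) for s in CORPUS_SIZES.values()) / 1024
print(f"{total_gib:.1f} GiB")  # 19.6 GiB
```

The token count divides evenly into 512 tokens per example, which is consistent with fixed-length 512-token training sequences, although the README does not state this explicitly. The rounded sizes put the raw text at roughly 19.6 GiB.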
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
|