Commit c2dbdac
Parent(s): fea7bb4
Update README.md

README.md CHANGED

@@ -14,7 +14,7 @@ widget:
 ---
 
 ### Model description
-This model was trained on ZH, JA, KO's Wikipedia (5 epochs).
+- This model was trained on **ZH, JA, and KO** Wikipedia (5 epochs).
 
 ### How to use
 ```python
@@ -22,7 +22,8 @@ from transformers import AutoTokenizer, AutoModelForMaskedLM
 tokenizer = AutoTokenizer.from_pretrained("conan1024hao/cjkbert-small")
 model = AutoModelForMaskedLM.from_pretrained("conan1024hao/cjkbert-small")
 ```
-Before you fine-tune downstream tasks, you don't need any text segmentation. (Though you may obtain better results if you applied morphological analysis to the data before fine-tuning.)
+- You don't need any text segmentation before fine-tuning on downstream tasks.
+- (Though you may obtain better results if you apply morphological analysis to the data before fine-tuning.)
 
 ### Morphological analysis tools
 - ZH: For Chinese, we use [LTP](https://github.com/HIT-SCIR/ltp).
@@ -30,7 +31,7 @@ Before you fine-tune downstream tasks, you don't need any text segmentation. (Th
 - KO: For Korean, we use [KoNLPy](https://github.com/konlpy/konlpy) (Kkma class).
 
 ### Tokenization
-We use character-based tokenization with whole-word-masking strategy.
+- We use character-based tokenization with a **whole-word-masking** strategy.
 
 ### Model size
 - vocab_size: 15015
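
The "How to use" snippet in the diff only loads the tokenizer and the masked-LM head. A quick way to exercise the checkpoint end to end is a fill-mask call; the sketch below is a hedged illustration, not part of the commit: the example sentence and the `[MASK]` token (standard for BERT-style models) are assumptions.

```python
# Minimal fill-mask smoke test for the checkpoint loaded in "How to use".
# Assumptions: the model uses the standard BERT-style [MASK] token, and the
# example sentence is illustrative only (not taken from the README).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="conan1024hao/cjkbert-small")

# The model is character-based, so masking a single character is a natural probe.
for candidate in fill_mask("日本の首都は東[MASK]です。"):
    print(candidate["token_str"], round(candidate["score"], 4))
```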
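The README also notes that morphological analysis before fine-tuning may improve results, and names a tool per language. As a hedged sketch of what that pre-segmentation step could look like for Korean with the Kkma class from KoNLPy (the one the README points to); the sample sentence and the space-joining convention are assumptions:

```python
# Hypothetical pre-segmentation of Korean fine-tuning data with KoNLPy's Kkma,
# the class named in the README. The sentence and the choice to re-join
# morphemes with spaces are assumptions, not from the commit.
from konlpy.tag import Kkma

kkma = Kkma()
sentence = "한국어 형태소 분석기를 사용합니다."
# morphs() returns the list of morphemes in order.
segmented = " ".join(kkma.morphs(sentence))
print(segmented)
```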
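Finally, the "Tokenization" and "Model size" sections can be cross-checked against the published tokenizer. A small inspection sketch (the sample string is an assumption, and whole-word masking is a training-time strategy that is not visible here):

```python
# Inspect the character-based tokenizer; the sample string is illustrative.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("conan1024hao/cjkbert-small")
print(tokenizer.vocab_size)            # expected to match the README's 15015
print(tokenizer.tokenize("東京大学"))  # character-based: one token per character
```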