Update README.md
README.md
CHANGED
@@ -1,6 +1,9 @@
The extended Mistral vocabulary covers only the Ministry of Education's 8,000 commonly used characters.
```python
from transformers import AutoTokenizer
@@ -26,6 +29,6 @@ print(tokenizer.decode(tokenizer.encode('今天天氣真好!')))
```
-
The extended Mistral vocabulary covers only the Ministry of Education's 8,000 commonly used characters.
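+After these characters, 25 dummy tokens were appended; padding the vocabulary size to a multiple of 64 can improve training efficiency.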
+These dummy tokens can later serve as reserved slots for special tokens.
+
```python
from transformers import AutoTokenizer
```
+
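The README change says the vocabulary was padded with 25 dummy tokens to reach a multiple of 64. The arithmetic behind that can be sketched as below. This is only an illustration: the pre-padding size `39_975`, the helper names `padding_needed` and `make_dummy_tokens`, and the `<dummy_i>` naming scheme are all assumptions and do not come from the repository.

```python
from typing import List


def padding_needed(vocab_size: int, multiple: int = 64) -> int:
    """Return how many extra tokens bring vocab_size up to the next multiple."""
    return (-vocab_size) % multiple


def make_dummy_tokens(count: int, prefix: str = "<dummy_") -> List[str]:
    """Build placeholder token strings (hypothetical naming scheme)."""
    return [f"{prefix}{i}>" for i in range(count)]


# Hypothetical vocabulary size before padding; NOT taken from the source.
vocab_size = 39_975

n_pad = padding_needed(vocab_size)
dummy_tokens = make_dummy_tokens(n_pad)

# With the assumed size, 25 placeholders are needed and the padded
# size divides evenly by 64.
print(n_pad, (vocab_size + n_pad) % 64)
```

With a real tokenizer, placeholders like these could presumably be registered through `tokenizer.add_tokens(dummy_tokens)`; whether the author did it this way is not stated in the diff.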