Update vocab size #42
by mathemakitten - opened

README.md CHANGED
@@ -191,7 +191,7 @@ The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a l
 
 - A simple pre-tokenization rule, no normalization
 
-- A vocabulary size of 250,
+- A vocabulary size of 250,880
 
 It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.
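As a quick sanity check on the figure in this change, the tokenizer can be loaded and measured directly. This is a minimal sketch, assuming the `transformers` library and the `bigscience/tokenizer` repo linked in the card; the count including added special tokens can differ from the base vocabulary, and the model's embedding matrix may additionally be padded beyond the tokenizer's vocabulary, so the printed values are not asserted here.

```python
# Minimal sketch: load the BLOOM tokenizer and report its size.
# Assumes the `transformers` library and the public bigscience/tokenizer repo.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/tokenizer")

print(tokenizer.vocab_size)  # size of the learned BPE vocabulary
print(len(tokenizer))        # vocabulary plus any added special tokens
```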