Update README.md

- **tokenizer.model** - Pre-generated tokenizer file.
## Tokenizer
The tokenizer is based on SentencePiece. The previous local tokenizer.model has been removed and replaced with a new tokenizer.model published on the Hugging Face Hub. To train a new tokenizer yourself, run:
```python
import sentencepiece as spm
spm.SentencePieceTrainer.train(input='data.txt', model_prefix='tokenizer', vocab_size=1000)
```