Commit b128384 (verified), committed by RANITBAG
1 Parent(s): 4ddc328

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -22,7 +22,7 @@ pip install torch transformers datasets sentencepiece evaluate accelerate zstand
  - **tokenizer.model** - Pre-generated tokenizer file.
 
  ## Tokenizer
- The tokenizer is based on SentencePiece and has been pre-generated. If you wish to train a new tokenizer, use:
+ The tokenizer is based on SentencePiece and has been updated. The old local tokenizer.model has been removed and replaced with a new tokenizer.model uploaded to the Hugging Face Hub. If you wish to train a new tokenizer, use:
  ```python
  import sentencepiece as spm
  spm.SentencePieceTrainer.train(input='data.txt', model_prefix='tokenizer', vocab_size=1000)
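
The training call in this hunk stops once the model file is written. As a minimal sketch of how the result might be checked afterwards, the snippet below retrains with the same arguments, reloads `tokenizer.model`, and round-trips a string through encode and decode. The helper names `trainer_kwargs` and `train_and_roundtrip` are illustrative and not part of the repository, and the sketch assumes `data.txt` is large enough to support a 1000-piece vocabulary.

```python
from pathlib import Path


def trainer_kwargs(corpus: str, prefix: str = "tokenizer", vocab_size: int = 1000) -> dict:
    """Collect the arguments used in the README's training call (hypothetical helper)."""
    return {"input": corpus, "model_prefix": prefix, "vocab_size": vocab_size}


def train_and_roundtrip(corpus: str, text: str, prefix: str = "tokenizer") -> str:
    """Train a tokenizer, reload it, and encode/decode `text` as a sanity check."""
    # Imported lazily so trainer_kwargs() stays usable without sentencepiece installed.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(**trainer_kwargs(corpus, prefix))
    # Training writes <prefix>.model and <prefix>.vocab to the working directory.
    sp = spm.SentencePieceProcessor(model_file=f"{prefix}.model")
    ids = sp.encode(text, out_type=int)  # list of integer piece ids
    return sp.decode(ids)


if __name__ == "__main__":
    # Only run the round trip when the corpus named in the README is present.
    if Path("data.txt").exists():
        print(train_and_roundtrip("data.txt", "hello world"))
```

If the decoded string differs noticeably from the input, the corpus may be too small or too narrow for the requested vocabulary size.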