philschmid/gemma-tokenizer-chatml


philschmid committed on Feb 24, 2024

Commit d3d9ba2 · verified · 1 Parent(s): 631c7b7

Update README.md

Files changed (1)

README.md +1 -1


@@ -7,7 +7,7 @@ tags: ["gemma","chatml"]
 This repository includes a fast tokenizer for [google/gemma-7b](https://huggingface.co/google/gemma-7b) with the ChatML format. The Tokenizer was created by replacing the string values of original tokens with id `106` (`<start_of_turn>`) and `107` (`<end_of_turn>`) with the chatML tokens `<|im_start|>` and `<|im_end|>`.
-No need tokens where added during that process to make sure that the embedding of the original model doesn't need to be modified.
+No new tokens were added during that process to ensure that the original model's embedding doesn't need to be modified.
 ```python
 from transformers import AutoTokenizer