---
library_name: transformers
---
This is a simple `PreTrainedTokenizerFast` with a vocabulary of 5,120 tokens, trained on a subset of [karpathy/fineweb-edu-100b-shuffle](https://huggingface.co/datasets/karpathy/fineweb-edu-100b-shuffle), which is itself a subset of [HuggingFaceFW/fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).
The tokenizer includes 6 special tokens, which occupy IDs 0 through 5:
```py
class SpecialTokens:
    PAD = 0
    BOS = 1
    EOS = 2
    SYSTEM = 3
    USER = 4
    ASSISTANT = 5


special_tokens_map = {
    "<|PAD|>": SpecialTokens.PAD,
    "<|BOS|>": SpecialTokens.BOS,
    "<|EOS|>": SpecialTokens.EOS,
    "<|SYSTEM|>": SpecialTokens.SYSTEM,
    "<|USER|>": SpecialTokens.USER,
    "<|ASSISTANT|>": SpecialTokens.ASSISTANT,
}
```
```
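As a rough illustration of how the `PAD`, `BOS`, and `EOS` IDs above might be used when batching, here is a minimal pure-Python sketch. The `pad_batch` helper and the sample token IDs are hypothetical and not part of this repository. Wrapping every sequence in `BOS`/`EOS` and padding on the right are assumptions, since this card does not specify either convention.

```python
# Special-token IDs copied from the mapping above.
PAD, BOS, EOS = 0, 1, 2


def pad_batch(sequences, pad_id=PAD, bos_id=BOS, eos_id=EOS):
    """Wrap each sequence in BOS/EOS, then right-pad to the longest length.

    Hypothetical helper: returns (input_ids, attention_mask) as lists of lists.
    """
    wrapped = [[bos_id, *seq, eos_id] for seq in sequences]
    max_len = max((len(seq) for seq in wrapped), default=0)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in wrapped]
    # 1 marks a real token, 0 marks padding.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in wrapped]
    return input_ids, attention_mask


# Made-up token IDs; real ones would come from the tokenizer's encode step.
batch = [[17, 42, 99], [256]]
input_ids, attention_mask = pad_batch(batch)
print(input_ids)       # [[1, 17, 42, 99, 2], [1, 256, 2, 0, 0]]
print(attention_mask)  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

In practice the tokenizer object can usually handle padding itself once its pad token is configured; the sketch only makes the role of the three IDs explicit.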