ddh0/simple-tokenizer-5120
This is a simple PreTrainedTokenizerFast with a vocabulary of 5120 tokens, trained on a subset of karpathy/fineweb-edu-100b-shuffle, which is itself a subset of HuggingFaceFW/fineweb-edu.
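Since the repository holds a tokenizer rather than model weights, it would normally be loaded with a tokenizer class, not `AutoModel`. The following is a minimal sketch, assuming the repository ships the standard tokenizer files that `AutoTokenizer.from_pretrained` expects. The helper names `load_tokenizer` and `roundtrip` are hypothetical and exist only for this illustration.

```python
REPO_ID = "ddh0/simple-tokenizer-5120"


def load_tokenizer(repo_id: str = REPO_ID):
    """Download (or read from the local cache) and return the tokenizer."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoTokenizer

    # Assumption: the repo contains the files AutoTokenizer needs.
    return AutoTokenizer.from_pretrained(repo_id)


def roundtrip(tokenizer, text: str):
    """Encode text to token IDs, then decode those IDs back to a string."""
    ids = tokenizer.encode(text)
    decoded = tokenizer.decode(ids)
    return ids, decoded
```

Typical usage would be `tok = load_tokenizer()` followed by `ids, text = roundtrip(tok, "Hello, world!")`. Depending on the tokenizer's configuration, the decoded string may contain special tokens or differ slightly in whitespace from the input.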
The tokenizer includes 6 special tokens:
```python
class SpecialTokens:
    PAD = 0
    BOS = 1
    EOS = 2
    SYSTEM = 3
    USER = 4
    ASSISTANT = 5

special_tokens_map = {
    "<|PAD|>": SpecialTokens.PAD,
    "<|BOS|>": SpecialTokens.BOS,
    "<|EOS|>": SpecialTokens.EOS,
    "<|SYSTEM|>": SpecialTokens.SYSTEM,
    "<|USER|>": SpecialTokens.USER,
    "<|ASSISTANT|>": SpecialTokens.ASSISTANT
}
```
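One way to confirm that a loaded tokenizer agrees with the IDs listed above is to look each special token up and compare. This is a sketch under assumptions: the names `SPECIAL_TOKENS`, `ID_TO_SPECIAL`, and `find_mismatches` are hypothetical, and the only tokenizer method relied on is `convert_tokens_to_ids`, which Transformers tokenizers provide.

```python
# Token strings and IDs as documented above.
SPECIAL_TOKENS = {
    "<|PAD|>": 0,
    "<|BOS|>": 1,
    "<|EOS|>": 2,
    "<|SYSTEM|>": 3,
    "<|USER|>": 4,
    "<|ASSISTANT|>": 5,
}

# Reverse lookup (ID -> token string), handy when inspecting encoded sequences.
ID_TO_SPECIAL = {token_id: token for token, token_id in SPECIAL_TOKENS.items()}


def find_mismatches(tokenizer) -> dict:
    """Return {token: (expected_id, actual_id)} for every disagreement."""
    mismatches = {}
    for token, expected in SPECIAL_TOKENS.items():
        actual = tokenizer.convert_tokens_to_ids(token)
        if actual != expected:
            mismatches[token] = (expected, actual)
    return mismatches
```

An empty dictionary from `find_mismatches(tok)` would indicate that the loaded tokenizer matches the documented special-token IDs.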