fineweb-gpt-scratch / tokenizer_config.json
Last commit 00da4c8 by shreyask: "GPT from scratch — 1800 steps, ppl=195.7"
{
  "tokenizer_class": "PreTrainedTokenizerFast",
  "bos_token": "<|endoftext|>",
  "eos_token": "<|endoftext|>",
  "pad_token": "<|pad|>",
  "unk_token": "<|unk|>",
  "model_max_length": 512
}