HebrewGPT-296M / tokenizer_config.json
{
"tokenizer_class": "PreTrainedTokenizerFast",
"model_max_length": 512,
"bos_token": "<s>",
"eos_token": "</s>",
"unk_token": "<unk>",
"pad_token": "<pad>",
"clean_up_tokenization_spaces": false,
"note": "This model uses a tiktoken-based tokenizer (cl100k_base remapped to 8192 tokens). The tokenizer.model file is for reference but requires custom loading."
}
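The note above says the bundled `tokenizer.model` needs custom loading, but the fields in this config can be consumed directly regardless of the tokenizer backend (if the repo also ships a `tokenizer.json`, `PreTrainedTokenizerFast.from_pretrained` would read this config automatically). A minimal sketch of using the declared special tokens and context length; the helper names `add_special_tokens` and `truncate_ids` are illustrative, not part of any library:

```python
import json

# Inlined copy of the tokenizer_config.json shown above.
CONFIG_JSON = """
{
  "tokenizer_class": "PreTrainedTokenizerFast",
  "model_max_length": 512,
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": "<unk>",
  "pad_token": "<pad>",
  "clean_up_tokenization_spaces": false
}
"""

config = json.loads(CONFIG_JSON)

def add_special_tokens(text, cfg):
    """Wrap raw text with the declared BOS/EOS markers (string level, for illustration)."""
    return f"{cfg['bos_token']}{text}{cfg['eos_token']}"

def truncate_ids(ids, cfg):
    """Clip a token-id sequence to the model's declared context window."""
    return ids[: cfg["model_max_length"]]

print(add_special_tokens("שלום עולם", config))  # → <s>שלום עולם</s>
print(len(truncate_ids(list(range(1000)), config)))  # → 512
```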