{
  "tokenizer_class": "LlamaTokenizer",
  "model_max_length": 2048,
  "added_tokens": {
    "<unk>": 0,
    "<s>": 1,
    "</s>": 2,
    "<pad>": 3,
    "": 7,
    "": 8,
    "": 9,
    "": 10,
    "": 11,
    "": 12,
    "": 13,
    "": 14,
    "": 4,
    "": 5,
    "": 6
  },
  "_arkadiko_note": "The trained model config (config.json) sets bos_token_id=0, eos_token_id=2, pad_token_id=1. The actual SPM model ships <unk>=0, <s>=1, </s>=2, <pad>=3, so the bos and pad IDs in config.json do not match the tokenizer. The runtime SHOULD use the tokenizer-derived IDs in this file's `added_tokens`; the config.json values are kept as-trained for reproducibility but are misaligned. See the README for details."
}