GGUF shows eos_token_id == separator_token_id == 1
While inspecting the GGUF metadata, I'm seeing eos_token_id = 1 and separator_token_id = 1 (both pointing at <|separator|>).
Is this an intentional choice or a mismatch introduced during conversion to GGUF?
In the tokenizer_config.json I see:
{
  "added_tokens_decoder": {
    ...
    "1": {
      "content": "<|separator|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "<|eos|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    ...
  },
  ...
  "eos_token": "<|eos|>",
  ...
  "sep_token": "<|separator|>",
  ...
}
So it looks like <|eos|> should map to ID 2, and <|separator|> to ID 1.
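For reference, here is a minimal sketch of how that mapping can be read off the config. It assumes a trimmed copy of the excerpt above (the elided "..." parts are left out, and only the fields needed for the lookup are kept); `token_ids` is a hypothetical helper name, not part of any library.

```python
import json

# Trimmed copy of the tokenizer_config.json excerpt; the elided parts of the
# real file are omitted, so this is an illustration, not the full config.
CONFIG_TEXT = """
{
  "added_tokens_decoder": {
    "1": {"content": "<|separator|>", "special": true},
    "2": {"content": "<|eos|>", "special": true}
  },
  "eos_token": "<|eos|>",
  "sep_token": "<|separator|>"
}
"""


def token_ids(config: dict) -> dict:
    """Map each token string in added_tokens_decoder to its integer ID."""
    return {
        entry["content"]: int(tok_id)
        for tok_id, entry in config["added_tokens_decoder"].items()
    }


config = json.loads(CONFIG_TEXT)
ids = token_ids(config)

# Resolve the named special tokens to their IDs.
eos_id = ids[config["eos_token"]]
sep_id = ids[config["sep_token"]]

print(f"eos_token {config['eos_token']!r} -> ID {eos_id}")
print(f"sep_token {config['sep_token']!r} -> ID {sep_id}")
```

This only shows what tokenizer_config.json declares; it says nothing about which token the instruct model actually emits at the end of a turn.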
Thanks a lot for any pointers!
Hey! We re-checked: the EOS is in fact <|separator|>. <|eos|> might be the EOS for the base model, but for the instruct model it is <|separator|>.
If you run the following, overriding the EOS with ID 2:
./llama.cpp/llama-cli \
--model unsloth/grok-2-GGUF/UD-Q3_K_XL/grok-2-UD-Q3_K_XL-00001-of-00003.gguf \
--jinja \
--threads -1 \
--n-gpu-layers 99 \
--temp 1.0 \
--top_p 0.95 \
--min_p 0.01 \
--ctx-size 16384 \
--seed 3407 \
-ot ".ffn_.*_exps.=CPU" \
-no-cnv \
--prompt "Human: Hi<|separator|>" \
--override-kv tokenizer.ggml.eos_token_id=int:2 \
--special
you will get:
Human: Hi<|separator|>
Assistant: Hello! How can I assist you today?<|separator|><|eos|> [end of text]
This shows that <|separator|> is in fact the EOS: <|eos|> does appear, but the model prints <|separator|> first. Hope this helps!
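To make the reasoning concrete, here is a toy sketch of stop-token handling. It is not llama.cpp's actual code: `printed_tokens` is a hypothetical helper, the sampled sequence is hand-written to mirror the output above, and it assumes the stop token itself is printed (as it is with --special). The first run, with the default EOS, is an assumption, since the thread only shows the overridden run.

```python
def printed_tokens(sampled, eos_token):
    """Return the tokens printed up to and including the first eos_token."""
    out = []
    for tok in sampled:
        out.append(tok)
        if tok == eos_token:
            break  # generation ends once the configured EOS is seen
    return out


# Hand-written stand-in for what the instruct model emits after the prompt.
sampled = ["Hello! How can I assist you today?", "<|separator|>", "<|eos|>"]

# EOS = <|separator|> (ID 1, as stored in the GGUF): stops at the separator.
default_run = printed_tokens(sampled, "<|separator|>")

# EOS overridden to <|eos|> (ID 2): the separator is printed first, and
# generation only stops one token later.
override_run = printed_tokens(sampled, "<|eos|>")

print(default_run)
print(override_run)
```

In this toy model, the model has already signalled the end of its turn with <|separator|> by the time <|eos|> shows up, which is why the GGUF keeps ID 1 as the EOS.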
That makes a lot more sense now, and the example you provided really helped to clear up the confusion.
Thank you very much for the clarification!
No worries!