What should the prompt format be?
#1 opened by KLL1111
Prompt format from HF:
f"<|im_start|>user\nA red apple<|im_end|>\n<|im_start|>assistant\n"
This gives 25 tokens, but it should be 11 tokens.
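HF tokenizer output: [151644, 872, 198, 32, 2518, 23268, 151645, 198, 151644, 77091, 198]
GGUF tokenizer output: [27, 91, 318, 4906, 91, 29, 872, 198, 32, 2518, 23268, 27, 91, 318, 6213, 91, 397, 27, 91, 318, 4906, 91, 29, 77091, 198]
32, 2518, 23268 is "A red apple" in both.
<|im_start|> is not recognized as a special token by the GGUF tokenizer: it gets split into 27, 91, 318, 4906, 91, 29 instead of the single ID 151644.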
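To make the mismatch easier to see, here is a small sketch that compares the two ID lists quoted in this post. It only uses the numbers above; the helper `find_sublist` and the variable names are my own, and reading the HF-only IDs as the special tokens is an inference from these two lists, not something checked against the tokenizer files.

```python
# Token IDs copied from this post.
hf_ids = [151644, 872, 198, 32, 2518, 23268, 151645, 198, 151644, 77091, 198]
gguf_ids = [27, 91, 318, 4906, 91, 29, 872, 198, 32, 2518, 23268,
            27, 91, 318, 6213, 91, 397, 27, 91, 318, 4906, 91, 29, 77091, 198]
content_ids = [32, 2518, 23268]  # "A red apple"


def find_sublist(haystack, needle):
    """Return the start index of needle inside haystack, or -1 if absent."""
    n = len(needle)
    for i in range(len(haystack) - n + 1):
        if haystack[i:i + n] == needle:
            return i
    return -1


# The prompt text itself is tokenized identically in both outputs.
hf_pos = find_sublist(hf_ids, content_ids)
gguf_pos = find_sublist(gguf_ids, content_ids)

# IDs that appear only in the HF output: presumably the special tokens
# that the GGUF side split into ordinary text pieces (an inference).
hf_only = sorted(set(hf_ids) - set(gguf_ids))

print(len(hf_ids), len(gguf_ids))  # 11 25
print(hf_pos, gguf_pos)            # 3 8
print(hf_only)                     # [151644, 151645]
```

So the text part matches, and the 14 extra tokens all come from the chat-template markers being tokenized as plain text.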
I just used this tokenizer config: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo/blob/main/tokenizer/tokenizer.json because it's meant to be an exact copy of the Z-Image TE.
Since it's used as a text encoder, the chat prompt format is not applicable. You'd just pass "A red apple", which tokenizes to [32, 2518, 23268].