Do not add the EOS token by default when tokenizing
#4
by p1atdev - opened
This PR reduces confusion about tokenizer loading.
The current configuration requires loading the tokenizer with add_eos_token=False; otherwise the EOS token is appended to the encoded input automatically, which leads to odd completion results.
- Before:
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2-8x70b", add_eos_token=False)
- After:
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2-8x70b")
With this change, tokenizer_config.json sets "add_eos_token": false, which matches the setting in sbintuitions/sarashina2-70b:
https://huggingface.co/sbintuitions/sarashina2-70b/blob/main/tokenizer_config.json#L134
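To confirm which behavior a loaded tokenizer has, one option is to check whether the encoded ids end with the EOS id. The helper below is a minimal sketch: the name ends_with_eos and the sample ids are hypothetical and not part of this repository, and the commented-out usage assumes the standard transformers AutoTokenizer API.

```python
def ends_with_eos(input_ids, eos_token_id):
    """Return True if the encoded sequence ends with the EOS token id."""
    return bool(input_ids) and input_ids[-1] == eos_token_id


# Hypothetical ids for illustration only; real ids come from the tokenizer.
EOS_ID = 2
print(ends_with_eos([101, 102, 103, EOS_ID], EOS_ID))  # EOS was appended
print(ends_with_eos([101, 102, 103], EOS_ID))          # EOS was not appended

# Usage with the real tokenizer (requires transformers and network access):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2-8x70b")
# ids = tokenizer("Hello")["input_ids"]
# print(ends_with_eos(ids, tokenizer.eos_token_id))
```

If the check returns True for a plain prompt, the tokenizer is still appending EOS, and passing add_eos_token=False explicitly remains necessary.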
Thank you. LGTM!
kajyuuen changed pull request status to merged