Does Qwen 2.5 support Thai language?

#4
by Suppadate - opened

Here is my code:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
print(tokenizer.tokenize("สวัสดี"))

Output: ['สว', 'ั', 'สà¸Ķ', 'ี']

Can you suggest how to fix that or where I can find vocab.json, tokenizer.json, etc.?
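
A note on the output above: Qwen2.5 uses a byte-level BPE tokenizer, so tokenize() returns printable stand-ins for raw UTF-8 bytes rather than readable Thai. That is most likely what the odd characters are, and it does not by itself mean Thai is unsupported. Below is a minimal sketch, assuming the GPT-2 style byte-to-character table (the piece à¸Ķ in the output is consistent with it). The helper names to_byte_level and from_byte_level are hypothetical and are not part of transformers.

```python
def bytes_to_unicode():
    """GPT-2 style table: map each byte value 0..255 to a printable character."""
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_byte_level(text):
    """Hypothetical helper: show text the way a byte-level BPE vocabulary stores it."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def from_byte_level(token_text):
    """Hypothetical helper: turn byte-level token text back into readable text."""
    return bytes(CHAR_TO_BYTE[c] for c in token_text).decode("utf-8")


text = "สวัสดี"
encoded = to_byte_level(text)
print(encoded)                   # byte-level stand-ins, not readable Thai
print(from_byte_level(encoded))  # round-trips to the original string
```

With the real tokenizer, tokenizer.convert_tokens_to_string(tokens) or tokenizer.decode(tokenizer.encode(text)) should give back readable text in the same way. Files such as vocab.json and tokenizer.json are normally listed under the model repository's files tab, though I have not checked this particular repository.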


