Why does Qwen always miss Arabic even though 430 million people speak Arabic?
#8, opened by yousef1727
Even though 430 million people speak Arabic, Qwen struggles with it for several reasons: its training data is mostly English, Arabic is morphologically complex and has many dialects, and tokenization often breaks Arabic words into fragments.
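
A rough way to see part of the tokenization issue, assuming the tokenizer works on UTF-8 bytes (as byte-level BPE tokenizers do; check the tokenizer docs for the specific Qwen release): each Arabic letter takes two bytes in UTF-8, while each ASCII letter takes one, so an Arabic word starts out as roughly twice as many base units before any merges are applied. The helper name `utf8_bytes_per_char` below is made up for this illustration and is not part of any Qwen tooling.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Return the average number of UTF-8 bytes per character in `text`."""
    return len(text.encode("utf-8")) / len(text)


english = "book"
arabic = "كتاب"  # "book" in Arabic, four letters

# ASCII letters encode to one byte each.
print(english, utf8_bytes_per_char(english))  # 1.0
# Letters in the Arabic Unicode block encode to two bytes each.
print(arabic, utf8_bytes_per_char(arabic))    # 2.0
```

This only shows the starting point. How many tokens a word actually ends up as depends on the merges the tokenizer learned, which in turn depends on how much Arabic text was in its training data.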