African language coverage and benchmark comparisons

#8
by O96a - opened

Impressive work on the African language coverage: five languages (Swahili, Zulu, Xhosa, Hausa, Yoruba) in a 0.4B-parameter model is a meaningful contribution to low-resource NLP.

The paper (arXiv:2408.17024) mentions the Inkuba-Mono dataset. I'm curious how the model handles code-switching scenarios, which are common in multilingual African contexts. Have you evaluated on mixed-language prompts?

Also, regarding deployment: have you measured the impact of quantization on these languages? We've found that aggressive 4-bit quantization can disproportionately degrade morphologically rich languages like Zulu, where accurate affix handling is critical.
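For context, here is a minimal numpy sketch of the kind of raw weight-error measurement we use as a first check before running language-level evals. The `quantize_4bit` function is a toy symmetric per-tensor scheme for illustration only, not the actual kernel used by bitsandbytes or GPTQ:

```python
import numpy as np

def quantize_4bit(w):
    """Toy symmetric per-tensor 4-bit quantization (illustrative only).

    Maps weights to the signed int4 range [-7, 7] with a single scale,
    then dequantizes, so the return value shows the rounding error a
    4-bit representation would introduce.
    """
    scale = np.max(np.abs(w)) / 7.0
    q = np.clip(np.round(w / scale), -7, 7)
    return q * scale  # dequantized weights

rng = np.random.default_rng(0)
# Stand-in weight tensor with a typical small init scale
w = rng.normal(0.0, 0.02, size=(256, 64)).astype(np.float32)
w_q = quantize_4bit(w)

# Mean relative error introduced by quantization
rel_err = np.abs(w_q - w).mean() / np.abs(w).mean()
print(f"mean relative error: {rel_err:.3f}")
```

Per-tensor error like this is only a proxy, of course; the question is whether the downstream effect on affix-heavy tokens in Zulu is larger than on the other languages, which needs a task-level eval.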
