African language coverage and benchmark comparisons
#8
by O96a - opened
Impressive work on the African language coverage: 5 languages (Swahili, Zulu, Xhosa, Hausa, Yoruba) in a 0.4B-parameter model is a meaningful contribution to low-resource NLP.
The paper (arXiv:2408.17024) mentions the Inkuba-Mono dataset. I'm curious how the model handles code-switching scenarios, which are common in multilingual African contexts. Have you evaluated on mixed-language prompts?
Also, for deployment: have you tested quantization impact on these languages? We've found that aggressive quantization (4-bit) can disproportionately affect morphologically rich languages like Zulu, where affix handling is critical.
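For context on the kind of degradation I mean, here's a minimal sketch simulating symmetric 4-bit quantization of a weight row and measuring the rounding error; the values are illustrative only, not drawn from the model or paper.

```python
# Illustrative sketch: symmetric 4-bit quantization of a weight row.
# Per-row scaling maps floats onto 16 integer levels (-8..7); the
# reconstruction error per weight is bounded by half the scale step,
# which is what grows for outlier-heavy weight distributions.

def quantize_4bit(weights):
    """Quantize a list of floats to int4 with a shared symmetric scale."""
    scale = max(abs(w) for w in weights) / 7  # use symmetric range [-7, 7]
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int4 values back to floats."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.91, -0.07, 0.33]
q, scale = quantize_4bit(weights)
recon = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, recon))
```

The worry for morphologically rich languages is that this per-weight error compounds across the embedding rows of rare affix-bearing subword tokens, which see fewer gradient updates and may sit closer to the noise floor after quantization.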