African language coverage and benchmark comparisons

by O96a - opened Mar 28

Impressive work on the African language coverage. Fitting five languages (Swahili, Zulu, Xhosa, Hausa, Yoruba) into a 0.4B-parameter model is a meaningful contribution to low-resource NLP.

The paper (arXiv:2408.17024) mentions the Inkuba-Mono dataset. I'm curious how the model handles code-switching scenarios, which are common in multilingual African contexts. Have you evaluated on mixed-language prompts?

Also, for deployment: have you tested quantization impact on these languages? We've found that aggressive quantization (4-bit) can disproportionately affect morphologically rich languages like Zulu, where affix handling is critical.
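
In case it helps frame the quantization question, here is a rough sketch of one way the per-language impact could be compared. It assumes you already have per-token negative log-likelihoods (natural log) for the same held-out text, scored once by the full-precision model and once by the quantized one. The function names and the dictionary layout are hypothetical and not taken from the paper or the model card.

```python
import math
from typing import Dict, List


def perplexity(token_nlls: List[float]) -> float:
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def quantization_degradation(
    full_precision: Dict[str, List[float]],
    quantized: Dict[str, List[float]],
) -> Dict[str, float]:
    """Per-language ratio of quantized to full-precision perplexity.

    Keys are language labels (hypothetical layout). A ratio of 1.0 means
    no change; larger values mean the quantized model degraded more.
    """
    return {
        lang: perplexity(quantized[lang]) / perplexity(full_precision[lang])
        for lang in full_precision
    }
```

Because both variants share a tokenizer, the ratio for a given language is a like-for-like comparison, and a noticeably higher ratio for one language than for the others would point to the disproportionate effect described above. Perplexity is only a proxy here; a downstream task score per language would be a useful second check.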
