arxiv:2510.07203

Sunflower: A New Approach To Expanding Coverage of African Languages in Large Language Models

Published on Oct 8, 2025

AI-generated summary

Large language models trained on African languages demonstrate superior performance in regional linguistic contexts compared to globally-focused approaches.

Abstract

There are more than 2000 living languages in Africa, most of which have been bypassed by advances in language technology. Current leading LLMs exhibit strong performance on a number of the most common languages (e.g. Swahili or Yoruba), but prioritise support for the languages with the most speakers, resulting in piecemeal ability across disparate languages. We contend that a regionally focussed approach is more efficient, and present a case study for Uganda, a country with high linguistic diversity. We describe the development of Sunflower 14B and 32B, a pair of models based on Qwen 3 with state-of-the-art comprehension in the majority of Ugandan languages. These models are open source and can be used to reduce language barriers in a number of important practical applications.
