---
title: README
emoji: 📊
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
---

Thiomi is a research and engineering effort focused on closing the gap between major African languages and the NLP infrastructure that exists for English.

We build:

- **Datasets**: community-collected text and speech corpora for languages that aren't well represented in public scraped data
- **Models**: morphological analyzers, ASR systems, and translation models trained on those corpora, with the architectures and recipes that work best for each language family
- **Methods**: open recipes for cross-lingual transfer, zero-shot morphological discovery, and other techniques that let small target datasets do useful work