title: README
emoji: 📊
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false

Thiomi is a research and engineering effort focused on closing the gap between the NLP infrastructure available for major African languages and what exists for English. We build:

  • Datasets: community-collected text and speech corpora for languages that aren't well represented in public scraped data
  • Models: morphological analyzers, ASR systems, and translation models trained on those corpora, with the architectures and recipes that work best for each language family
  • Methods: open recipes for cross-lingual transfer, zero-shot morphological discovery, and other techniques that let small target datasets do useful work