astu-coe-ece-0045-2025/OroLLM-sum-v1
OroLLM is an academic research initiative (Grant #0045/2025) dedicated to developing scalable, linguistically grounded Large Language Models (LLMs) for Afaan Oromo. The project focuses on:

• Large-scale corpus collection and data governance
• Morphology-aware tokenizer development
• Efficient base model pre-training
• Task-specific fine-tuning (MT, chat, summarization)
• Public deployment for research and societal impact

OroLLM leverages the Hugging Face ecosystem to ensure reproducibility, transparency, and open scientific collaboration. The initiative prioritizes responsible AI practices and the preservation of underrepresented African languages through open-source innovation.