---
title: README
emoji: 🏃
colorFrom: pink
colorTo: purple
sdk: static
pinned: false
---

# Orature AI - Pioneering Urdu Language AI

**Mission:** Orature AI is dedicated to advancing the frontiers of Artificial Intelligence and Language Models for the Urdu language. We aim to develop computationally efficient, culturally aware, and accessible language technologies that empower local communities, researchers, and businesses. Our work focuses on bridging the linguistic digital divide and promoting equitable and sustainable AI development.

**Vision:** To be a leading force in creating and democratizing state-of-the-art NLP resources for Urdu, a low-resource language, fostering innovation and inclusivity in the global AI landscape.

## About Us

Orature AI emerged from the foundational work of the ALIF الف project, a Final Year Project at Habib University (Spring 2025). Our core team comprises passionate researchers and engineers committed to open-source principles and collaborative innovation.

**Core Team (Founders of ALIF الف):**

* Syed Muhammad Ali Naqvi
* Zainab Haider
* Syeda Haya Fatima
* Hammad Sajid
* Ali Muhammad Asad

**Supervisors of ALIF الف:**

* Dr. Abdul Samad
* Dr. Inayat Ullah

**Affiliation:**

* Habib University, Dhanani School of Science and Engineering

## Our Focus Areas

* **Data Curation & Tokenization:** Creating and carefully preprocessing large-scale, culturally relevant datasets, and building language-specific tokenizers.
* **Urdu Language Model Development:** Creating robust pretrained and instruction-tuned Small Language Models (SLMs) for Urdu.
* **Low-Resource NLP:** Developing scalable frameworks and methodologies for building language models for underrepresented languages.
* **Open Source Contribution:** Sharing models, datasets, and research findings with the global community.
* **Sustainable AI:** Advocating for efficient and environmentally conscious AI practices.

## Our Flagship Project: ALIF الف

The **ALIF الف** project is our initial and core contribution, featuring a series of pretrained Urdu generative models, custom tokenizers, and comprehensive datasets.

## Get Involved

* **Explore our Models & Datasets:** Browse our contributions on the Hugging Face Hub.