This project is an open-source, AI-powered translation system designed to make communication easier between English and Malawian local languages. It uses machine learning and natural language processing (NLP) models to translate text and speech across languages spoken in Malawi.

---

Supported Languages

The system currently supports:

- Chichewa (Nyanja)
- Chitumbuka
- Chiyao
- Chilomwe
- Chisena
- Chitonga

Additional languages and dialects can be added as data becomes available.

Project Goals

- Break language barriers in Malawi through accessible AI tools.
- Support communication in education, health, agriculture, and government.
- Preserve and promote Malawian indigenous languages in digital technology.
- Provide open datasets and models for researchers and developers.

Features

- Text translation: English ↔ local languages
- Speech recognition: convert spoken language to text
- Text-to-speech: speak translated text naturally
- Chat integration: support for WhatsApp and web interfaces
- Offline capability: small models for mobile and rural use

by Ezek3121 - opened Oct 24, 2025


Files changed (1)

README.md CHANGED

+---
+license: bigscience-openrail-m
+datasets:
+- openai/gdpval
+language:
+- af
+metrics:
+- character
+base_model:
+- sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
+new_version: deepseek-ai/DeepSeek-OCR
+pipeline_tag: voice-activity-detection
+library_name: diffusers
+tags:
+- art
+- agent
+---
+---# install
+pip install transformers datasets accelerate
+# training (very simplified)
+python run_speech_to_text_flax.py \
+  --model_name_or_path openai/whisper-small \
+  --train_data_dir ./chichewa_speech/train \
+  --validation_data_dir ./chichewa_speech/val \
+  --output_dir ./whisper-chichewa-finetuned
+license: bigscience-openrail-m
+datasets:
+- openai/gdpval
+language:
+- af
+metrics:
+- character
+base_model:
+- deepseek-ai/DeepSeek-OCR
+new_version: zai-org/GLM-4.6
+pipeline_tag: translation
+library_name: adapter-transformers
+tags:
+- art
+---
+from transformers import MarianTokenizer, MarianMTModel
+model = MarianMTModel.from_pretrained("your-org/marian-chichewa-en")
+tok = MarianTokenizer.from_pretrained("your-org/marian-chichewa-en")
+batch = tok(["Muli bwanji?"], return_tensors="pt", padding=True)
+out = model.generate(**batch)
+print(tok.batch_decode(out, skip_special_tokens=True))
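
The Marian snippet in the diff translates a single sentence. As a minimal sketch of how the same calls might be wrapped for several sentences at once, the helper below batches its input and reuses the tokenizer and model objects. The function names `load_marian`, `chunked`, and `translate` are hypothetical and not part of this PR, and `your-org/marian-chichewa-en` remains the placeholder checkpoint name from the diff, not a published model.

```python
from typing import Iterable, Iterator, List


def load_marian(checkpoint: str):
    """Load a Marian tokenizer/model pair.

    The import is deferred so the helpers below can be used (and tested)
    without transformers installed. `checkpoint` is whatever name you pass;
    the one in the diff is a placeholder.
    """
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(checkpoint)
    model = MarianMTModel.from_pretrained(checkpoint)
    return tokenizer, model


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(sentences: Iterable[str], tokenizer, model, batch_size: int = 8) -> List[str]:
    """Translate sentences in batches, preserving input order."""
    results: List[str] = []
    for chunk in chunked(list(sentences), batch_size):
        # Same calls as the README snippet: tokenize, generate, decode.
        batch = tokenizer(chunk, return_tensors="pt", padding=True)
        output_ids = model.generate(**batch)
        results.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return results


# Example usage (requires a real checkpoint in place of the placeholder):
# tok, model = load_marian("your-org/marian-chichewa-en")
# print(translate(["Muli bwanji?", "Zikomo kwambiri."], tok, model))
```

Passing the tokenizer and model in as arguments keeps loading separate from translation, so one loaded model can serve many calls.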