This project is an open-source, AI-powered translation system designed to make communication easier between English and Malawian local languages. It uses modern machine learning and natural language processing (NLP) models to translate text and speech accurately across the languages spoken in Malawi.

---

Supported Languages

The system currently supports:

- Chichewa (Nyanja)
- Chitumbuka
- Chiyao
- Chilomwe
- Chisena
- Chitonga

Additional languages and dialects can be added as data becomes available.

Project Goals

- Break language barriers in Malawi through accessible AI tools.
- Support communication in education, health, agriculture, and government.
- Preserve and promote Malawian indigenous languages in digital technology.
- Provide open datasets and models for researchers and developers.

Features

- Text translation: between English and local languages
- Speech recognition: convert spoken language to text
- Text-to-speech: speak translated text naturally
- Chat integration: support for WhatsApp and web interfaces
- Offline capability: small models for mobile and rural use
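The speech features listed above suggest a three-stage flow: speech recognition, then text translation, then text-to-speech. The following is a minimal sketch of how such stages might be chained, not the project's actual design. Every name in it (`SpeechTranslator`, `check_pair`, `LOCAL_LANGUAGES`, and the three stage callables) is hypothetical and introduced only for illustration; the language names are taken from the list above.

```python
from dataclasses import dataclass
from typing import Callable

# The six local languages named in the description; English is the other side.
LOCAL_LANGUAGES = {
    "Chichewa", "Chitumbuka", "Chiyao", "Chilomwe", "Chisena", "Chitonga",
}


def check_pair(source: str, target: str) -> None:
    """Accept only English paired with exactly one supported local language."""
    pair = {source, target}
    if "English" not in pair or len(pair & LOCAL_LANGUAGES) != 1:
        raise ValueError(f"unsupported language pair: {source} -> {target}")


@dataclass
class SpeechTranslator:
    """Chains three pluggable stages; each stage is a hypothetical callable."""
    speech_to_text: Callable[[bytes, str], str]   # (audio, language) -> text
    translate: Callable[[str, str, str], str]     # (text, source, target) -> text
    text_to_speech: Callable[[str, str], bytes]   # (text, language) -> audio

    def run(self, audio: bytes, source: str, target: str) -> bytes:
        check_pair(source, target)
        text = self.speech_to_text(audio, source)
        translated = self.translate(text, source, target)
        return self.text_to_speech(translated, target)
```

Keeping each stage behind a plain callable means a small offline model and a larger hosted model could be swapped without changing the calling code, which may matter for the mobile and rural use case mentioned above.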

#1
by Ezek3121 - opened
Files changed (1)
  1. README.md +47 -1
README.md CHANGED
@@ -1 +1,47 @@
- GPT2
+ ---
+ license: bigscience-openrail-m
+ datasets:
+ - openai/gdpval
+ language:
+ - af
+ metrics:
+ - character
+ base_model:
+ - sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
+ new_version: deepseek-ai/DeepSeek-OCR
+ pipeline_tag: voice-activity-detection
+ library_name: diffusers
+ tags:
+ - art
+ - agent
+ ---
+ ---# install
+ pip install transformers datasets accelerate
+
+ # training (very simplified)
+ python run_speech_to_text_flax.py \
+ --model_name_or_path openai/whisper-small \
+ --train_data_dir ./chichewa_speech/train \
+ --validation_data_dir ./chichewa_speech/val \
+ --output_dir ./whisper-chichewa-finetuned
+ license: bigscience-openrail-m
+ datasets:
+ - openai/gdpval
+ language:
+ - af
+ metrics:
+ - character
+ base_model:
+ - deepseek-ai/DeepSeek-OCR
+ new_version: zai-org/GLM-4.6
+ pipeline_tag: translation
+ library_name: adapter-transformers
+ tags:
+ - art
+ ---
+ from transformers import MarianTokenizer, MarianMTModel
+ model = MarianMTModel.from_pretrained("your-org/marian-chichewa-en")  # placeholder model ID
+ tok = MarianTokenizer.from_pretrained("your-org/marian-chichewa-en")
+ batch = tok(["Muli bwanji?"], return_tensors="pt", padding=True)  # Chichewa for "How are you?"
+ out = model.generate(**batch)
+ print(tok.batch_decode(out, skip_special_tokens=True))
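The snippet at the end of the diff translates a single sentence. For longer inputs, one possible extension is to translate in small batches so memory use stays bounded. The sketch below assumes a Marian-style tokenizer and model like the ones loaded in the diff; `chunked`, `translate_batch`, and `load_marian` are hypothetical helper names, and `your-org/marian-chichewa-en` remains the placeholder ID from the README rather than a published checkpoint.

```python
from typing import Iterable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(texts: Iterable[str], tokenizer, model,
                    batch_size: int = 8, max_new_tokens: int = 128) -> List[str]:
    """Translate texts in small batches and return results in input order."""
    results: List[str] = []
    for chunk in chunked(list(texts), batch_size):
        # Pad to the longest sentence in the chunk; truncate over-long inputs.
        batch = tokenizer(chunk, return_tensors="pt", padding=True, truncation=True)
        output_ids = model.generate(**batch, max_new_tokens=max_new_tokens)
        results.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return results


def load_marian(model_id: str):
    """Load tokenizer and model; imported lazily so the helpers above stay dependency-free."""
    from transformers import MarianMTModel, MarianTokenizer
    return MarianTokenizer.from_pretrained(model_id), MarianMTModel.from_pretrained(model_id)
```

With a real checkpoint in place of the placeholder, usage would look like `tok, model = load_marian("your-org/marian-chichewa-en")` followed by `translate_batch(["Muli bwanji?"], tok, model)`. Note that `MarianTokenizer` needs the `sentencepiece` package and `return_tensors="pt"` needs PyTorch, neither of which is covered by the `pip install` line in the README.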