---
title: README
emoji: 
colorFrom: yellow
colorTo: yellow
sdk: static
pinned: false
---
# JengaAI 

**Open-source ML training and inference framework for African AI.**

Train multi-task NLP, Speech, and Vision models with a single YAML config — no code required. Built in Kenya, built for Africa.

---

## What We Build

JengaAI is a framework that lets researchers, engineers, and non-technical teams train production-grade machine learning models on African language data — and deploy them without vendor lock-in, without API dependencies, and without sending sensitive data to foreign servers.

> *Your model. Your data. Your task.*

---

## Models

| Model | Task | Language | Base |
|-------|------|----------|------|
| [Rogendo/afribert-kenya-adapted](https://huggingface.co/Rogendo/afribert-kenya-adapted) | Masked Language Modeling (DAPT) | Swahili · Sheng · English | castorini/afriberta_large |
| [Rogendo/cpims-nlp-intent-urgency](https://huggingface.co/Rogendo/cpims-nlp-intent-urgency) | Intent + Urgency Classification | Swahili · Sheng · English | afribert-kenya-adapted |

### afribert-kenya-adapted
Domain-adaptive pre-training of AfriBERTa (`castorini/afriberta_large`) on ~39M tokens of Kenyan language data: Swahili Wikipedia, East African journalism, a synthetic Sheng/code-switch corpus, and real CPIMS field worker WhatsApp data. Achieves **30.4% average perplexity improvement** over the base model on Kenyan domain text, with **66% improvement on Sheng** and **41% on English-Swahili code-switching**.
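
The card does not say how the improvement figures were computed. One common reading is the relative drop in perplexity on the same held-out text, which the sketch below illustrates. The per-token loss values are made up for illustration and are not the reported results; for a masked language model the losses would typically be taken over masked positions only.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_nlls:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_improvement(base_ppl, adapted_ppl):
    """Percentage drop in perplexity relative to the base model."""
    return 100.0 * (base_ppl - adapted_ppl) / base_ppl


# Hypothetical per-token losses for the same held-out text under each model.
base_nlls = [3.2, 2.9, 3.5, 3.0]
adapted_nlls = [2.6, 2.4, 2.9, 2.5]

base_ppl = perplexity(base_nlls)
adapted_ppl = perplexity(adapted_nlls)
print(
    f"base={base_ppl:.2f} adapted={adapted_ppl:.2f} "
    f"improvement={relative_improvement(base_ppl, adapted_ppl):.1f}%"
)
```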

### cpims-nlp-intent-urgency
Multi-task classifier trained on CPIMS child protection support messages. Simultaneously predicts **63 intent classes** and **urgency level** (high / medium / low) from a single encoder pass. Intent F1: **74.5%** — up from 46% on a generic English base model. Handles English, Swahili, and Kenyan code-switching.
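
To make "a single encoder pass" concrete, here is a toy sketch in plain Python: one shared feature vector (standing in for the encoder output) feeds two independent linear heads. This is not the model's actual code; the weights, the feature vector, and the three intent names are hypothetical, and only the urgency labels come from the description above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, features):
    """One output per weight row: dot(row, features) + bias."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def predict(features, intent_head, urgency_head, intent_labels, urgency_labels):
    """Apply both task heads to the same shared features."""
    intent_probs = softmax(linear(*intent_head, features))
    urgency_probs = softmax(linear(*urgency_head, features))
    i = max(range(len(intent_probs)), key=intent_probs.__getitem__)
    u = max(range(len(urgency_probs)), key=urgency_probs.__getitem__)
    return {
        "intent": intent_labels[i],
        "intent_confidence": intent_probs[i],
        "urgency": urgency_labels[u],
        "urgency_confidence": urgency_probs[u],
    }


# Hypothetical 3-dimensional "encoder output" and head parameters.
features = [0.5, -1.0, 2.0]
intent_labels = ["intent_a", "intent_b", "intent_c"]  # placeholder names
urgency_labels = ["high", "medium", "low"]
intent_head = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])
urgency_head = ([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]], [0.0, 0.0, 0.0])

result = predict(features, intent_head, urgency_head, intent_labels, urgency_labels)
print(result)
```

The encoder cost is paid once per message, and each additional task adds only a small head on top of the shared features.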


With this framework, the possibilities for language technology and natural language processing are wide open.

---

## The Framework

```bash
pip install jenga-ai
```

Train any model with a single YAML config:

```yaml
project_name: swahili-hate-speech

model:
  base_model: castorini/afriberta_large
  max_seq_len: 128

tasks:
  - name: classification
    type: single_label_classification
    data_path: data/hate_speech.csv
    text_column: text
    label_column: label

training:
  epochs: 5
  batch_size: 16
  learning_rate: 3.0e-5
```

```bash
python -m jenga_ai train --config swahili-hate-speech.yaml
```

### Supported modalities

| Modality | Status | Notes |
|----------|--------|-------|
| NLP — classification, NER, multi-task | ✅ Production | Multi-task with shared encoder + dual heads |
| Speech — Whisper fine-tuning, transcription | ⚙️ Active development | ASR for Swahili and African languages |
| Vision — classification, OCR, object detection | ⚙️ Active development | Document verification, image classification |
| LLM — LoRA fine-tuning, Ollama integration | ⚙️ Active development | Swahili instruction tuning |

### Key capabilities

- **Multi-task learning** — one encoder, multiple task heads, shared representations
- **Domain adaptation** — continued MLM pre-training for African language domains
- **Responsible AI built in** — explainability engine, audit trail, human-in-the-loop routing, bias evaluation
- **Offline-first** — trained models run without internet, no per-query API cost
- **HuggingFace native** — load any HF model as base, push trained models to Hub
- **No-code web platform** — upload CSV, click Train, get predictions
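
As an illustration of human-in-the-loop routing, the sketch below sends any prediction whose confidence falls under a threshold to a reviewer. The function name, the route names, and the 0.7 threshold are assumptions for the example, not JengaAI's API or defaults.

```python
def route_prediction(label, confidence, threshold=0.7):
    """Route a prediction to 'auto' or 'human_review' based on confidence."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    route = "human_review" if confidence < threshold else "auto"
    return {"label": label, "confidence": confidence, "route": route}


# Hypothetical predictions: (label, confidence) pairs.
predictions = [("high", 0.93), ("medium", 0.55), ("low", 0.70)]
for label, confidence in predictions:
    print(route_prediction(label, confidence))
```

A sensible threshold depends on the task and is usually chosen on a validation set by weighing reviewer workload against the cost of an unreviewed error.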

---

## Why JengaAI Exists

Africa's AI ecosystem is being built on API wrappers — products that call GPT-4 or Claude and rebrand the output as "African AI." These products are expensive at scale, dependent on foreign infrastructure, unable to handle African languages properly, and unable to keep sensitive data on the continent.

JengaAI exists to make the alternative practical.

A locally trained, domain-adapted model:
- Has no per-query API cost at inference time after training
- Runs fully offline in low-connectivity environments
- Can be fine-tuned on your specific institutional language
- Keeps sensitive data — health records, case notes, financial transactions — under your control
- Can perform significantly better on African languages than generic multilingual models; the domain-adapted model above, for example, improves perplexity on Sheng by 66% over its base

---

## Use Cases

**Child protection systems** — intent classification and urgency triage for CPIMS support messages in English, Swahili, and Sheng

**Community health** — symptom extraction and referral urgency from CHW voice notes and field reports

**Financial services** — M-PESA dispute classification, fraud signal detection, transaction intent analysis

**Government services** — citizen complaint routing, document OCR, service request classification

**Education** — student question routing, learner sentiment analysis, multilingual content classification

**Media monitoring** — hate speech detection, misinformation flagging, topic classification in Swahili and code-switched text

---

## Responsible AI

JengaAI is built with responsible AI development as a core principle, not an afterthought:

- **Explainability** — every prediction can be explained in human-readable terms
- **Audit trails** — hash-chained tamper-evident logging of all inference decisions
- **Human-in-the-loop** — low-confidence predictions are flagged for human review
- **Bias evaluation** — evaluation across language, demographic, and domain subgroups
- **Data sovereignty** — models run locally, data never leaves your infrastructure
- **Transparent limitations** — model cards document what the model gets wrong, not just what it gets right
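
A hash-chained log can be sketched in a few lines: each entry stores the hash of the previous entry, so changing any earlier record invalidates every hash after it. This is a minimal illustration using SHA-256 from the standard library, not JengaAI's actual audit implementation; the entry fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, payload):
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log, payload):
    """Append a record that is linked to the hash of the record before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"payload": payload, "prev_hash": prev_hash, "hash": _digest(prev_hash, payload)}
    )


def verify(log):
    """Recompute every hash in order; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, {"message_id": 1, "urgency": "high", "confidence": 0.91})
append_entry(audit_log, {"message_id": 2, "urgency": "low", "confidence": 0.64})
print("intact:", verify(audit_log))

audit_log[0]["payload"]["urgency"] = "low"  # simulate tampering with an old record
print("after tampering:", verify(audit_log))
```

A chain like this only makes tampering evident. It does not prevent it, so the latest hash should also be stored somewhere the log's writer cannot modify.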

---

## Community

JengaAI is developed in the spirit of African AI communities doing the work right — [Data Science Africa](https://www.datascienceafrica.org/), [Masakhane](https://www.masakhane.io/), [Deep Learning Indaba](https://deeplearningindaba.com/), and [AIMS](https://nexteinstein.org/).

We believe that building AI for Africa means building it on African data, in African languages, with African institutional contexts — not wrapping foreign models in local branding.

---

## Links

- 🐙 **GitHub**: [github.com/Rogendo/JengaAI](https://github.com/Rogendo/JengaAI)
- 📦 **Framework**: `pip install jenga-ai`
- 📄 **Docs**: Coming soon
- 🤝 **Contribute**: Open to contributors — researchers, engineers, domain experts, annotators

---

*Built in Kenya 🇰🇪 — for Africa and beyond.*
