---
title: AfriBERT Kenya MLM Compare
emoji: 🤖
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "5.50.0"
python_version: "3.10"
app_file: app.py
pinned: false
---

# AfriBERT Kenya Masked LM Gradio App

Gradio demo for comparing masked-language-modeling predictions from:

- Base model: `castorini/afriberta_large`
- Adapted model: `Rogendo/afribert-kenya-adapted`

The app uses the same tokenizer, `castorini/afriberta_large`, for both models so the MLM predictions are directly comparable.
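A minimal sketch of this shared-tokenizer setup, assuming the Hugging Face `transformers` fill-mask pipeline is used. The helper name `build_pipelines` and the `top_k` default are illustrative and not taken from `app.py`.

```python
# Model and tokenizer IDs as listed in this README.
BASE_MODEL_ID = "castorini/afriberta_large"
ADAPTED_MODEL_ID = "Rogendo/afribert-kenya-adapted"
TOKENIZER_ID = "castorini/afriberta_large"


def build_pipelines(top_k=5):
    """Return (base, adapted) fill-mask pipelines that share one tokenizer.

    Hypothetical helper: the real app may load its models differently.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

    # One tokenizer instance for both models, so both see identical token IDs
    # and their top-k predictions are drawn from the same vocabulary.
    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID)

    pipes = []
    for model_id in (BASE_MODEL_ID, ADAPTED_MODEL_ID):
        model = AutoModelForMaskedLM.from_pretrained(model_id)
        pipes.append(
            pipeline("fill-mask", model=model, tokenizer=tokenizer, top_k=top_k)
        )
    return tuple(pipes)
```

Calling `base, adapted = build_pipelines()` downloads both models on first use; each pipeline can then be called with a sentence containing one `<mask>` token.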

The app supports Swahili, Sheng, Kenyan institutional text, M-PESA language, and English-Swahili code-switching examples.

## Run locally

PyTorch does not currently install on Python 3.14. Use Python 3.10 for this app.

```bash
cd afribert-kenya-mlm-gradio  # your local copy of this project
python3.10 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
export HF_TOKEN="your_huggingface_read_token"
python app.py
```

If `python3.10` is not installed on macOS, install it with Homebrew:

```bash
brew install python@3.10
```

If the model is public, `HF_TOKEN` is optional. If it is private, the token must have read access.

Optional overrides:

```bash
export MODEL_ID="Rogendo/afribert-kenya-adapted"
export ADAPTED_MODEL_ID="Rogendo/afribert-kenya-adapted"
export BASE_MODEL_ID="castorini/afriberta_large"
export TOKENIZER_ID="castorini/afriberta_large"
```
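One way the app could read these overrides is sketched below. Two things here are assumptions rather than documented behavior: that the values shown above are the defaults, and that `MODEL_ID` acts as a fallback for `ADAPTED_MODEL_ID` when the latter is unset. The function name `resolve_settings` is hypothetical.

```python
import os

# Assumed defaults: the values shown in the override example above.
DEFAULTS = {
    "ADAPTED_MODEL_ID": "Rogendo/afribert-kenya-adapted",
    "BASE_MODEL_ID": "castorini/afriberta_large",
    "TOKENIZER_ID": "castorini/afriberta_large",
}


def resolve_settings(env=None):
    """Merge environment overrides with the defaults.

    `env` is any mapping; it defaults to os.environ so the function
    can be exercised with a plain dict.
    """
    env = os.environ if env is None else env
    # Empty strings are treated as "not set" and fall back to the default.
    settings = {name: env.get(name) or default for name, default in DEFAULTS.items()}

    # Assumption: MODEL_ID is only a fallback alias for the adapted model.
    if not env.get("ADAPTED_MODEL_ID") and env.get("MODEL_ID"):
        settings["ADAPTED_MODEL_ID"] = env["MODEL_ID"]
    return settings
```

Passing a dict, as in `resolve_settings({"BASE_MODEL_ID": "org/other-model"})`, makes the precedence easy to check without touching the real environment.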

## Hugging Face Space

Create a Gradio Space and upload:

- `app.py`
- `requirements.txt`
- `README.md`
- `runtime.txt`

Then add a Space secret named `HF_TOKEN` with a Hugging Face token that can read the model.

## Usage

Use the tokenizer mask token shown in the app: `<mask>`. `[MASK]` is also accepted and automatically converted.

Examples:

```text
Tulifanya meeting jana na manager akasema <mask> itakuwa ready wiki ijayo.
```

```text
Msee alikuwa poa sana, akanisaidia kupata <mask> ya ofisi.
```
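The `[MASK]` to `<mask>` conversion mentioned above can be sketched as a small normalization step. The function names are illustrative, and matching only the exact uppercase spelling `[MASK]` is an assumption about the app's behavior.

```python
MASK_TOKEN = "<mask>"    # mask token of the castorini/afriberta_large tokenizer
BERT_STYLE_MASK = "[MASK]"  # alternative spelling the app also accepts


def normalize_mask(text, mask_token=MASK_TOKEN):
    """Rewrite every BERT-style [MASK] placeholder to the tokenizer's mask token."""
    return text.replace(BERT_STYLE_MASK, mask_token)


def count_masks(text, mask_token=MASK_TOKEN):
    """Count mask positions after normalization."""
    return normalize_mask(text, mask_token).count(mask_token)
```

For example, `normalize_mask("Msee alikuwa poa sana, akanisaidia kupata [MASK] ya ofisi.")` returns the second example sentence above with `<mask>` in place of `[MASK]`.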

The first output table compares the base and adapted models rank by rank. The second table shows each model's completed sentence for every prediction.
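A rough sketch of how two such tables could be built from fill-mask output. It assumes each prediction is a dict with `token_str` and `score` keys, which is the shape returned by the `transformers` fill-mask pipeline; the function names and column layout are hypothetical and may differ from `app.py`.

```python
def rank_table(base_preds, adapted_preds):
    """Pair predictions rank by rank.

    Each row is [rank, base token, base score, adapted token, adapted score].
    """
    rows = []
    for rank, (base, adapted) in enumerate(zip(base_preds, adapted_preds), start=1):
        rows.append([
            rank,
            base["token_str"].strip(),
            round(base["score"], 4),
            adapted["token_str"].strip(),
            round(adapted["score"], 4),
        ])
    return rows


def completed_sentences(text, preds, mask_token="<mask>"):
    """Fill the first mask in `text` with each predicted token in turn."""
    return [text.replace(mask_token, p["token_str"].strip(), 1) for p in preds]
```

Both functions return plain lists, which can be passed to a Gradio `Dataframe` component or rendered in any other way.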