---
title: AfriBERT Kenya MLM Compare
emoji: 🤖
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: "5.50.0"
python_version: "3.10"
app_file: app.py
pinned: false
---

# AfriBERT Kenya Masked LM Gradio App

Gradio demo for comparing masked-language-modeling predictions from:

- Base model: `castorini/afriberta_large`
- Adapted model: `Rogendo/afribert-kenya-adapted`

The app uses the same tokenizer, `castorini/afriberta_large`, for both models so the MLM predictions are directly comparable.

The app supports Swahili, Sheng, Kenyan institutional text, M-PESA language, and English-Swahili code-switching examples.

## Run locally

PyTorch does not currently install on Python 3.14. Use Python 3.10 for this app.

```bash
cd /Users/bitzsupport/Desktop/Portfoliio/afribert-kenya-mlm-gradio
python3.10 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
export HF_TOKEN="your_huggingface_read_token"
python app.py
```

If `python3.10` is not installed on macOS:

```bash
brew install python@3.10
```

If the model is public, `HF_TOKEN` is optional. If it is private, the token must have read access.

Optional overrides:

```bash
export MODEL_ID="Rogendo/afribert-kenya-adapted"
export ADAPTED_MODEL_ID="Rogendo/afribert-kenya-adapted"
export BASE_MODEL_ID="castorini/afriberta_large"
export TOKENIZER_ID="castorini/afriberta_large"
```

## Hugging Face Space

Create a Gradio Space and upload:

- `app.py`
- `requirements.txt`
- `README.md`
- `runtime.txt`

Then add a Space secret named `HF_TOKEN` with a Hugging Face token that can read the model.

## Usage

Use the tokenizer mask token shown in the app: `<mask>`. `[MASK]` is also accepted and automatically converted.

Examples:

```text
Tulifanya meeting jana na manager akasema itakuwa ready wiki ijayo.
```

```text
Msee alikuwa poa sana, akanisaidia kupata <mask> ya ofisi.
```

The first output table compares the base and adapted models rank by rank.
The second table shows each model's completed sentence for every prediction.
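
The mask conversion and the two output tables can be illustrated with a minimal sketch. It is not the code in `app.py`: the helper names are hypothetical, the mask token is assumed to be `<mask>`, the mask position in the sample sentence is assumed, and the prediction lists are made-up placeholders rather than real model output.

```python
MASK_TOKEN = "<mask>"  # assumption: mask token of the castorini/afriberta_large tokenizer


def normalize_mask(text: str, mask_token: str = MASK_TOKEN) -> str:
    """Convert the accepted `[MASK]` alias into the tokenizer's mask token."""
    return text.replace("[MASK]", mask_token)


def complete_sentence(text: str, prediction: str, mask_token: str = MASK_TOKEN) -> str:
    """Fill the first mask in `text` with one predicted token."""
    return normalize_mask(text, mask_token).replace(mask_token, prediction, 1)


def compare_by_rank(base_preds, adapted_preds):
    """Pair predictions rank by rank as (rank, base token, adapted token)."""
    return [
        (rank, base, adapted)
        for rank, (base, adapted) in enumerate(zip(base_preds, adapted_preds), start=1)
    ]


# Hypothetical placeholder predictions, NOT real output from either model.
base_preds = ["kazi", "barua"]
adapted_preds = ["stamp", "kazi"]

# The mask position here is an assumption made for illustration.
sentence = "Msee alikuwa poa sana, akanisaidia kupata [MASK] ya ofisi."

# First table: one row per rank, base and adapted side by side.
rows = compare_by_rank(base_preds, adapted_preds)

# Second table: the completed sentence for each prediction of one model.
completed = [complete_sentence(sentence, token) for token in adapted_preds]
```

In a real run, the two prediction lists would come from each model's top-k outputs for the masked position, decoded with the shared tokenizer.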
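
For the optional overrides listed under "Run locally", the following sketch shows one way the environment variables might be resolved. The function name and the precedence (`ADAPTED_MODEL_ID` before `MODEL_ID`, then the default) are assumptions, not confirmed behavior of `app.py`.

```python
import os

# Defaults taken from this README; the precedence rules below are an assumption.
DEFAULT_BASE = "castorini/afriberta_large"
DEFAULT_ADAPTED = "Rogendo/afribert-kenya-adapted"


def resolve_settings(env=None):
    """Return model, tokenizer, and token settings from environment variables."""
    env = os.environ if env is None else env
    return {
        # Assumed precedence: ADAPTED_MODEL_ID, then MODEL_ID, then the default.
        "adapted_model_id": env.get("ADAPTED_MODEL_ID")
        or env.get("MODEL_ID")
        or DEFAULT_ADAPTED,
        "base_model_id": env.get("BASE_MODEL_ID") or DEFAULT_BASE,
        # Both models share one tokenizer so predictions stay comparable.
        "tokenizer_id": env.get("TOKENIZER_ID") or DEFAULT_BASE,
        # None is acceptable for public models; private models need a read token.
        "hf_token": env.get("HF_TOKEN") or None,
    }


# Example with a hypothetical override passed as a plain dict instead of os.environ.
settings = resolve_settings({"MODEL_ID": "someone/hypothetical-model"})
```

Passing a dict instead of reading `os.environ` directly keeps the function easy to test without touching the real environment.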