	---
	title: AfriBERT Kenya MLM Compare
	emoji: 🤖
	colorFrom: green
	colorTo: blue
	sdk: gradio
	sdk_version: "5.50.0"
	python_version: "3.10"
	app_file: app.py
	pinned: false
	---

	# AfriBERT Kenya Masked LM Gradio App

	Gradio demo for comparing masked-language-modeling predictions from:

	- Base model: `castorini/afriberta_large`
	- Adapted model: `Rogendo/afribert-kenya-adapted`

	The app uses the same tokenizer, `castorini/afriberta_large`, for both models so the MLM predictions are directly comparable.
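
The shared-tokenizer setup can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the function name `build_fill_mask_pipelines` and the `top_k` default are assumptions, and it relies on the `transformers` `fill-mask` pipeline.

```python
BASE_MODEL_ID = "castorini/afriberta_large"
ADAPTED_MODEL_ID = "Rogendo/afribert-kenya-adapted"
TOKENIZER_ID = "castorini/afriberta_large"


def build_fill_mask_pipelines(top_k=5, token=None):
    """Hypothetical helper: one fill-mask pipeline per model, one shared tokenizer."""
    # Imported lazily so that defining this function does not require transformers.
    from transformers import AutoTokenizer, pipeline

    # Load the tokenizer once and hand the same object to both pipelines.
    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_ID, token=token)
    models = (("base", BASE_MODEL_ID), ("adapted", ADAPTED_MODEL_ID))
    return {
        name: pipeline(
            "fill-mask",
            model=model_id,
            tokenizer=tokenizer,
            top_k=top_k,  # assumed number of predictions per model
            token=token,  # only needed if a model repository is private
        )
        for name, model_id in models
    }
```

Because both pipelines tokenize the input identically, the mask lands at the same position for each model and the candidate tokens come from the same vocabulary.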

	The app supports Swahili, Sheng, Kenyan institutional text, M-PESA language, and English-Swahili code-switching examples.

	## Run locally

	At the time of writing, PyTorch does not install on Python 3.14. Use Python 3.10 for this app, which matches the `python_version` set in the Space metadata.

	```bash
	cd afribert-kenya-mlm-gradio  # the project directory containing app.py
	python3.10 -m venv venv
	source venv/bin/activate
	python -m pip install --upgrade pip
	pip install -r requirements.txt
	export HF_TOKEN="your_huggingface_read_token"
	python app.py
	```

	If `python3.10` is not installed on macOS:

	```bash
	brew install python@3.10
	```

	If the model is public, `HF_TOKEN` is optional. If it is private, the token must have read access.

	Optional overrides, set as environment variables before launching the app:

	```bash
	export MODEL_ID="Rogendo/afribert-kenya-adapted"
	export ADAPTED_MODEL_ID="Rogendo/afribert-kenya-adapted"
	export BASE_MODEL_ID="castorini/afriberta_large"
	export TOKENIZER_ID="castorini/afriberta_large"
	```
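
One way an app could resolve these variables is sketched below. It is an assumption, not the actual logic in `app.py`: the helper name `resolve_model_ids`, the use of the values above as defaults, and the rule that `ADAPTED_MODEL_ID` takes precedence over `MODEL_ID` are all hypothetical.

```python
import os

# Assumed defaults, taken from the values shown in the overrides above.
DEFAULTS = {
    "ADAPTED_MODEL_ID": "Rogendo/afribert-kenya-adapted",
    "BASE_MODEL_ID": "castorini/afriberta_large",
    "TOKENIZER_ID": "castorini/afriberta_large",
}


def resolve_model_ids(env=None):
    """Hypothetical helper: read model settings from environment variables."""
    env = os.environ if env is None else env
    # Assumption: MODEL_ID is treated as a fallback alias for ADAPTED_MODEL_ID.
    adapted = (
        env.get("ADAPTED_MODEL_ID")
        or env.get("MODEL_ID")
        or DEFAULTS["ADAPTED_MODEL_ID"]
    )
    return {
        "adapted": adapted,
        "base": env.get("BASE_MODEL_ID") or DEFAULTS["BASE_MODEL_ID"],
        "tokenizer": env.get("TOKENIZER_ID") or DEFAULTS["TOKENIZER_ID"],
        # None means anonymous access, which is enough for public models.
        "token": env.get("HF_TOKEN") or None,
    }
```

Passing a plain dictionary instead of `os.environ` makes the resolution easy to check without touching the real environment.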

	## Hugging Face Space

	Create a Gradio Space and upload:

	- `app.py`
	- `requirements.txt`
	- `README.md`
	- `runtime.txt`

	Then add a Space secret named `HF_TOKEN` with a Hugging Face token that can read the model.
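
The same steps can also be scripted with the `huggingface_hub` client instead of the web UI. The sketch below is one possible approach under stated assumptions: the helper names, the `space_id` argument, and the split between a write token for uploading and a read token stored as the secret are illustrative choices, not part of this project.

```python
from pathlib import Path

# The four files listed above.
SPACE_FILES = ["app.py", "requirements.txt", "README.md", "runtime.txt"]


def missing_space_files(project_dir):
    """Return the required files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in SPACE_FILES if not (root / name).is_file()]


def publish_space(space_id, project_dir, write_token, model_read_token):
    """Hypothetical helper: create a Gradio Space, upload files, set HF_TOKEN."""
    # Imported lazily so the file check above works without huggingface_hub.
    from huggingface_hub import HfApi

    missing = missing_space_files(project_dir)
    if missing:
        raise FileNotFoundError(f"Missing files: {', '.join(missing)}")

    api = HfApi(token=write_token)  # needs write access to the target account
    api.create_repo(
        repo_id=space_id, repo_type="space", space_sdk="gradio", exist_ok=True
    )
    for name in SPACE_FILES:
        api.upload_file(
            path_or_fileobj=str(Path(project_dir) / name),
            path_in_repo=name,
            repo_id=space_id,
            repo_type="space",
        )
    # Store the model read token as the Space secret named HF_TOKEN.
    api.add_space_secret(repo_id=space_id, key="HF_TOKEN", value=model_read_token)
```

Running `missing_space_files(".")` first is a cheap way to catch a forgotten `runtime.txt` before anything is uploaded.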

	## Usage

	Use the tokenizer's mask token shown in the app: `<mask>`. `[MASK]` is also accepted and is automatically converted to `<mask>`.

	Examples:

	```text
	Tulifanya meeting jana na manager akasema <mask> itakuwa ready wiki ijayo.
	```

	```text
	Msee alikuwa poa sana, akanisaidia kupata <mask> ya ofisi.
	```
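
The `[MASK]` conversion mentioned above could be as simple as a string replacement before the text reaches the tokenizer. This is a minimal sketch, assuming exact, case-sensitive matching; the helper names `normalize_mask` and `count_masks` are hypothetical and not taken from `app.py`.

```python
MASK_TOKEN = "<mask>"  # mask token of the castorini/afriberta_large tokenizer


def normalize_mask(text, mask_token=MASK_TOKEN):
    """Rewrite BERT-style [MASK] placeholders to the tokenizer's mask token."""
    # Assumption: only the exact, upper-case spelling "[MASK]" is converted.
    return text.replace("[MASK]", mask_token)


def count_masks(text, mask_token=MASK_TOKEN):
    """Count mask tokens after normalization, e.g. to reject unmasked input."""
    return normalize_mask(text, mask_token).count(mask_token)
```

For example, `normalize_mask("Msee alikuwa poa sana, akanisaidia kupata [MASK] ya ofisi.")` yields the second example sentence above.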

	The first output table compares the base and adapted model rank-by-rank. The second table shows each model's completed sentence for every prediction.
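
The two tables could be assembled from fill-mask predictions roughly as follows. This is an illustrative sketch, not the app's implementation: it assumes each prediction is a dictionary with `token_str` and `score` keys, as returned by the `transformers` fill-mask pipeline, and the function names are hypothetical.

```python
def rank_table(base_preds, adapted_preds):
    """Pair predictions rank-by-rank: [rank, base token, score, adapted token, score]."""
    rows = []
    # zip stops at the shorter list, so ranks are only shown where both models have one.
    for rank, (base, adapted) in enumerate(zip(base_preds, adapted_preds), start=1):
        rows.append([
            rank,
            base["token_str"].strip(),
            round(base["score"], 4),
            adapted["token_str"].strip(),
            round(adapted["score"], 4),
        ])
    return rows


def completed_sentences(text, preds, mask_token="<mask>"):
    """Fill the first mask in text with each predicted token in turn."""
    return [text.replace(mask_token, p["token_str"].strip(), 1) for p in preds]
```

Both functions return plain lists, which can be passed directly as rows to a Gradio `Dataframe` output.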