JuaKazi Multilingual Bias Corrector v1

Sequence-to-sequence (seq2seq) model that corrects gender bias in text across 6 languages. Fine-tuned from castorini/afriteva_v2_base on ~10K correction pairs.

Usage

Input format: correct bias {lang}: {biased sentence}

Where lang is one of: sw (Swahili), ha (Hausa), zu (Zulu), ki (Gikuyu), fr (French), en (English)
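The input format can be wrapped in a small helper that rejects unsupported language codes before anything reaches the model. This is a minimal sketch: `SUPPORTED_LANGS` and `build_prompt` are hypothetical names used for illustration and are not part of the released model or the transformers library.

```python
# Hypothetical helper, not shipped with the model: builds the prompt string
# in the documented "correct bias {lang}: {biased sentence}" format.
SUPPORTED_LANGS = {
    "sw": "Swahili",
    "ha": "Hausa",
    "zu": "Zulu",
    "ki": "Gikuyu",
    "fr": "French",
    "en": "English",
}


def build_prompt(text: str, lang: str) -> str:
    """Return the model input for `text`, or raise if `lang` is not supported."""
    if lang not in SUPPORTED_LANGS:
        codes = ", ".join(sorted(SUPPORTED_LANGS))
        raise ValueError(f"Unsupported language code {lang!r}; expected one of: {codes}")
    return f"correct bias {lang}: {text}"
```

Failing early on an unknown code avoids sending the model a prefix that is not in the list of supported languages.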

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("juakazike/multilingual-bias-corrector-v1")
model = AutoModelForSeq2SeqLM.from_pretrained("juakazike/multilingual-bias-corrector-v1")

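def correct(text, lang):
    """Return the bias-corrected version of `text`; `lang` is one of the codes listed above."""
    # Build the prompt in the input format the model expects.
    prompt = f"correct bias {lang}: {text}"
    # Prompts longer than 128 tokens are truncated.
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=128)
    # Beam search with 4 beams, generating at most 128 new tokens.
    out = model.generate(**inputs, max_new_tokens=128, num_beams=4)
    return tokenizer.decode(out[0], skip_special_tokens=True)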

correct("The chairman will lead the board meeting.", "en")
# -> "The chair will lead the board meeting."
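To correct many sentences at once, the same prompt format can be applied in batches. The sketch below is one possible approach, not the authors' own pipeline: `load_corrector`, `correct_batch`, and the `batch_size` parameter are hypothetical names, and `padding=True` is added so that sentences of different lengths fit into one tensor.

```python
def load_corrector(name="juakazike/multilingual-bias-corrector-v1"):
    """Load the tokenizer and model (imported lazily so the helper below has no hard dependency)."""
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    return AutoTokenizer.from_pretrained(name), AutoModelForSeq2SeqLM.from_pretrained(name)


def correct_batch(texts, lang, tokenizer, model, batch_size=8):
    """Correct a list of sentences in one language, `batch_size` sentences at a time."""
    results = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        prompts = [f"correct bias {lang}: {t}" for t in chunk]
        # Pad to the longest prompt in the chunk; truncate at 128 tokens as in correct().
        inputs = tokenizer(
            prompts, return_tensors="pt", padding=True, truncation=True, max_length=128
        )
        out = model.generate(**inputs, max_new_tokens=128, num_beams=4)
        results.extend(tokenizer.batch_decode(out, skip_special_tokens=True))
    return results


# Example usage (downloads the model on first call):
# tokenizer, model = load_corrector()
# correct_batch(["The chairman will lead the board meeting."], "en", tokenizer, model)
```

Outputs are returned in the same order as the input sentences.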

Validation BLEU (10% held out per language)

Language       Pairs   BLEU
Swahili (sw)   1,586   17.7
Hausa (ha)     1,917    4.1
Zulu (zu)      1,931    0.6
Gikuyu (ki)      867    4.0
French (fr)      636   30.8
English (en)   3,464   38.6
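The per-language scores differ widely, so a single summary number depends on how the languages are weighted. The sketch below computes an unweighted mean and a pair-weighted mean from the table. It is only a rough summary, not a corpus-level BLEU, and it assumes the Pairs column is a fair weight for each language; `RESULTS` and `mean_bleu` are hypothetical names.

```python
# (pairs, BLEU) per language, copied from the table above.
RESULTS = {
    "sw": (1586, 17.7),
    "ha": (1917, 4.1),
    "zu": (1931, 0.6),
    "ki": (867, 4.0),
    "fr": (636, 30.8),
    "en": (3464, 38.6),
}


def mean_bleu(results, weighted=True):
    """Average BLEU over languages, optionally weighted by the number of pairs."""
    if weighted:
        total_pairs = sum(pairs for pairs, _ in results.values())
        return sum(pairs * bleu for pairs, bleu in results.values()) / total_pairs
    return sum(bleu for _, bleu in results.values()) / len(results)


print(f"unweighted mean BLEU:    {mean_bleu(RESULTS, weighted=False):.1f}")
print(f"pair-weighted mean BLEU: {mean_bleu(RESULTS, weighted=True):.1f}")
```

The weighted figure is pulled upward by English, which has the most pairs and the highest score, so the per-language rows remain the more informative view.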

Model size: 0.7B params (Safetensors, tensor type F32)