Twi–English Translation Quality Estimation

This model is a binary classifier that predicts whether a Twi–English translation pair is correct (label 1) or incorrect (label 0). It is based on intfloat/multilingual-e5-small and fine-tuned on a synthetic dataset of word and sentence pairs.

Usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "ghananlpcommunity/twi-eng-qe-e5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

def predict(twi, english):
    # The pair is packed into one string using E5-style "query:" / "passage:" prefixes.
    text = f"query: {twi} passage: {english}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():  # gradients are not needed for inference
        outputs = model(**inputs)
    prob = outputs.logits.softmax(dim=-1)
    return prob[0][1].item()   # probability of "correct"

print(predict("me ho ye", "I am fine"))        # expected > 0.5
print(predict("me ho ye", "The car is blue"))  # expected < 0.5
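When the score is used to filter machine-translated text, it can be compared against a cutoff. The following is a minimal sketch and not part of the model's API: the helper name filter_pairs, the score_fn parameter, and the 0.5 default threshold are assumptions made for illustration. score_fn is expected to be any callable with the same signature as predict above.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (twi, english)


def filter_pairs(
    pairs: Iterable[Pair],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cutoff; tune it on held-out data
) -> List[Tuple[str, str, float]]:
    """Keep the pairs whose score is at least `threshold`, with their scores."""
    kept = []
    for twi, english in pairs:
        score = score_fn(twi, english)
        if score >= threshold:
            kept.append((twi, english, score))
    return kept
```

Passing predict as score_fn, for example filter_pairs([("me ho ye", "I am fine")], predict), returns the pairs that the model scores at or above the cutoff.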

Training Details

Base model: intfloat/multilingual-e5-small
Data: Human-sourced Twi–English parallel corpus of 230,217 examples
Test Accuracy: 0.9052
Test F1: 0.9153
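The card does not say how the test metrics were computed. As a point of reference, the sketch below shows the standard definitions of accuracy and binary F1, assuming that label 1 ("correct") is the positive class. The function name accuracy_and_f1 is hypothetical and is not taken from the training code.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, treating 1 as the positive class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return accuracy, f1
```

Predicted labels can be obtained by thresholding the model's probability of "correct", for example at 0.5.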

Model size: 0.1B params (Safetensors, tensor type F32)
