Twi–English Translation Quality Estimation

This model is a binary classifier that predicts whether a Twi–English translation pair is correct (label 1) or incorrect (label 0). It is based on intfloat/multilingual-e5-small and fine-tuned on a synthetic dataset of word and sentence pairs.

Usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "ghananlpcommunity/twi-eng-qe-e5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

def predict(twi, english):
    """Return the probability that `english` is a correct translation of `twi`."""
    # The pair is packed into one string using E5-style "query:" / "passage:" prefixes.
    text = f"query: {twi} passage: {english}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():  # gradients are not needed for inference
        outputs = model(**inputs)
    probs = outputs.logits.softmax(dim=-1)
    return probs[0, 1].item()  # index 1 = probability of "correct"

print(predict("me ho ye", "I am fine"))        # expected > 0.5
print(predict("me ho ye", "The car is blue"))  # expected < 0.5
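The score returned by predict can be mapped back to the 0/1 labels described above. The following is a minimal sketch, not part of the released model: the helper names build_input and to_label are hypothetical, and the 0.5 cutoff is an assumption taken from the "expected > 0.5" comments in the usage example. A different threshold may suit applications that care more about precision or recall.

```python
# Hypothetical helpers; names and the default threshold are assumptions.
E5_TEMPLATE = "query: {twi} passage: {english}"

def build_input(twi, english):
    """Pack a Twi-English pair into the single string format used by predict()."""
    return E5_TEMPLATE.format(twi=twi, english=english)

def to_label(prob_correct, threshold=0.5):
    """Map the probability of "correct" to a label: 1 = correct, 0 = incorrect."""
    if not 0.0 <= prob_correct <= 1.0:
        raise ValueError("prob_correct must be a probability in [0, 1]")
    return 1 if prob_correct > threshold else 0

print(build_input("me ho ye", "I am fine"))
print(to_label(0.93))        # above the cutoff, so labelled 1
print(to_label(0.93, 0.95))  # a stricter cutoff flips the label to 0
```

In practice, to_label(predict(twi, english)) would give the hard decision for a single pair.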

Training Details

  • Base model: intfloat/multilingual-e5-small
  • Data: Synthetic Twi–English parallel corpus of 172,240 examples
  • Test accuracy: 0.8856
  • Test F1: 0.8987
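
For readers who want to compute the same kind of metrics on their own labelled pairs, the sketch below shows accuracy and F1 for 0/1 labels in plain Python. The card does not state how its scores were computed, so this assumes binary F1 with label 1 ("correct") as the positive class; the function name accuracy_and_f1 and the toy labels are illustrative only.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and binary F1 (positive class = 1) for two lists of 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return accuracy, f1

# Toy labels for illustration, not model output.
gold = [1, 0, 1, 1, 0]
pred = [1, 0, 0, 1, 1]
acc, f1 = accuracy_and_f1(gold, pred)
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```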