COMET-Kiwi-MEET-MR is a reference-free Quality Estimation (QE) model for the English-Thai language pair. It is based on the COMET-Kiwi architecture and has been fine-tuned on the MEET-MR dataset to align with human judgments of translation quality.
Model Description
The model was fine-tuned on the MEET-MR dataset, comprising 2,142 English source sentences and their translations across 9 diverse domains. Fine-tuning COMET-kiwi on this specific language pair and dataset significantly improves its ability to capture Thai vocabulary, contextual nuances, and human preferences compared to the generic pretrained version.
This model is designed to estimate the quality of English-to-Thai machine translations without using reference translations. Given a source text and its translation, it outputs a single score between 0 and 1, where 1 represents a perfect translation.
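Because the model is reference-free, each segment only needs a source text and its machine translation. The following is a minimal sketch of preparing such inputs. The helper name `build_qe_inputs` is hypothetical and not part of the `comet` package; the `src` and `mt` keys follow the usage example in this card, and dropping segments with an empty side is a design choice of this sketch.

```python
from typing import Dict, Iterable, List, Tuple


def build_qe_inputs(pairs: Iterable[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Turn (source, translation) pairs into reference-free QE inputs.

    Hypothetical helper for illustration; not part of the comet package.
    """
    samples = []
    for src, mt in pairs:
        src, mt = src.strip(), mt.strip()
        if not src or not mt:
            # Skip segments where either side is empty (choice made in this sketch).
            continue
        # Only "src" and "mt" are included: no reference translation is needed.
        samples.append({"src": src, "mt": mt})
    return samples


samples = build_qe_inputs([
    ("A hydrating day & night cream.", "ครีมน้ำในวันและคืน"),
    ("   ", "ไม่มีต้นฉบับ"),  # dropped: the source is empty after stripping
])
print(samples)
```

The resulting list can be passed to the model in place of a hand-written list of dictionaries.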
Paper
TBA
Usage
from comet import download_model, load_from_checkpoint

# Download the checkpoint from the Hugging Face Hub, then load it
model_path = download_model("Chula-AI/COMET-Kiwi-MEET-MR")
model = load_from_checkpoint(model_path)

# Each sample needs "src" and "mt"; "ref" is not required by a reference-free model
data = [
    {
        "src": "The premises of the mission shall be inviolable.",
        "mt": "สถานที่ของภารกิจจะต้องไม่ถูกละเมิด",
        "ref": "อาคารและสถานที่ของคณะผู้แทนจะถูกละเมิดมิได้"
    },
    {
        "src": "A hydrating day & night cream.",
        "mt": "ครีมน้ำในวันและคืน",
        "ref": "ครีมให้ความชุ่มชื้นสำหรับกลางวันและกลางคืน"
    }
]

# Set gpus=0 to run on CPU
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
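In recent releases of the `comet` package, the object returned by `predict` exposes per-segment scores as `model_output.scores` and their average as `model_output.system_score`; treat this as an assumption to check against your installed version. The sketch below shows one way to use segment scores to surface weak translations. The helper `flag_low_quality` and the 0.5 threshold are hypothetical, and the scores in the example are made-up placeholders, not real model output.

```python
from typing import Dict, List, Sequence


def flag_low_quality(
    data: Sequence[Dict[str, str]],
    scores: Sequence[float],
    threshold: float = 0.5,  # illustrative cut-off, not taken from the model card
) -> List[Dict[str, object]]:
    """Return segments scoring below `threshold`, worst first."""
    if len(data) != len(scores):
        raise ValueError("data and scores must have the same length")
    rows = [
        {"src": d["src"], "mt": d["mt"], "score": float(s)}
        for d, s in zip(data, scores)
    ]
    flagged = [r for r in rows if r["score"] < threshold]
    return sorted(flagged, key=lambda r: r["score"])


example_data = [
    {"src": "The premises of the mission shall be inviolable.",
     "mt": "สถานที่ของภารกิจจะต้องไม่ถูกละเมิด"},
    {"src": "A hydrating day & night cream.",
     "mt": "ครีมน้ำในวันและคืน"},
]
# Placeholder values for illustration only. With a loaded model you would
# use: example_scores = model_output.scores
example_scores = [0.81, 0.34]

flagged = flag_low_quality(example_data, example_scores)
for row in flagged:
    print(f"{row['score']:.2f}  {row['src']}  ->  {row['mt']}")
```

A suitable threshold depends on the domain and on how the scores are used, so it is worth calibrating against a sample of human-rated translations.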
Evaluation results
- MQM correlation on MEET-MR (self-reported): 0.402
- Rank correlation on MEET-MR (self-reported): 0.415
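The card does not state which correlation coefficient was used for these figures. As one possible illustration of a rank correlation between model scores and human judgments, the sketch below computes Spearman's coefficient in plain Python. The choice of Spearman is an assumption of this sketch, and the score lists are toy values, not MEET-MR data.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy values for illustration only (not real model output or MEET-MR labels).
model_scores = [0.2, 0.5, 0.4, 0.9]
human_scores = [1.0, 3.0, 2.0, 4.0]
rho = spearman(model_scores, human_scores)
print(f"Spearman rho = {rho:.3f}")
```

For real evaluations, an established implementation such as `scipy.stats.spearmanr` or `scipy.stats.kendalltau` is usually preferable to hand-written code.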