Fine-Tuned MarianMT Model for French to English Translation

This model is a fine-tuned version of Helsinki-NLP/opus-mt-fr-en, trained on a mixture of Tatoeba, OpenSubtitles, Europarl, NewsCommentary, NewsDocumentary, and CCMatrix (4.5 million samples in total).

Live demo

Click here (press Restart to start the Space), then:

  • Enter the French text that you want to translate into English.

  • Click Submit, and the model will output the translation.

Performance and Evaluation

BLEU Scores (higher is better)

  • Purpose: Measures the precision of n-grams (word sequences) in the candidate translation compared to reference translations.
  • Interpretation: Higher BLEU scores indicate better word-level overlap with the reference.
  • Limitations: Gives no credit for synonyms or paraphrases, and captures sentence structure only through short n-grams.
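To make the n-gram precision idea concrete, here is a minimal sketch of clipped n-gram precision for one candidate and one reference. It is illustrative only: the function name `ngram_precision` and the lowercase, whitespace-based tokenization are assumptions, and this is not the script used to produce the scores reported in this card. Full BLEU additionally combines several n-gram orders and applies a brevity penalty.

```python
from collections import Counter


def ngram_precision(candidate: str, reference: str, n: int = 1) -> float:
    """Clipped n-gram precision of a candidate against a single reference."""
    # Assumption: naive lowercase + whitespace tokenization
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()

    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngrams(cand_tokens)
    ref_counts = ngrams(ref_tokens)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Clip each candidate n-gram count by how often it occurs in the reference
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


# "big" vs "large" counts as a miss, which illustrates the synonym limitation
print(ngram_precision("a big house", "a large house", n=1))
print(ngram_precision("the cat sat on the mat", "the cat is on the mat", n=2))
```

In the first call, two of the three candidate words match the reference even though the two sentences mean the same thing.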

The mixed-source test set includes samples from Tatoeba, Europarl, NewsCommentary, OpenSubtitles, NewsDocumentary, and CCMatrix.

| Test set | My Fine-Tuned Model | facebook/mbart-large-50-many-to-many-mmt | facebook/m2m100_418M |
| --- | --- | --- | --- |
| Mix of all sources | 69.38 | 68.08 | 25.16 |
| newsdiscusstest2015-enfr | 36.92 | 37.23 | 10.55 |
| news-test2008 | 24.99 | 26.49 | 7.45 |
| newstest2009 | 28.50 | 30.30 | 8.14 |
| newstest2014-fren | 34.72 | 37.79 | 8.37 |

ROUGE-L Scores (higher is better)

  • Purpose: Measures the longest common subsequence between the translations and the references.
  • Interpretation: Higher ROUGE-L scores indicate that the translation shares longer in-order word sequences with the reference, which suggests better preservation of content.
  • Limitations: May not fully capture fluency or grammatical correctness.

| Test set | My Fine-Tuned Model | facebook/mbart-large-50-many-to-many-mmt | facebook/m2m100_418M |
| --- | --- | --- | --- |
| Mix of all sources | 88.56 | 86.79 | 45.32 |
| newsdiscusstest2015-enfr | 64.40 | 63.48 | 33.24 |
| news-test2008 | 53.33 | 53.85 | 25.64 |
| newstest2009 | 56.44 | 57.49 | 25.65 |
| newstest2014-fren | 63.82 | 64.88 | 27.76 |
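The longest-common-subsequence measure behind ROUGE-L can be sketched as follows. This is a minimal illustration, not the evaluation script behind the scores above: the helper names `lcs_length` and `rouge_l_f1`, the whitespace tokenization, and the choice of the F1 variant are all assumptions, since the card does not say which implementation or variant was used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbour
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed variant)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"))
```

Unlike n-gram precision, the matched words need not be adjacent, only in the same order, which is why ROUGE-L values tend to be higher than BLEU values on the same data.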

Usage

from transformers import MarianMTModel, MarianTokenizer
import torch

# Load model and tokenizer
model_path = "nambn0321/NMT_opus_fr_en"
model = MarianMTModel.from_pretrained(model_path)
tokenizer = MarianTokenizer.from_pretrained(model_path)

# Move model to appropriate device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Define a function to translate French to English
def translate(text):
    if not text.strip():
        return "Please enter some text."

    # Tokenize input text
    inputs = tokenizer([text], return_tensors="pt", padding=True, truncation=True).to(device)

    # Generate translation
    with torch.no_grad():
        outputs = model.generate(**inputs)

    # Decode the generated output
    translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return translation
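
Because the tokenizer call uses truncation=True, input longer than the model's maximum length is cut off rather than translated in full. One possible workaround, sketched below, is to split long text into sentences and translate them one at a time. The helper names `split_sentences` and `translate_long` and the naive punctuation-based splitting rule are assumptions for illustration, not part of the model or the transformers API; `translate_fn` stands for any function with the same shape as `translate` above.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace (a naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def translate_long(text: str, translate_fn: Callable[[str], str]) -> str:
    """Translate sentence by sentence and join the results with spaces."""
    return " ".join(translate_fn(s) for s in split_sentences(text))


# str.upper is a stand-in for the real translate() so the sketch runs without the model
print(translate_long("Bonjour. Comment allez-vous ? Merci !", str.upper))
```

With the model loaded, the call would be `translate_long(long_text, translate)`. A dedicated sentence splitter would handle abbreviations and quotations more reliably than this regular expression.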
