language:
  - en
  - be
tags:
  - translation
  - pytorch
  - transformers
  - marian
pipeline_tag: translation
datasets:
  - Helsinki-NLP/opus-100
base_model: Helsinki-NLP/opus-mt-en-mul
metrics:
  - bleu

English to Belarusian Translator (en-be)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-mul for translating text from English (en) to Belarusian (be).

Model Description

The model was fine-tuned using the transformers library on the English–Belarusian split of the OPUS-100 dataset. It is based on the MarianMT architecture and is optimized for translating short and medium-length sentences from English into Belarusian.

Usage example

You can use this model directly with the transformers library:

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
model_name = "Aleton/en-be-translator"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

# Text to translate
text = "Hello, how are you?"

inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=128,
).to(device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        num_beams=4,
    )

translation = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
)

print(translation)
# Example output: Прывітанне, як справы?
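
Because the model is optimized for short and medium-length sentences, and the example above truncates input at 128 tokens, longer passages may translate better when split into sentences first. The following is a minimal sketch of that idea, not part of the model itself: `split_sentences` and `translate_long_text` are hypothetical helper names, the regex splitter is deliberately naive (it will mis-split abbreviations such as "e.g."), and `translate_fn` is assumed to be any function that maps one English string to one Belarusian string.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace.

    Hypothetical helper: a real sentence segmenter would handle
    abbreviations, quotes, and ellipses more carefully.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def translate_long_text(text: str, translate_fn: Callable[[str], str]) -> str:
    """Translate text one sentence at a time and rejoin with spaces.

    translate_fn is an assumption: any callable that takes a single
    source sentence and returns its translation.
    """
    return " ".join(translate_fn(sentence) for sentence in split_sentences(text))


if __name__ == "__main__":
    # Stand-in translator so the sketch runs without downloading the model.
    print(translate_long_text("Hello, how are you? I am fine.", str.upper))
```

To use this with the model, `translate_fn` could wrap the tokenize, `generate`, and `decode` steps from the example above in a single function that accepts `text` and returns `translation`.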