---
language:
- en
- be
tags:
- translation
- pytorch
- transformers
- marian
pipeline_tag: translation
datasets:
- Helsinki-NLP/opus-100
base_model: Helsinki-NLP/opus-mt-en-mul
metrics:
- bleu
---
# English to Belarusian Translator (en-be)
This model is a fine-tuned version of [Helsinki-NLP/opus-mt-en-mul](https://huggingface.co/Helsinki-NLP/opus-mt-en-mul) for translating text from **English (en)** to **Belarusian (be)**.
## Model Description
The model was fine-tuned using the `transformers` library on the English–Belarusian split of the [OPUS-100 dataset](https://huggingface.co/datasets/Helsinki-NLP/opus-100). It is based on the MarianMT architecture and is optimized for translating short and medium-length sentences from English into Belarusian.
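Because the model targets short and medium-length sentences, longer passages are probably best split into sentences before translation. The following is a minimal sketch of that idea, not part of this model or of `transformers`: `split_sentences`, `translate_long_text`, and the `translate_fn` callable (any function that translates one string, such as a wrapper around the model call) are hypothetical names, and the regex splitter is deliberately naive.

```python
import re
from typing import Callable, List

# Naive boundary rule (an assumption for illustration): split on whitespace
# that follows '.', '!' or '?'. Abbreviations such as "e.g." will be mis-split.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, dropping empty pieces."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def translate_long_text(text: str, translate_fn: Callable[[str], str]) -> str:
    """Translate text one sentence at a time and rejoin with spaces.

    translate_fn is a hypothetical callable that translates a single string.
    """
    return " ".join(translate_fn(s) for s in split_sentences(text))
```

Passing the translation step in as a callable keeps the splitting logic independent of how the model is loaded. For real text, a proper sentence segmenter would likely be more robust than this regex.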
## Usage example
You can use this model directly with the `transformers` library:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Load model and tokenizer
model_name = "Aleton/en-be-translator"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()
# Text to translate
text = "Hello, how are you?"
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    max_length=128,
).to(device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        num_beams=4,
    )
translation = tokenizer.decode(
    outputs[0],
    skip_special_tokens=True,
)
print(translation)
# Example output: Прывітанне, як справы?
```
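To translate several sentences at once, the single-sentence example above can be generalized to padded batches. The sketch below is one possible approach rather than the author's method: `translate_batch` and its `batch_size` parameter are hypothetical names introduced here, and the function expects `tokenizer` and `model` objects loaded as in the example above, with the same generation settings.

```python
from typing import List


def translate_batch(
    texts: List[str],
    tokenizer,
    model,
    batch_size: int = 8,   # hypothetical default, tune for available memory
    max_length: int = 128,
) -> List[str]:
    """Translate a list of strings in fixed-size batches (illustrative helper)."""
    translations: List[str] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # padding=True pads every sequence to the longest one in this batch.
        inputs = tokenizer(
            batch,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_length,
        ).to(model.device)
        # As far as I know, generate() disables gradient tracking itself,
        # so no explicit torch.no_grad() block is used in this sketch.
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_length,
            num_beams=4,
        )
        translations.extend(
            tokenizer.batch_decode(outputs, skip_special_tokens=True)
        )
    return translations
```

With the objects from the example above, a call might look like `translate_batch(["Good morning.", "Thank you."], tokenizer, model)`. Smaller batches reduce memory use at the cost of more `generate` calls.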