Model stops translating on encountering “-” character
I’m trying to translate a simple text from Polish to English:
“Życie nigdy się nie kończy – przygotuj się zatem na ciąg dalszy. Zasilany twoją energią zegarek z widocznym mechanizmem Mads Dante dopasuje się do ciebie, tempo do tempa. Zrób dziś to, czego inni nie zrobią. Dzięki temu jutro będziesz mógł zrobić to, czego inni nie mogą.”
The model behaves strangely: when it encounters the “-” character, it stops translating and returns only the translation of what precedes it.
If I move this character elsewhere in the text, the translation still always ends just before it.
On further investigation, I found that the model returns these generated ids: tensor([[63429, 7157, 522, 10126, 15, 0]]),
which decode to ' Life never ends '.
Surprisingly, when I set num_beams to 2 instead of 1, I get a good result. The problem is that, because of time constraints, I can't use num_beams=2.
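For anyone who wants to reproduce this, here is a minimal sketch of the comparison. The helper names `tokens_before_eos` and `compare_beams` are made up for illustration, and `max_new_tokens=256` is an arbitrary choice. It also assumes that the trailing 0 in the ids above is the end-of-sequence id and that the leading 63429 is the decoder start token; both can be checked against `model.config.eos_token_id` and `model.config.decoder_start_token_id`.

```python
def tokens_before_eos(ids, eos_id=0, start_id=None):
    """Return the generated ids up to (not including) the first EOS.

    eos_id=0 is an assumption about this model; verify it against
    model.config.eos_token_id.
    """
    body = list(ids)
    # Drop the decoder start token if it is present at position 0.
    if start_id is not None and body and body[0] == start_id:
        body = body[1:]
    # Cut at the first end-of-sequence token, if there is one.
    if eos_id in body:
        body = body[: body.index(eos_id)]
    return body


def compare_beams(text, model_name="Helsinki-NLP/opus-mt-pl-en", beams=(1, 2)):
    """Translate `text` once per beam width and return ids and decoded text."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    inputs = tokenizer(text, return_tensors="pt")

    results = {}
    for n in beams:
        out = model.generate(**inputs, num_beams=n, max_new_tokens=256)
        ids = out[0].tolist()
        results[n] = (ids, tokenizer.decode(ids, skip_special_tokens=True))
    return results


# The ids reported above: only four content tokens come before the assumed EOS.
reported = [63429, 7157, 522, 10126, 15, 0]
print(tokens_before_eos(reported, eos_id=0, start_id=63429))
```

If that assumption holds, the output ends because the model itself emits the end-of-sequence token right after "Life never ends", not because a length limit cuts it off. Calling `compare_beams(text)` with the Polish text above should show the difference between num_beams=1 and num_beams=2 side by side.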
Does anyone know what is happening?