How to use TUKE-KEMT/slavic-t5-base with Transformers:
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("TUKE-KEMT/slavic-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("TUKE-KEMT/slavic-t5-base")

The aim of this model is to achieve the best results for Slavic languages written in Latin script.
It is suitable for tasks such as:
The model is trained on selected parts of the OSCAR and MaCoCu corpora.
It supports the following languages: Czech, Croatian, Polish, Slovak, Slovenian,
The vocabulary has 120 000 tokens and includes capital letters.
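
The vocabulary size and the handling of capital letters can be checked directly on the tokenizer. The following is a minimal sketch rather than part of the model's official usage: the helper names `describe_tokenizer` and `load_and_describe` and the sample sentence are hypothetical, and it assumes only the standard Transformers tokenizer interface (`tokenize` and `len`). Note that `len(tokenizer)` also counts any added special tokens, so the reported number may differ slightly from 120 000.

```python
def describe_tokenizer(tokenizer, sample="Bratislava je hlavné mesto Slovenska."):
    """Return basic facts about a Hugging Face-style tokenizer.

    Hypothetical helper: it only needs an object that supports
    `tokenize(text)` and `len(...)`.
    """
    tokens = tokenizer.tokenize(sample)
    return {
        # Size of the vocabulary, including any added special tokens.
        "vocab_size": len(tokenizer),
        # Number of subword tokens produced for the sample sentence.
        "num_tokens": len(tokens),
        # True if at least one token still contains an uppercase character.
        "keeps_capitals": any(ch.isupper() for tok in tokens for ch in tok),
    }


def load_and_describe(model_id="TUKE-KEMT/slavic-t5-base"):
    """Download the tokenizer for `model_id` and describe it."""
    # Imported lazily so describe_tokenizer works without transformers installed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return describe_tokenizer(tokenizer)
```

Calling `load_and_describe()` downloads the tokenizer and returns a small dictionary; if the description above holds, `keeps_capitals` should be true and `vocab_size` should be close to 120 000.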