How to use IMJONEZZ/SlovenBERTcina with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="IMJONEZZ/SlovenBERTcina")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("IMJONEZZ/SlovenBERTcina")
model = AutoModelForMaskedLM.from_pretrained("IMJONEZZ/SlovenBERTcina")
```
# Slovak RoBERTa Masked Language Model
### 83 million parameters in the small model
Medium and Large models coming soon!
The pretrained RoBERTa tokenizer vocab and merges files are included.
## Training params:
Dataset: 8 GB Slovak monolingual dataset, including ParaCrawl (monolingual), OSCAR, and several gigabytes of text that I collected and cleaned myself.
Preprocessing: Tokenized with a pretrained ByteLevelBPETokenizer trained on the same dataset. Uncased, with `<s>`, `<pad>`, `</s>`, `<unk>`, and `<mask>` special tokens.
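The preprocessing step above can be sketched with the `tokenizers` library. This is a minimal illustration, not the exact training script: the three-sentence corpus, `vocab_size`, and `min_frequency` are stand-in assumptions, and only the tokenizer class, the uncased setting, and the special tokens come from the description above.

```python
from tokenizers import ByteLevelBPETokenizer

# Tiny stand-in corpus (assumption); the real tokenizer was trained on the 8 GB dataset.
corpus = [
    "Mnoho ľudí tu žije.",
    "Ako sa máte?",
    "Plážová sezóna pod Zoborom patrí medzi najkrajšie obdobia.",
]

# RoBERTa-style special tokens, in the order listed above.
special_tokens = ["<s>", "<pad>", "</s>", "<unk>", "<mask>"]

# lowercase=True makes the tokenizer uncased.
tokenizer = ByteLevelBPETokenizer(lowercase=True)
tokenizer.train_from_iterator(
    corpus,
    vocab_size=1000,   # assumed value for this sketch only
    min_frequency=1,   # assumed value for this sketch only
    special_tokens=special_tokens,
    show_progress=False,
)

# Inspect how a sentence is split into byte-level BPE tokens.
encoded = tokenizer.encode("Mnoho ľudí tu žije.")
print(encoded.tokens)
```

Calling `tokenizer.save_model(directory)` on a trained tokenizer writes a `vocab.json` and a `merges.txt` file, which is the vocab and merges pair a RoBERTa tokenizer loads.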
Evaluation results:
- Mnoho ľudí tu MASK
  - žije.
  - žijú.
  - je.
  - trpí.
- Ako sa MASK
  - máte
  - máš
  - má
  - hovorí
- Plážová sezóna pod Zoborom patrí medzi MASK obdobia.
  - ročné
  - najkrajšie
  - najobľúbenejšie
  - najnáročnejšie
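Prompts like the ones listed under the evaluation results can be tried with the fill-mask pipeline. The sketch below is one way to do it, not the script used to produce those results: `fill_placeholder` and `top_predictions` are hypothetical helper names, and the choice of `k=4` simply mirrors the four predictions shown per prompt.

```python
def fill_placeholder(prompt: str, mask_token: str, placeholder: str = "MASK") -> str:
    """Swap the MASK placeholder used in this card for the tokenizer's real mask token."""
    return prompt.replace(placeholder, mask_token)


def top_predictions(prompt: str, k: int = 4):
    """Return (token, score) pairs for the k highest-scoring fill-ins.

    Downloads the model from the Hub on first call.
    """
    # Imported lazily so fill_placeholder can be used without transformers installed.
    from transformers import pipeline

    pipe = pipeline("fill-mask", model="IMJONEZZ/SlovenBERTcina")
    text = fill_placeholder(prompt, pipe.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in pipe(text, top_k=k)]
```

For example, `top_predictions("Mnoho ľudí tu MASK")` queries the model with the first prompt. Reading the mask token from `pipe.tokenizer.mask_token` avoids hard-coding it.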
Limitations: The current model is fairly small (83 million parameters), although it works well on fill-mask prompts like the examples above. This model is meant to be fine-tuned on downstream tasks, e.g. part-of-speech tagging, question answering, or anything in GLUE or SuperGLUE.
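As a rough sketch of what fine-tuning for part-of-speech tagging could start from, the checkpoint can be loaded with a token-classification head. The Universal Dependencies UPOS tag set and the helper names `label_maps` and `load_for_tagging` are assumptions made for illustration; the card does not specify a tag set or a training recipe.

```python
# Universal Dependencies UPOS tags (assumed label set for this sketch).
UPOS_TAGS = [
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
]


def label_maps(tags):
    """Build the id-to-label and label-to-id dictionaries stored in the model config."""
    id2label = {i: tag for i, tag in enumerate(tags)}
    label2id = {tag: i for i, tag in enumerate(tags)}
    return id2label, label2id


def load_for_tagging(model_name: str = "IMJONEZZ/SlovenBERTcina"):
    """Load the pretrained encoder with a token-classification head (downloads the model)."""
    # Imported lazily so label_maps can be used without transformers installed.
    from transformers import AutoModelForTokenClassification

    id2label, label2id = label_maps(UPOS_TAGS)
    return AutoModelForTokenClassification.from_pretrained(
        model_name,
        num_labels=len(UPOS_TAGS),
        id2label=id2label,
        label2id=label2id,
    )
```

The classification head returned by `load_for_tagging()` is newly initialized, so it has to be trained on labeled data before its predictions mean anything.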
Credit: If you use this or any of my models in research or professional work, please credit me, Christopher Brousseau, in that work.