---
datasets:
- leipzig
language:
- hr
- sr
- multilingual
tags:
- masked-lm
widget:
- text: "Gde je <mask>."
license: apache-2.0
---
# Transformer language model for Croatian and Serbian

Trained for one epoch on a 0.7 GB dataset of Croatian and Serbian text taken from the Leipzig Corpora.
# Model and dataset information

| Model              | #params | Arch. | Training data                   |
|--------------------|---------|-------|---------------------------------|
| `Andrija/SRoBERTa` | 120M    | First | Leipzig Corpus (0.7 GB of text) |
# How to use in code

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and the masked-language-model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("Andrija/SRoBERTa")
model = AutoModelForMaskedLM.from_pretrained("Andrija/SRoBERTa")
```