macedonizer / sl-gpt2

@@ -1,21 +1,20 @@
 ---
 language:
-- mk
-thumbnail: https://huggingface.co/macedonizer/mk-roberta-base/blaze-koneski.jpg
 license: Apache 2.0
 datasets:
-- wiki-mk
-- time-mk-news-2010-2015
 ---
-# mk-gpt2
 Test the whole generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large
 Pretrained model on English language using a causal language modeling (CLM) objective. It was introduced in
 [this paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
 and first released at [this page](https://openai.com/blog/better-language-models/).
 ## Model description
-mk-gpt2 is a transformers model pretrained on a very large corpus of Macedonian data in a self-supervised fashion. This
 means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots
 of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely,
 it was trained to guess the next word in sentences.
@@ -32,10 +31,10 @@ Here is how to use this model to get the features of a given text in PyTorch:
 import random
 from transformers import AutoTokenizer, AutoModelWithLMHead
-tokenizer = AutoTokenizer.from_pretrained('macedonizer/mk-gpt2') \
-model = AutoModelWithLMHead.from_pretrained('macedonizer/mk-gpt2')
-input_text = 'Скопје е '
 if len(input_text) == 0: \
     encoded_input = tokenizer(input_text, return_tensors="pt") \
@@ -59,8 +58,7 @@ else: \
         num_return_sequences=1, \
     )
-decoded_output = [] \
-for sample in output: \
     decoded_output.append(tokenizer.decode(sample, skip_special_tokens=True))
 print(decoded_output)

 ---
 language:
+- sl
+thumbnail: https://huggingface.co/macedonizer/mkgpt2/lets-talk-about-nlp.jpg
 license: Apache 2.0
 datasets:
+- wiki-sl
 ---
+# sl-gpt2
 Test the whole generation capabilities here: https://transformer.huggingface.co/doc/gpt2-large
 Pretrained model on Slovenian language using a causal language modeling (CLM) objective. The GPT-2 architecture was introduced in
 [this paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)
 and first released at [this page](https://openai.com/blog/better-language-models/).
 ## Model description
+sl-gpt2 is a transformers model pretrained on a very large corpus of Slovenian data in a self-supervised fashion. This
 means it was pretrained on the raw texts only, with no humans labelling them in any way (which is why it can use lots
 of publicly available data) with an automatic process to generate inputs and labels from those texts. More precisely,
 it was trained to guess the next word in sentences.
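To make "guess the next word" concrete, here is a minimal sketch of how a causal language modeling objective builds its training pairs directly from raw text with no human labels. Every token is the target for the tokens before it, and the loss is the average negative log-likelihood of those targets. This is a toy illustration under stated assumptions, not sl-gpt2's training code. Whitespace splitting stands in for GPT-2's byte-level BPE tokenizer, and a smoothed bigram counter stands in for the transformer. The two Slovenian sentences and the helper names (`next_token_pairs`, `train_bigram`, `clm_loss`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def next_token_pairs(tokens):
    """Build (context, target) pairs: each target is simply the following token,
    so the labels come from the raw text itself."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def train_bigram(corpus):
    """Hypothetical toy 'model': count which token follows which.
    (sl-gpt2 itself is a transformer that conditions on the whole context.)"""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()  # assumption: whitespace tokens, not BPE
        for context, target in next_token_pairs(tokens):
            counts[context[-1]][target] += 1
    return counts


def clm_loss(counts, sentence, vocab_size, alpha=1.0):
    """Average negative log-likelihood of each next token, using add-alpha
    smoothing so that unseen transitions still get nonzero probability."""
    pairs = next_token_pairs(sentence.split())
    total = 0.0
    for context, target in pairs:
        following = counts.get(context[-1], Counter())
        prob = (following[target] + alpha) / (sum(following.values()) + alpha * vocab_size)
        total -= math.log(prob)
    return total / len(pairs)


# Illustrative Slovenian corpus: "Ljubljana is the capital of Slovenia",
# "Ljubljana is a nice city".
corpus = ["Ljubljana je glavno mesto Slovenije", "Ljubljana je lepo mesto"]
vocab_size = len({tok for s in corpus for tok in s.split()})
model = train_bigram(corpus)

print(next_token_pairs("Ljubljana je lepo".split()))
print(clm_loss(model, "Ljubljana je lepo mesto", vocab_size))
print(clm_loss(model, "mesto lepo je Ljubljana", vocab_size))
```

A word order seen during training gets a lower loss than a scrambled one. A real model replaces the bigram counts with a neural network conditioned on the full preceding context, but the inputs and labels are derived from the text in the same self-supervised way.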
 import random
 from transformers import AutoTokenizer, AutoModelWithLMHead
+tokenizer = AutoTokenizer.from_pretrained('macedonizer/sl-gpt2') \
+model = AutoModelWithLMHead.from_pretrained('macedonizer/sl-gpt2')
+input_text = 'Ljubljana '
 if len(input_text) == 0: \
     encoded_input = tokenizer(input_text, return_tensors="pt") \
         num_return_sequences=1, \
     )
+decoded_output = [] \
+for sample in output: \
     decoded_output.append(tokenizer.decode(sample, skip_special_tokens=True))
 print(decoded_output)