Zarinaaa committed · Commit f276518 · verified · 1 Parent(s): 799fa63

Update README


Pre-trained language models have dominated the field of natural language processing in recent years and have led to significant performance gains on various complex natural language tasks. One of the most prominent pre-trained language models is RoBERTa, which was released in both an English and a multilingual version. This paper describes the process of developing and training a model to solve a punctuation problem in the Kyrgyz language. To achieve this goal, a multilingual pre-trained XLM-RoBERTa-base model was used; it was chosen because it gave the best results when trained on Cyrillic text. Since there are no previous studies for Kyrgyz, training and test datasets were produced from books, Wikipedia, and websites. For English, results comparable to the state of the art have been obtained. One feature of this model is that the RoBERTa-base model at its core is a general-purpose transformer rather than a model specialised for punctuation tasks. It therefore partially understands what the text is about, but because the dataset is in a different format, it outputs only correctly placed punctuation.
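The abstract mentions that the dataset was put into a special format so the model outputs only punctuation. A common way to do this for punctuation restoration is to recast it as token classification: each word gets a label naming the punctuation mark that follows it. The sketch below illustrates that encoding only; the label names and the simple whitespace tokenization are illustrative assumptions, not the authors' actual preprocessing code.

```python
# Illustrative sketch (assumed, not the paper's code): encode punctuated text
# as (word, label) pairs for token classification, where the label is the
# punctuation mark following the word, or "O" when none follows.

PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}

def encode_example(text):
    """Split punctuated text into (word, label) pairs."""
    pairs = []
    for raw in text.split():
        # If the word ends in a tracked punctuation mark, strip it and
        # use it as the label; otherwise label the word "O".
        if raw and raw[-1] in PUNCT_LABELS:
            pairs.append((raw[:-1], PUNCT_LABELS[raw[-1]]))
        else:
            pairs.append((raw, "O"))
    return pairs
```

A model trained on such pairs predicts one label per word, and the punctuated sentence is rebuilt by re-inserting the predicted marks after each word.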

Files changed (1): README.md +8 -1
README.md CHANGED
@@ -1 +1,8 @@
- # hackaton_generative_ai
+ ---
+ license: apache-2.0
+ language:
+ - ky
+ metrics:
+ - accuracy
+ ---
+ # hackaton_generative_ai