This model was trained on data from turkish-nlp-suite/temiz-OSCAR, which was reduced to a smaller chunk so that every word could be lemmatized accurately. In total, 300k words were pulled from this dataset, some of which are unfit for lemmatization or morpheme segmentation (such as non-spesifik, baba-oğul,