---
language:
- as
tags:
- Assamese pos tagger
- pos tagger for Assamese
- flair based pos tagging model
metrics:
- F1 score
---

# AsPOS: Pre-trained model for Assamese POS tagging

AsPOS is a pre-trained POS tagging model for the Assamese language. It is trained with stacked embeddings (MuRIL + FlairEmbedding) and a BiLSTM-CRF model. It achieves an F1-score of 74.62% on POS tagging with a tagset of 41 POS tags.

## Annotated Assamese POS tagged dataset

The dataset was first annotated by an automatic POS tagger with an accuracy of 74.62% and then manually corrected. It is split into three files for model training: train.txt, dev.txt, and test.txt.

## Requirements

- Python 3.6+
- Install [Flair](https://github.com/flairNLP/flair) (version 0.9.0), preferably in a virtual environment.

## How to run

Download the pre-trained model: [AsPOS](https://huggingface.co/dpathak/aspos_assamese_pos_tagger/blob/main/AsPOS.pt).

```
from flair.models import SequenceTagger
from flair.data import Sentence

# Load the tagger from the downloaded model file
model = SequenceTagger.load('AsPOS.pt')

# Create an example sentence
sen = 'ফুকন বসুমতাৰী এজন অধ্য়াপক । তেওঁ বৰ্তমান কোকৰাঝাৰত থাকে ।'
sentence = Sentence(sen)

# Predict tags and print the tagged sentence
model.predict(sentence)
print(sentence.to_tagged_string())
# ফুকন বসুমতাৰী এজন অধ্য়াপক তেওঁ বৰ্তমান কোকৰাঝাৰত থাকে

# Create a second example sentence
sen = 'মাতৃভাষাৰ সমান্তৰালকৈ সংস্কৃত, ইংৰাজী ভাষাৰ চৰ্চা অত্যন্ত জৰুৰী ৷'
sentence = Sentence(sen)

# Predict tags and print the tagged sentence
model.predict(sentence)
print(sentence.to_tagged_string())
# মাতৃভাষাৰ সমান্তৰালকৈ সংস্কৃত , ইংৰাজী ভাষাৰ চৰ্চা অত্যন্ত জৰুৰী
```

-----

```
# If you use our model, please cite this paper:
@INPROCEEDINGS{10017934,
  author={Pathak, Dhrubajyoti and Nandi, Sukumar and Sarmah, Priyankoo},
  booktitle={2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA)},
  title={AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach},
  year={2022},
  volume={},
  number={},
  pages={1-8},
  doi={10.1109/AICCSA56895.2022.10017934}}
```
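
## Reading the tagged output

The string printed by `sentence.to_tagged_string()` in the How to run section can be split into (token, tag) pairs for further processing. The sketch below is only an illustration: it assumes the `token <TAG> token <TAG>` layout that Flair 0.9 uses for this string, the helper name `parse_tagged_string` is hypothetical, and the tags `TAG1` and `TAG2` in the sample are placeholders, not actual AsPOS predictions.

```python
from typing import List, Optional, Tuple


def parse_tagged_string(tagged: str) -> List[Tuple[str, Optional[str]]]:
    """Split 'token <TAG> token <TAG>' text into (token, tag) pairs.

    Assumes each tag follows its token and is wrapped in angle brackets.
    A token that has no tag after it is paired with None.
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    for piece in tagged.split():
        is_tag = len(piece) > 2 and piece.startswith("<") and piece.endswith(">")
        if is_tag and pairs and pairs[-1][1] is None:
            # Attach the tag to the token that precedes it.
            token, _ = pairs[-1]
            pairs[-1] = (token, piece[1:-1])
        else:
            # Anything else is treated as a new, so far untagged, token.
            pairs.append((piece, None))
    return pairs


# Placeholder tags (TAG1, TAG2): these are not real model output.
sample = "ফুকন <TAG1> বসুমতাৰী <TAG1> এজন <TAG2>"
for token, tag in parse_tagged_string(sample):
    print(token, tag)
```

With a real prediction, the same helper would be called as `parse_tagged_string(sentence.to_tagged_string())`. If a different Flair version formats the string differently, the parsing rule would need to be adjusted.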