---
language:
- as
tags:
- Assamese pos tagger
- pos tagger for Assamese
- flair based pos tagging model
metrics:
- F1 score
---
# AsPOS: Pre-trained model for Assamese POS tagging
AsPOS is a pre-trained part-of-speech (POS) tagging model for the Assamese language. It is trained with stacked embeddings (MuRIL + Flair embeddings) and a BiLSTM-CRF sequence tagger. It achieves an F1-score of 74.62% on a tagset of 41 POS tags.
## Annotated Assamese POS tagged dataset
The dataset was first annotated by an automatic POS tagger with an accuracy of 74.62% and then manually corrected. It is split into three files for model training: train.txt, dev.txt, and test.txt.
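
The file format of the splits is not described here. As a minimal sketch, assuming each file uses the common column layout that Flair corpora typically read (one token and its tag per line, separated by whitespace, with a blank line between sentences), the sentences could be read like this. The function name `read_column_file` and the sample lines are hypothetical and only serve as an illustration.

```python
from typing import Iterable, List, Tuple


def read_column_file(lines: Iterable[str]) -> List[List[Tuple[str, str]]]:
    """Group 'token tag' lines into sentences; blank lines end a sentence.

    Assumption: two whitespace-separated columns (token, POS tag).
    """
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the sentence collected so far.
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split()[:2]
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample in the assumed layout (not taken from the real dataset files).
sample = ["তেওঁ PR_PRP", "থাকে V_VM", "। RD_PUNC", "", "চৰ্চা N_NN", "জৰুৰী N_NN"]
sentences = read_column_file(sample)
print(len(sentences), "sentences")
```

For a real file, the same function can be applied to an open file handle, for example `read_column_file(open("train.txt", encoding="utf-8"))`.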
## Requirements
- Python 3.6+
- [Flair](https://github.com/flairNLP/flair) (version 0.9.0), preferably installed in a virtual environment
## How to run
Download the pre-trained model file [AsPOS](https://huggingface.co/dpathak/aspos_assamese_pos_tagger/blob/main/AsPOS.pt) and save it as `AsPOS.pt` in your working directory.
```
from flair.models import SequenceTagger
from flair.data import Sentence
# Load the tagger
model = SequenceTagger.load('AsPOS.pt')
# create example sentence
sen='ফুকন বসুমতাৰী এজন অধ্য়াপক । তেওঁ বৰ্তমান কোকৰাঝাৰত থাকে ।'
sentence = Sentence(sen)
# predict tags and print
model.predict(sentence)
print(sentence.to_tagged_string())
# Output:
# ফুকন <N_NNP> বসুমতাৰী <N_NN> এজন <QT_QTF> অধ্য়াপক <N_NN> । <RD_PUNC> তেওঁ <PR_PRP> বৰ্তমান <RB>
# কোকৰাঝাৰত <N_NNP> থাকে <V_VM> । <RD_PUNC>
# create example sentence
sen='মাতৃভাষাৰ সমান্তৰালকৈ সংস্কৃত, ইংৰাজী ভাষাৰ চৰ্চা অত্যন্ত জৰুৰী ৷'
sentence = Sentence(sen)
# predict tags and print
model.predict(sentence)
print(sentence.to_tagged_string())
# Output:
# মাতৃভাষাৰ <N_NN> সমান্তৰালকৈ <N_NN> সংস্কৃত <N_NNP> , <RD_PUNC> ইংৰাজী <N_NNP> ভাষাৰ <N_ANN> চৰ্চা <N_NN> অত্যন্ত <RP_INTF>
# জৰুৰী <N_NN> ৷ <RD_PUNC>
```
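
For downstream processing it is often handier to have the predictions as (token, tag) pairs than as one string. The following is a minimal sketch that parses a tagged string, assuming it has the `token <TAG>` layout shown in the output above; other Flair versions may print a different layout. The helper name `parse_tagged_string` is hypothetical and not part of Flair.

```python
import re
from typing import List, Tuple

# A token (any run of non-space characters) followed by its tag in angle brackets.
TAGGED_TOKEN = re.compile(r"(\S+)\s+<([^<>\s]+)>")


def parse_tagged_string(tagged: str) -> List[Tuple[str, str]]:
    """Split a 'token <TAG> token <TAG> ...' string into (token, tag) pairs."""
    return TAGGED_TOKEN.findall(tagged)


# Part of the example output above, used here as plain text (no model needed).
tagged = "তেওঁ <PR_PRP> বৰ্তমান <RB> কোকৰাঝাৰত <N_NNP> থাকে <V_VM> । <RD_PUNC>"
pairs = parse_tagged_string(tagged)
for token, tag in pairs:
    print(f"{token}\t{tag}")
```

With a loaded model, the same helper can be applied to `sentence.to_tagged_string()` after calling `model.predict(sentence)`.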
-----
If you use our model, please cite this paper:
```
@INPROCEEDINGS{10017934,
author={Pathak, Dhrubajyoti and Nandi, Sukumar and Sarmah, Priyankoo},
booktitle={2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA)},
title={AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach},
year={2022},
volume={},
number={},
pages={1-8},
doi={10.1109/AICCSA56895.2022.10017934}}
```