---
language: 
 - as


tags:
- Assamese pos tagger
- pos tagger for Assamese
- flair based pos tagging model

metrics:
- F1 score


---
# AsPOS: Pre-trained model for Assamese POS tagging
AsPOS is a pre-trained part-of-speech (POS) tagging model for the Assamese language. It is a BiLSTM-CRF sequence tagger trained on stacked embeddings (MuRIL + FlairEmbedding). It achieves an F1 score of 74.62% on a tagset of 41 POS tags.


## Annotated Assamese POS tagged dataset 

The dataset was first annotated by an automatic POS tagger with an accuracy of 74.62% and was then manually corrected. For model training, it is split into three files: train.txt, dev.txt, and test.txt.
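The layout of the three files is not described here. The following is a minimal sketch of reading one of them, assuming the column format commonly used for Flair sequence-tagging corpora: one token and its tag per line, separated by whitespace, with a blank line between sentences. The function name `read_column_file` and the sample lines are illustrative assumptions, not part of the dataset.

```python
def read_column_file(lines):
    """Group 'token tag' lines into sentences of (token, tag) pairs.

    Assumed format (not confirmed by the dataset): one token and its tag
    per line, whitespace-separated; a blank line ends a sentence.
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # Blank line: close the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.split()[:2]
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


# Illustrative sample only; the tags are taken from the example output in this README.
sample = ["তেওঁ PR_PRP", "থাকে V_VM", "। RD_PUNC", "", "চৰ্চা N_NN"]
print(read_column_file(sample))
```

To read a real file under the same assumption, pass an open file handle, for example `read_column_file(open("train.txt", encoding="utf-8"))`.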

## Requirements

 - It requires Python 3.6+.
 - Install [Flair](https://github.com/flairNLP/flair) (version 0.9.0), preferably in a virtual environment.


## How to run

Download the pre-trained model from this link: [AsPOS](https://huggingface.co/dpathak/aspos_assamese_pos_tagger/blob/main/AsPOS.pt).

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# Load the tagger from the downloaded model file
model = SequenceTagger.load('AsPOS.pt')

# Create an example sentence
sen = 'ফুকন বসুমতাৰী এজন অধ্য়াপক । তেওঁ বৰ্তমান কোকৰাঝাৰত থাকে ।'
sentence = Sentence(sen)

# Predict tags and print
model.predict(sentence)
print(sentence.to_tagged_string())
# Output:
# ফুকন <N_NNP> বসুমতাৰী <N_NN> এজন <QT_QTF> অধ্য়াপক <N_NN> । <RD_PUNC> তেওঁ <PR_PRP> বৰ্তমান <RB>
# কোকৰাঝাৰত <N_NNP> থাকে <V_VM> । <RD_PUNC>

# Create another example sentence
sen = 'মাতৃভাষাৰ সমান্তৰালকৈ সংস্কৃত, ইংৰাজী ভাষাৰ চৰ্চা অত্যন্ত জৰুৰী ৷'
sentence = Sentence(sen)

# Predict tags and print
model.predict(sentence)
print(sentence.to_tagged_string())
# Output:
# মাতৃভাষাৰ <N_NN> সমান্তৰালকৈ <N_NN> সংস্কৃত <N_NNP> , <RD_PUNC> ইংৰাজী <N_NNP> ভাষাৰ <N_ANN> চৰ্চা <N_NN> অত্যন্ত <RP_INTF>
# জৰুৰী <N_NN> ৷ <RD_PUNC>
```
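The tagged string in the example output interleaves tokens and `<TAG>` markers. For downstream processing it can be convenient to split it into (token, tag) pairs. The following is a minimal sketch and not part of AsPOS or Flair: the helper name `parse_tagged_string` is hypothetical, and it assumes the `token <TAG>` format shown in the output above (other Flair versions may format the string differently).

```python
import re

# Hypothetical helper pattern: one token followed by its tag in angle
# brackets, e.g. "থাকে <V_VM>". Assumes tokens contain no whitespace.
_PAIR = re.compile(r"(\S+)\s+<([^<>\s]+)>")


def parse_tagged_string(tagged):
    """Split a 'token <TAG> token <TAG> ...' string into (token, tag) pairs."""
    return [(m.group(1), m.group(2)) for m in _PAIR.finditer(tagged)]


# Sample taken from the example output in this README.
tagged = "তেওঁ <PR_PRP> বৰ্তমান <RB> কোকৰাঝাৰত <N_NNP> থাকে <V_VM> । <RD_PUNC>"
for token, tag in parse_tagged_string(tagged):
    print(token, tag)
```

With a loaded model, the same helper could be applied to `sentence.to_tagged_string()` after calling `model.predict(sentence)`.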
-----

If you use our model, please cite this paper:

```bibtex
@INPROCEEDINGS{10017934,
  author={Pathak, Dhrubajyoti and Nandi, Sukumar and Sarmah, Priyankoo},
  booktitle={2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA)},
  title={AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach},
  year={2022},
  volume={},
  number={},
  pages={1-8},
  doi={10.1109/AICCSA56895.2022.10017934}}
```