Bengali Named Entity Recognition
This model is bert-base-multilingual-cased fine-tuned on the WikiANN dataset for named entity recognition (NER) in Bengali.
Label ID and its corresponding label name
| Label ID | Label Name |
| --- | --- |
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
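The label table can be turned into a lookup for decoding raw predictions. The following is a minimal sketch, not part of the released model code: the names `ID2LABEL`, `LABEL2ID`, and `decode_spans` are hypothetical, and treating a stray `I-` tag as the start of a new entity is an assumption about how to handle malformed sequences.

```python
# Mapping copied from the label table above.
ID2LABEL = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_spans(label_ids):
    """Convert a sequence of label IDs into (entity_type, start, end) spans.

    `end` is exclusive. Hypothetical helper for illustration only.
    """
    spans = []
    current_type, start = None, None
    for i, label_id in enumerate(label_ids):
        prefix, _, ent_type = ID2LABEL[label_id].partition("-")
        # An I- tag of the same type extends the open span.
        if prefix == "I" and ent_type == current_type:
            continue
        # Anything else closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, start, i))
            current_type, start = None, None
        # B- starts a new span; a stray I- is also treated as a start (assumption).
        if prefix in ("B", "I"):
            current_type, start = ent_type, i
    if current_type is not None:
        spans.append((current_type, start, len(label_ids)))
    return spans


print(decode_spans([1, 2, 0, 5, 6, 6, 0, 3]))
# [('PER', 0, 2), ('LOC', 3, 6), ('ORG', 7, 8)]
```

The spans index into whatever token sequence the label IDs were predicted for, so they still need to be mapped back to words or characters.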
Results
| Dataset split | Overall F1 | LOC F1 | ORG F1 | PER F1 |
| --- | --- | --- | --- | --- |
| Train set | 0.997927 | 0.998246 | 0.996613 | 0.998769 |
| Validation set | 0.970187 | 0.969212 | 0.956831 | 0.982079 |
| Test set | 0.9673011 | 0.967120 | 0.963614 | 0.970938 |
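This card does not say how the F1 scores were computed. A common convention for NER is entity-level F1, where a predicted entity counts as correct only if its type and boundaries both match the gold annotation exactly. The sketch below illustrates that convention under this assumption; `span_f1` and the toy spans are hypothetical and are not the evaluation code or data behind the table.

```python
def span_f1(gold_spans, pred_spans, entity_type=None):
    """Entity-level F1 over exact-match spans.

    Each span is a tuple (sentence_index, entity_type, start, end).
    If `entity_type` is given, only spans of that type are scored.
    """
    gold, pred = set(gold_spans), set(pred_spans)
    if entity_type is not None:
        gold = {s for s in gold if s[1] == entity_type}
        pred = {s for s in pred if s[1] == entity_type}
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data for illustration only: the LOC span has a wrong end boundary.
gold = [(0, "PER", 0, 2), (0, "LOC", 3, 6), (1, "ORG", 0, 1)]
pred = [(0, "PER", 0, 2), (0, "LOC", 3, 5), (1, "ORG", 0, 1)]

print(round(span_f1(gold, pred), 4))          # 0.6667
print(span_f1(gold, pred, entity_type="PER"))  # 1.0
print(span_f1(gold, pred, entity_type="LOC"))  # 0.0
```

Under this convention a boundary error costs both precision and recall, which is why per-type scores can differ noticeably from token-level accuracy.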
Example
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
tokenizer = AutoTokenizer.from_pretrained("Suchandra/bengali_language_NER")
model = AutoModelForTokenClassification.from_pretrained("Suchandra/bengali_language_NER")
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "মারভিন দি মারসিয়ান"
ner_results = nlp(example)
print(ner_results)
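Without an aggregation strategy, the `ner` pipeline returns a list of dicts, one per token, with keys such as `entity`, `score`, and `word`. If the model configuration carries no label names, `entity` appears in the generic form `LABEL_<id>`. Whether that is the case for this model is an assumption; the sketch below handles both forms by falling back to the label table from this card. The helper `readable_entities` and the sample input are hypothetical, and the sample is hand-written, not real model output.

```python
# Mapping copied from the label table in this card.
ID2LABEL = {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG", 4: "I-ORG", 5: "B-LOC", 6: "I-LOC"}


def readable_entities(ner_results):
    """Replace generic LABEL_<id> tags with label names and drop 'O' tokens."""
    readable = []
    for item in ner_results:
        entity = item["entity"]
        if entity.startswith("LABEL_"):
            # Generic name: recover the numeric ID and look it up.
            entity = ID2LABEL[int(entity.split("_", 1)[1])]
        if entity == "O":
            continue
        readable.append(
            {"word": item["word"], "entity": entity, "score": float(item["score"])}
        )
    return readable


# Hand-written stand-in for pipeline output (real output has more keys).
sample = [
    {"entity": "LABEL_1", "score": 0.99, "word": "মারভিন"},
    {"entity": "LABEL_0", "score": 0.98, "word": "দি"},
    {"entity": "B-LOC", "score": 0.97, "word": "মারসিয়ান"},
]

for row in readable_entities(sample):
    print(row["word"], row["entity"])
```

With the real model, `readable_entities(ner_results)` would be called on the output of the example above. The words in real output may be sub-word pieces rather than whole words, since the tokenizer splits text into WordPiece tokens.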