Commit 973a60b by Ambareeshkumar · 1 parent: a918fca

Create README.md

Files changed (1)
  1. README.md +53 -0
README.md ADDED
@@ -0,0 +1,53 @@
+ ---
+ language: bn
+ datasets:
+ - wikiann
+ examples:
+ widget:
+ - text: "মারভিন দি মারসিয়ান"
+   example_title: "Sentence_1"
+ - text: "লিওনার্দো দা ভিঞ্চি"
+   example_title: "Sentence_2"
+ - text: "বসনিয়া ও হার্জেগোভিনা"
+   example_title: "Sentence_3"
+ - text: "সাউথ ইস্ট ইউনিভার্সিটি"
+   example_title: "Sentence_4"
+ - text: "মানিক বন্দ্যোপাধ্যায় লেখক"
+   example_title: "Sentence_5"
+ ---
+
+ <h1>Bengali Named Entity Recognition</h1>
+ This model is bert-base-multilingual-cased fine-tuned on the WikiANN dataset for named entity recognition (NER) in Bengali.
+
+ ## Label IDs and their corresponding label names
+
+ | Label ID | Label Name |
+ | -------- | ---------- |
+ | 0 | O |
+ | 1 | B-PER |
+ | 2 | I-PER |
+ | 3 | B-ORG |
+ | 4 | I-ORG |
+ | 5 | B-LOC |
+ | 6 | I-LOC |
+
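The label table above can be written as a plain lookup for turning predicted label IDs back into BIO label names. This is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `decode_labels` are illustrative and not part of the model repository, and the input IDs are hand-written rather than real model output.

```python
# Mapping copied from the label table above; the variable names are illustrative.
ID2LABEL = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}

# Inverse mapping, useful when encoding gold labels for training or evaluation.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_labels(label_ids):
    """Convert a sequence of predicted label IDs into BIO label names."""
    return [ID2LABEL[i] for i in label_ids]


# Hypothetical prediction for a three-token person name (not real model output).
print(decode_labels([1, 2, 2]))  # ['B-PER', 'I-PER', 'I-PER']
```

In the BIO scheme, a `B-` label opens an entity and the following `I-` labels of the same type continue it, while `O` marks tokens outside any entity.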
+ <h1>Results</h1>
+
+ | Split | Overall F1 | LOC F1 | ORG F1 | PER F1 |
+ | ----- | ---------- | ------ | ------ | ------ |
+ | Train set | 0.997927 | 0.998246 | 0.996613 | 0.998769 |
+ | Validation set | 0.970187 | 0.969212 | 0.956831 | 0.982079 |
+ | Test set | 0.9673011 | 0.967120 | 0.963614 | 0.970938 |
+
+ <h1>Example</h1>
+ ```py
+ from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
+
+ # Load the fine-tuned tokenizer and model from the Hugging Face Hub.
+ tokenizer = AutoTokenizer.from_pretrained("Suchandra/bengali_language_NER")
+ model = AutoModelForTokenClassification.from_pretrained("Suchandra/bengali_language_NER")
+
+ # Build a token-classification (NER) pipeline and run it on a sample phrase.
+ nlp = pipeline("ner", model=model, tokenizer=tokenizer)
+ example = "মারভিন দি মারসিয়ান"
+ ner_results = nlp(example)
+ print(ner_results)
+ ```
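The `ner` pipeline returns one dictionary per tagged token, with keys such as `entity`, `score`, `start`, and `end`. As a rough sketch of how those token-level results could be grouped into whole entities, the helper below merges consecutive tokens by their BIO labels and character offsets. The function name `merge_tokens` is hypothetical, and the sample input is hand-written in ASCII for easy checking; it is not real output from this model.

```python
def merge_tokens(text, token_results):
    """Merge token-level NER dicts into entity spans using character offsets."""
    spans = []
    for tok in token_results:
        if tok["entity"] == "O":
            continue  # tokens outside any entity carry no span
        prefix, _, entity_type = tok["entity"].partition("-")
        if spans and prefix == "I" and spans[-1]["type"] == entity_type:
            # Continuation of the current entity: extend it to this token's end.
            spans[-1]["end"] = tok["end"]
            spans[-1]["scores"].append(tok["score"])
        else:
            # A B- label (or an I- label of a new type) opens a new entity.
            spans.append({
                "type": entity_type,
                "start": tok["start"],
                "end": tok["end"],
                "scores": [tok["score"]],
            })
    # Report the weakest token score as a conservative span-level confidence.
    return [
        {"type": s["type"], "text": text[s["start"]:s["end"]], "score": min(s["scores"])}
        for s in spans
    ]


# Hand-written, illustrative input in the shape the pipeline returns.
sample_text = "Leonardo da Vinci"
sample_tokens = [
    {"entity": "B-PER", "score": 0.99, "start": 0, "end": 8},
    {"entity": "I-PER", "score": 0.98, "start": 9, "end": 11},
    {"entity": "I-PER", "score": 0.97, "start": 12, "end": 17},
]
print(merge_tokens(sample_text, sample_tokens))
```

Recent versions of `transformers` also accept an `aggregation_strategy` argument on the token-classification pipeline, which performs a similar grouping inside the library and may be preferable to a hand-rolled helper.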