Ambareeshkumar/BERT-Tamil

@@ -1,23 +1,24 @@
 ---
-language: bn
 datasets:
 - wikiann
 examples:
 widget:
-- text: "মারভিন দি মারসিয়ান"
   example_title: "Sentence_1"
-- text: "লিওনার্দো দা ভিঞ্চি"
   example_title: "Sentence_2"
-- text: "বসনিয়া ও হার্জেগোভিনা"
   example_title: "Sentence_3"
-- text: "সাউথ ইস্ট ইউনিভার্সিটি"
   example_title: "Sentence_4"
-- text: "মানিক বন্দ্যোপাধ্যায় লেখক"
   example_title: "Sentence_5"
 ---
-<h1>Bengali Named Entity Recognition</h1>
-Fine-tuning bert-base-multilingual-cased on Wikiann dataset for performing NER on Bengali language.
 ## Label ID and its corresponding label name
@@ -34,20 +35,22 @@ Fine-tuning bert-base-multilingual-cased on Wikiann dataset for performing NER o
 <h1>Results</h1>
-| Name | Overall F1 | LOC F1 | ORG F1 | PER F1 |
-| ---- | -------- | ----- | ---- | ---- |
-| Train set | 0.997927 | 0.998246 | 0.996613 | 0.998769 |
-| Validation set | 0.970187 | 0.969212 | 0.956831 | 0.982079 |
-| Test set | 0.9673011 | 0.967120 |  0.963614 | 0.970938 |
 Example
 ```py
 from transformers import AutoTokenizer, AutoModelForTokenClassification
 from transformers import pipeline
-tokenizer = AutoTokenizer.from_pretrained("Suchandra/bengali_language_NER")
-model = AutoModelForTokenClassification.from_pretrained("Suchandra/bengali_language_NER")
 nlp = pipeline("ner", model=model, tokenizer=tokenizer)
-example = "মারভিন দি মারসিয়ান"
 ner_results = nlp(example)
 ner_results
 ```

 ---
+language: ta
 datasets:
 - wikiann
 examples:
 widget:
+- text: "இந்திய"
   example_title: "Sentence_1"
+- text: "இந்தியா வளர்ந்து வரும் வல்லரசு"
   example_title: "Sentence_2"
+- text: "2050ல் இந்தியா உலகின் மிகப்பெரிய பொருளாதார நாடாக மாறும்"
   example_title: "Sentence_3"
+- text: "உலக அரங்கில் ரஷ்யா - உக்ரைன் மோதலில் இந்தியாவின் நிலைப்பாட்டை வெளியுறவு அமைச்சர் தெளிவாக எடுத்துரைத்துள்ளார்."
   example_title: "Sentence_4"
+- text: "ஜி20 நாடுகளின் தலைவர் பதவி இந்திய பிரதமர் நரேந்திர மோடியிடம் ஒப்படைக்கப்பட்டுள்ளது"
   example_title: "Sentence_5"
+- text: "ரஷ்யாவிடம் இருந்து எண்ணெய் வாங்க வேண்டாம் என ஐரோப்பிய நாடுகளுக்கு ஐரோப்பிய ஒன்றியம் அறிவுறுத்தியுள்ளது"
 ---
+<h1>Tamil Named Entity Recognition</h1>
+This model fine-tunes bert-base-multilingual-cased on the WikiANN dataset for named entity recognition (NER) in Tamil.
 ## Label ID and its corresponding label name
 <h1>Results</h1>
+| Step | Training Loss | Validation Loss | Overall Precision | Overall Recall | Overall F1 | Overall Accuracy | LOC F1 | ORG F1 | PER F1 |
+| ---- | ------------- | --------------- | ----------------- | -------------- | ---------- | ---------------- | ------ | ------ | ------ |
+| 1000 | 0.386900 | 0.300006 | 0.833469 | 0.824748 | 0.829086 | 0.912857 | 0.835343 | 0.781625 | 0.867752 |
+| 2000 | 0.210200 | 0.251389 | 0.845455 | 0.842052 | 0.843750 | 0.924861 | 0.851711 | 0.790198 | 0.886515 |
+| 3000 | 0.140000 | 0.264964 | 0.866952 | 0.856137 | 0.861510 | 0.930141 | 0.874872 | 0.818150 | 0.885203 |
+| 4000 | 0.095400 | 0.298542 | 0.860871 | 0.882696 | 0.871647 | 0.935692 | 0.881348 | 0.829285 | 0.899245 |
+| 5000 | 0.062200 | 0.296011 | 0.871805 | 0.878471 | 0.875125 | 0.938806 | 0.875434 | 0.850889 | 0.898148 |
+| 6000 | 0.042200 | 0.320418 | 0.868416 | 0.879074 | 0.873713 | 0.937497 | 0.877588 | 0.833611 | 0.907737 |
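
The per-step results above list overall precision, recall, and F1 for each evaluation checkpoint. As a rough sanity check, the sketch below recomputes F1 as the harmonic mean of precision and recall and compares two ways of choosing a checkpoint: highest overall F1 versus lowest validation loss. The names `rows`, `f1_score`, `best_f1_step`, and `lowest_loss_step` are illustrative only, and the model card does not say which checkpoint was actually published, so treat the selection logic as an assumption rather than the author's procedure.

```python
# Values copied from the results above:
# (step, validation loss, overall precision, overall recall, reported overall F1)
rows = [
    (1000, 0.300006, 0.833469, 0.824748, 0.829086),
    (2000, 0.251389, 0.845455, 0.842052, 0.843750),
    (3000, 0.264964, 0.866952, 0.856137, 0.861510),
    (4000, 0.298542, 0.860871, 0.882696, 0.871647),
    (5000, 0.296011, 0.871805, 0.878471, 0.875125),
    (6000, 0.320418, 0.868416, 0.879074, 0.873713),
]


def f1_score(precision, recall):
    """Return the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for every step so it can be compared with the reported column.
recomputed = {step: f1_score(p, r) for step, _, p, r, _ in rows}

# Two candidate selection rules (an assumption, not the author's stated method).
best_f1_step = max(rows, key=lambda row: row[4])[0]
lowest_loss_step = min(rows, key=lambda row: row[1])[0]

for step, _, _, _, reported in rows:
    print(step, round(recomputed[step], 6), reported)
print("best F1 at step", best_f1_step)
print("lowest validation loss at step", lowest_loss_step)
```

The two rules disagree here: validation loss bottoms out early while overall F1 keeps improving for several more evaluation steps, so the choice of criterion matters when picking a checkpoint.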
 Example
 ```py
 from transformers import AutoTokenizer, AutoModelForTokenClassification
 from transformers import pipeline
+tokenizer = AutoTokenizer.from_pretrained("Ambareeshkumar/fine_tune_bert_output")
+model = AutoModelForTokenClassification.from_pretrained("Ambareeshkumar/fine_tune_bert_output")
 nlp = pipeline("ner", model=model, tokenizer=tokenizer)
+example = "இந்திய"
 ner_results = nlp(example)
 ner_results
 ```