---
license: mit
datasets:
- MultiCoNER/multiconer_v2
language:
- hi
metrics:
- f1
- precision
- recall
base_model:
- FacebookAI/xlm-roberta-large
pipeline_tag: token-classification
tags:
- NER
- Named_Entity_Recognition
pretty_name: MultiCoNER2 Hindi XLM-RoBERTa
---

**XLM-RoBERTa is fine-tuned on the Hindi [MultiCoNER2](https://huggingface.co/datasets/MultiCoNER/multiconer_v2) dataset for fine-grained Named Entity Recognition.**

[MultiCoNER2](https://huggingface.co/datasets/MultiCoNER/multiconer_v2) uses a fine-grained tagset. The fine-to-coarse mapping of the tags is as follows:

* Location (LOC): Facility, OtherLOC, HumanSettlement, Station
* Creative Work (CW): VisualWork, MusicalWork, WrittenWork, ArtWork, Software
* Group (GRP): MusicalGRP, PublicCORP, PrivateCORP, AerospaceManufacturer, SportsGRP, CarManufacturer, ORG
* Person (PER): Scientist, Artist, Athlete, Politician, Cleric, SportsManager, OtherPER
* Product (PROD): Clothing, Vehicle, Food, Drink, OtherPROD
* Medical (MED): Medication/Vaccine, MedicalProcedure, AnatomicalStructure, Symptom, Disease
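
## Usage

The model can be queried through the `transformers` token-classification pipeline. A minimal sketch, assuming a placeholder repository id (substitute the actual Hub id of this checkpoint):

```python
from transformers import pipeline

# Placeholder repo id; replace with this model's actual Hugging Face Hub id.
ner = pipeline(
    "token-classification",
    model="your-username/multiconer2-hindi-xlm-roberta",
    aggregation_strategy="simple",  # merge subword pieces into whole entity spans
)

# Hindi for "Sachin Tendulkar was born in Mumbai."
for entity in ner("सचिन तेंदुलकर का जन्म मुंबई में हुआ था।"):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```

With `aggregation_strategy="simple"` each result carries the fine-grained entity type from the tagset above (e.g. `Athlete` for the person, `HumanSettlement` for the city).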

## Model performance

Precision: 76.07 <br>
Recall: 79.42 <br>
**F1: 77.71** <br>
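
Entity-level NER scores of this kind are commonly computed with `seqeval` over BIO-encoded tags; that tooling choice is an assumption here, and the tag sequences below are illustrative, not actual evaluation data:

```python
from seqeval.metrics import f1_score, precision_score, recall_score

# Illustrative gold and predicted BIO sequences; real scoring runs over the test split.
y_true = [["B-Athlete", "I-Athlete", "O", "B-HumanSettlement", "O"]]
y_pred = [["B-Athlete", "I-Athlete", "O", "B-HumanSettlement", "O"]]

# seqeval scores whole entity spans, so a partially matched span counts as an error.
print(precision_score(y_true, y_pred))  # 1.0 on this toy pair
print(recall_score(y_true, y_pred))     # 1.0
print(f1_score(y_true, y_pred))         # F1 is the harmonic mean of precision and recall
```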

## Training Parameters

Epochs: 6 <br>
Optimizer: AdamW <br>
Learning Rate: 5e-5 <br>
Weight Decay: 0.01 <br>
Batch Size: 64 <br>
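
These settings map directly onto `TrainingArguments` in `transformers`. A sketch under the stated hyperparameters; the output directory and label count are assumptions (33 fine-grained types in BIO plus `O`), and anything not listed above is left at library defaults:

```python
from transformers import AutoModelForTokenClassification, TrainingArguments

model = AutoModelForTokenClassification.from_pretrained(
    "FacebookAI/xlm-roberta-large",
    num_labels=67,  # assumption: 33 fine-grained types in BIO (2 * 33) plus "O"
)

# Hyperparameters as listed on this card; unstated settings stay at library defaults.
args = TrainingArguments(
    output_dir="multiconer2-hi-xlmr",  # placeholder path
    num_train_epochs=6,
    learning_rate=5e-5,
    weight_decay=0.01,
    per_device_train_batch_size=64,
    optim="adamw_torch",  # AdamW, as stated above
)
```

These arguments plug into a standard `Trainer` together with a tokenized Hindi MultiCoNER2 split; tokenization and word-to-subword label alignment are not shown on this card.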

## Citation

If you use this model, please cite the following papers:

```bibtex
@inproceedings{fetahu2023multiconer,
  title={MultiCoNER v2: a Large Multilingual dataset for Fine-grained and Noisy Named Entity Recognition},
  author={Fetahu, Besnik and Chen, Zhiyu and Kar, Sudipta and Rokhlenko, Oleg and Malmasi, Shervin},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
  pages={2027--2051},
  year={2023}
}

@inproceedings{kaushik2026sampurner,
  title={SampurNER: Fine-grained Named Entity Recognition Dataset for 22 Indian Languages},
  author={Kaushik, Prachuryya and Anand, Ashish},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  year={2026}
}
```