SampurNER: Fine-grained Named Entity Recognition for 27 Indian Languages
This model is MuRIL fine-tuned on the Assamese SampurNER dataset for fine-grained Named Entity Recognition (NER). The SampurNER dataset was created with the EaMaTa framework from the Few-NERD dataset.
Read the paper: SampurNER in AAAI-2026
SampurNER Dataset: datasets/prachuryyaIITG/SampurNER
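A minimal inference sketch, assuming the checkpoint loads as a standard Hugging Face token-classification model. The `model_id` argument is a placeholder for this model's repository ID, which is not stated here, and `summarize_entities` is a hypothetical helper added only for illustration.

```python
from typing import Any


def load_ner_pipeline(model_id: str):
    """Build a token-classification pipeline for the given checkpoint.

    `model_id` is a placeholder: pass this model's Hub repository ID.
    """
    # Imported lazily so the helper below works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" groups word pieces into entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def summarize_entities(entities: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Reduce aggregated pipeline output to (text span, predicted tag) pairs."""
    return [(ent["word"], ent["entity_group"]) for ent in entities]
```

With a real repository ID, `summarize_entities(load_ner_pipeline(model_id)(text))` would return the predicted spans and tags for an Assamese sentence `text`.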
The Few-NERD tagset is fine-grained. The fine-to-coarse mapping of the tags is as follows:
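Few-NERD writes each fine-grained label as `coarse-fine` (for example `person-actor`), so the coarse type can be recovered from the label string. The sketch below assumes this model's labels keep that naming, optionally with a `B-`/`I-` prefix; `fine_to_coarse` is a hypothetical helper, and the actual label names should be checked against the model's `id2label` configuration.

```python
# The eight coarse-grained entity types of Few-NERD.
COARSE_TYPES = {
    "art", "building", "event", "location",
    "organization", "other", "person", "product",
}


def fine_to_coarse(tag: str) -> str:
    """Map a fine-grained tag to its coarse type, keeping any BIO prefix."""
    if tag == "O":
        return tag  # non-entity tokens have no coarse type

    # Assumption: tags may carry a BIO prefix such as "B-" or "I-".
    prefix = ""
    if tag[:2] in ("B-", "I-"):
        prefix, tag = tag[:2], tag[2:]

    # The coarse type is everything before the first hyphen.
    coarse = tag.split("-", 1)[0]
    if coarse not in COARSE_TYPES:
        raise ValueError(f"unrecognized coarse type in tag: {tag!r}")
    return prefix + coarse
```

For instance, `fine_to_coarse("person-actor")` gives `"person"`, and a prefixed tag such as `"B-location-GPE"` collapses to `"B-location"`.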
Precision: 65.54
Recall: 67.73
F1: 66.26
Epochs: 6
Optimizer: AdamW
Learning Rate: 5e-5
Weight Decay: 0.01
Batch Size: 64
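The settings above could be expressed for the Hugging Face `Trainer` roughly as follows. This is a sketch, not the authors' training script: the card does not say which training loop was used, whether the batch size of 64 is per device or in total, or which AdamW implementation was chosen, so those points are assumptions. `build_training_args` and its `output_dir` argument are hypothetical.

```python
# Reported fine-tuning settings, keyed by TrainingArguments parameter names.
HYPERPARAMETERS = {
    "num_train_epochs": 6,
    "learning_rate": 5e-5,
    "weight_decay": 0.01,
    # Assumption: the reported batch size of 64 is treated as per device.
    "per_device_train_batch_size": 64,
    # Assumption: the PyTorch AdamW implementation.
    "optim": "adamw_torch",
}


def build_training_args(output_dir: str):
    """Create TrainingArguments from the reported settings."""
    # Imported lazily so the dictionary above is usable on its own.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```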
If you use this model, please cite the following papers:
@inproceedings{kaushik2026sampurner,
title={SampurNER: Fine-grained Named Entity Recognition Dataset for 22 Indian Languages},
author={Kaushik, Prachuryya and Anand, Ashish},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={40},
year={2026}
}
@inproceedings{ding-etal-2021-nerd,
title = "Few-{NERD}: A Few-shot Named Entity Recognition Dataset",
author = "Ding, Ning and Xu, Guangwei and Chen, Yulin and Wang, Xiaobin and Han, Xu and Xie, Pengjun and Zheng, Haitao and Liu, Zhiyuan",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.248",
doi = "10.18653/v1/2021.acl-long.248",
pages = "3198--3213",
}
Base model: google/muril-large-cased