---
license: mit
language:
- as
base_model:
- google/muril-large-cased
pipeline_tag: token-classification
tags:
- NER
- Named_Entity_Recognition
pretty_name: CLASSER Assamese MuRIL
datasets:
- prachuryyaIITG/CLASSER
metrics:
- f1
- precision
- recall
---

**MuRIL (`google/muril-large-cased`) fine-tuned on the Assamese [CLASSER](https://huggingface.co/datasets/prachuryyaIITG/CLASSER) dataset for fine-grained Named Entity Recognition.**

[MultiCoNER2](https://huggingface.co/datasets/MultiCoNER/multiconer_v2) defines a fine-grained tagset. The fine-to-coarse mapping of its tags is as follows:

* Location (LOC): Facility, OtherLOC, HumanSettlement, Station
* Creative Work (CW): VisualWork, MusicalWork, WrittenWork, ArtWork, Software
* Group (GRP): MusicalGRP, PublicCORP, PrivateCORP, AerospaceManufacturer, SportsGRP, CarManufacturer, ORG
* Person (PER): Scientist, Artist, Athlete, Politician, Cleric, SportsManager, OtherPER
* Product (PROD): Clothing, Vehicle, Food, Drink, OtherPROD
* Medical (MED): Medication/Vaccine, MedicalProcedure, AnatomicalStructure, Symptom, Disease
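
The mapping above can be written as a lookup table, which is handy for collapsing fine-grained predictions into coarse ones. The following is a minimal sketch: the names `COARSE_TO_FINE`, `FINE_TO_COARSE`, and `to_coarse` are illustrative, and the BIO-style prefix (`B-`, `I-`) is an assumption, since this card does not state the exact label format the model emits.

```python
# Coarse-to-fine mapping, transcribed from the list above.
COARSE_TO_FINE = {
    "LOC": ["Facility", "OtherLOC", "HumanSettlement", "Station"],
    "CW": ["VisualWork", "MusicalWork", "WrittenWork", "ArtWork", "Software"],
    "GRP": ["MusicalGRP", "PublicCORP", "PrivateCORP", "AerospaceManufacturer",
            "SportsGRP", "CarManufacturer", "ORG"],
    "PER": ["Scientist", "Artist", "Athlete", "Politician", "Cleric",
            "SportsManager", "OtherPER"],
    "PROD": ["Clothing", "Vehicle", "Food", "Drink", "OtherPROD"],
    "MED": ["Medication/Vaccine", "MedicalProcedure", "AnatomicalStructure",
            "Symptom", "Disease"],
}

# Invert the table so each fine label points at its coarse label.
FINE_TO_COARSE = {
    fine: coarse
    for coarse, fines in COARSE_TO_FINE.items()
    for fine in fines
}


def to_coarse(tag: str) -> str:
    """Map a fine-grained tag to its coarse tag.

    Assumes BIO-style tags such as "B-Scientist"; bare labels such as
    "Scientist" are also accepted. The outside tag "O" is returned unchanged.
    """
    if tag == "O":
        return tag
    prefix, sep, fine = tag.partition("-")
    if sep:  # prefixed tag, e.g. "B-Scientist" -> "B-PER"
        return f"{prefix}-{FINE_TO_COARSE[fine]}"
    return FINE_TO_COARSE[tag]  # bare fine label, e.g. "Disease" -> "MED"
```

For example, `to_coarse("I-HumanSettlement")` returns `"I-LOC"`. An unknown label raises `KeyError`, which makes a mismatch between the model's labels and this table visible early.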

## Model performance:
Precision: 74.88 <br>
Recall: 75.62 <br>
**F1: 75.25** <br>
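
As a quick consistency check, the reported F1 can be recomputed from the reported precision and recall. This sketch assumes the F1 is the harmonic mean of those two figures; the card does not say which averaging scheme was used.

```python
# Values copied from the scores reported above.
precision = 74.88
recall = 75.62

# F1 as the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"{f1:.2f}")  # prints 75.25
```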

## Training Parameters:
Epochs: 6 <br>
Optimizer: AdamW <br>
Learning Rate: 5e-5 <br>
Weight Decay: 0.01 <br>
Batch Size: 64 <br>
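
For anyone trying to reproduce a similar run, the values above might be collected as follows. This is only a sketch, not the authors' training script: the keys follow the parameter names of Hugging Face `TrainingArguments`, while the choice of `"adamw_torch"` and the reading of 64 as a per-device batch size are assumptions that this card does not confirm.

```python
# Hyperparameters listed above, keyed by `TrainingArguments` parameter names.
training_config = {
    "num_train_epochs": 6,
    "optim": "adamw_torch",  # assumption: the card only says "AdamW"
    "learning_rate": 5e-5,
    "weight_decay": 0.01,
    # Assumption: the card does not say whether 64 is per device or total.
    "per_device_train_batch_size": 64,
}

for name, value in training_config.items():
    print(f"{name} = {value}")
```

With `transformers` installed, a dictionary like this could be unpacked into `TrainingArguments(**training_config)` together with an output directory of your choice.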

## Citation

If you use this model, please cite the following papers:

```bibtex
@inproceedings{kaushik2025classer,
  title = {{CLASSER}: Cross-lingual Annotation Projection enhancement through Script Similarity for Fine-grained Named Entity Recognition},
  author = {Kaushik, Prachuryya and Anand, Ashish},
  booktitle = {Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics},
  year = {2025},
  publisher = {Association for Computational Linguistics},
  note = {Main conference paper}
}

@inproceedings{kaushik2026sampurner,
  title = {SampurNER: Fine-grained Named Entity Recognition Dataset for 22 Indian Languages},
  author = {Kaushik, Prachuryya and Anand, Ashish},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume = {40},
  year = {2026}
}

@inproceedings{fetahu2023multiconer,
  title = {MultiCoNER v2: a Large Multilingual dataset for Fine-grained and Noisy Named Entity Recognition},
  author = {Fetahu, Besnik and Chen, Zhiyu and Kar, Sudipta and Rokhlenko, Oleg and Malmasi, Shervin},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2023},
  pages = {2027--2051},
  year = {2023}
}