---
license: mit
datasets:
  - MultiCoNER/multiconer_v2
language:
  - en
metrics:
  - f1
  - precision
  - recall
base_model:
  - FacebookAI/xlm-roberta-large
pipeline_tag: token-classification
tags:
  - NER
  - Named_Entity_Recognition
pretty_name: MultiCoNER2 English XLM-RoBERTa
---

This model is XLM-RoBERTa (large) fine-tuned on the English subset of the MultiCoNER v2 dataset for fine-grained Named Entity Recognition.
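A minimal usage sketch with the `transformers` pipeline API. The repo id below is a placeholder, not the actual Hub id of this model; substitute the real one before running:

```python
from transformers import pipeline

# Placeholder -- replace with this model's actual Hugging Face Hub repo id.
MODEL_ID = "<hub-user>/<this-model>"

def tag(text: str):
    """Run fine-grained NER; word pieces are merged into whole entity spans."""
    ner = pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",
    )
    return [(e["entity_group"], e["word"], round(e["score"], 3)) for e in ner(text)]

# Example call (requires a valid MODEL_ID):
# tag("Lionel Messi signed for Inter Miami.")
```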

MultiCoNER2 uses a fine-grained tagset. The fine-to-coarse mapping of the tags is as follows:

- **Location (LOC)**: Facility, OtherLOC, HumanSettlement, Station
- **Creative Work (CW)**: VisualWork, MusicalWork, WrittenWork, ArtWork, Software
- **Group (GRP)**: MusicalGRP, PublicCORP, PrivateCORP, AerospaceManufacturer, SportsGRP, CarManufacturer, ORG
- **Person (PER)**: Scientist, Artist, Athlete, Politician, Cleric, SportsManager, OtherPER
- **Product (PROD)**: Clothing, Vehicle, Food, Drink, OtherPROD
- **Medical (MED)**: Medication/Vaccine, MedicalProcedure, AnatomicalStructure, Symptom, Disease
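The mapping above can be applied programmatically, e.g. to collapse the model's fine-grained predictions to their coarse classes. A sketch (tag spellings assumed to match the dataset's label set, with standard `B-`/`I-` BIO prefixes):

```python
# Fine-grained MultiCoNER2 tags -> coarse classes, per the mapping above.
FINE_TO_COARSE = {
    "Facility": "LOC", "OtherLOC": "LOC", "HumanSettlement": "LOC", "Station": "LOC",
    "VisualWork": "CW", "MusicalWork": "CW", "WrittenWork": "CW", "ArtWork": "CW",
    "Software": "CW",
    "MusicalGRP": "GRP", "PublicCORP": "GRP", "PrivateCORP": "GRP",
    "AerospaceManufacturer": "GRP", "SportsGRP": "GRP", "CarManufacturer": "GRP",
    "ORG": "GRP",
    "Scientist": "PER", "Artist": "PER", "Athlete": "PER", "Politician": "PER",
    "Cleric": "PER", "SportsManager": "PER", "OtherPER": "PER",
    "Clothing": "PROD", "Vehicle": "PROD", "Food": "PROD", "Drink": "PROD",
    "OtherPROD": "PROD",
    "Medication/Vaccine": "MED", "MedicalProcedure": "MED",
    "AnatomicalStructure": "MED", "Symptom": "MED", "Disease": "MED",
}

def to_coarse(bio_tag: str) -> str:
    """Map a BIO-prefixed fine tag (e.g. 'B-Athlete') to its coarse tag ('B-PER')."""
    if bio_tag == "O":
        return "O"
    prefix, fine = bio_tag.split("-", 1)
    return f"{prefix}-{FINE_TO_COARSE[fine]}"
```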

## Model performance

| Metric    | Score |
|-----------|-------|
| Precision | 78.29 |
| Recall    | 80.94 |
| F1        | 79.59 |
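As a quick consistency check, the reported F1 is the harmonic mean of the reported precision and recall:

```python
precision, recall = 78.29, 80.94

# F1 = 2PR / (P + R)
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 79.59, matching the reported score
```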

## Training parameters

- Epochs: 6
- Optimizer: AdamW
- Learning rate: 5e-5
- Weight decay: 0.01
- Batch size: 64

## Citation

If you use this model, please cite the following papers:

```bibtex
@inproceedings{fetahu2023multiconer,
  title={MultiCoNER v2: a Large Multilingual dataset for Fine-grained and Noisy Named Entity Recognition},
  author={Fetahu, Besnik and Chen, Zhiyu and Kar, Sudipta and Rokhlenko, Oleg and Malmasi, Shervin},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2023},
  pages={2027--2051},
  year={2023}
}

@inproceedings{kaushik2026sampurner,
  title={SampurNER: Fine-grained Named Entity Recognition Dataset for 22 Indian Languages},
  author={Kaushik, Prachuryya and Anand, Ashish},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={40},
  year={2026}
}
```