Marijke/electra_hypopt_NER

@@ -17,7 +17,7 @@ This model is part of a series of models trained for the ML4AL paper “Gotta ca
 - **Language(s) (NLP):** Ancient Greek (GLAUx normalization)
 - **Finetuned from model:** mercelisw/electra-grc
-### Model Sources [optional]
 - **Repository:** [https://github.com/NER-AncientLanguages/NERAncientGreekML4AL] (for data and training scripts)
 - **Paper:** [https://aclanthology.org/2024.ml4al-1.16/]
@@ -34,7 +34,7 @@ This model is part of a series of models trained for the ML4AL paper “Gotta ca
 This model was evaluated on precision, recall and macro-F1 for its entity classes. See the paper for more information.
-|              |   precision |   recall |   f1-score |   support (# of entities of this type in the test set) |
 |:-------------|------------:|---------:|-----------:|----------:|
 | GRP          |    0.778515 | 0.848266 |   0.811895 |      1384 |
 | LOC          |    0.708829 | 0.755656 |   0.731494 |      1105 |
@@ -45,10 +45,25 @@ This models was evaluation on precision, recall and macro-f1 for its entity clas
-## Citation [optional]
-**BibTeX:**
-@inproceedings{Beersmans_Keersmaekers_de Graaf_Van de Cruys_Depauw_Fantoli_2024, address={Hybrid in Bangkok, Thailand and online}, title={“Gotta catch `em all!”: Retrieving people in Ancient Greek texts combining transformer models and domain knowledge}, url={https://aclanthology.org/2024.ml4al-1.16}, DOI={10.18653/v1/2024.ml4al-1.16}, abstractNote={In this paper, we present a study of transformer-based Named Entity Recognition (NER) as applied to Ancient Greek texts, with an emphasis on retrieving personal names. Recent research shows that, while the task remains difficult, the use of transformer models results in significant improvements. We, therefore, compare the performance of four transformer models on the task of NER for the categories of people, locations and groups, and add an out-of-domain test set to the existing datasets. Results on this set highlight the shortcomings of the models when confronted with a random sample of sentences. To be able to more straightforwardly integrate domain and linguistic knowledge to improve performance, we narrow down our approach to the category of people. The task is simplified to a binary PERS/MISC classification on the token level, starting from capitalised words. Next, we test the use of domain and linguistic knowledge to improve the results. We find that including simple gazetteer information as a binary mask has a marginally positive effect on newly annotated data and that treebanks can be used to help identify multi-word individuals if they are scarcely or inconsistently annotated in the available training data. 
The qualitative error analysis identifies the potential for improvement in both manual annotation and the inclusion of domain and linguistic knowledge in the transformer models.}, booktitle={Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)}, publisher={Association for Computational Linguistics}, author={Beersmans, Marijke and Keersmaekers, Alek and de Graaf, Evelien and Van de Cruys, Tim and Depauw, Mark and Fantoli, Margherita}, editor={Pavlopoulos, John and Sommerschield, Thea and Assael, Yannis and Gordin, Shai and Cho, Kyunghyun and Passarotti, Marco and Sprugnoli, Rachele and Liu, Yudong and Li, Bin and Anderson, Adam}, year={2024}, month=aug, pages={152–164} }
 **APA:**

 - **Language(s) (NLP):** Ancient Greek (GLAUx normalization)
 - **Finetuned from model:** mercelisw/electra-grc
+### Model Sources
 - **Repository:** [https://github.com/NER-AncientLanguages/NERAncientGreekML4AL] (for data and training scripts)
 - **Paper:** [https://aclanthology.org/2024.ml4al-1.16/]
 This model was evaluated on precision, recall and macro-F1 for its entity classes. See the paper for more information.
+|              |   precision |   recall |   f1-score |   support |
 |:-------------|------------:|---------:|-----------:|----------:|
 | GRP          |    0.778515 | 0.848266 |   0.811895 |      1384 |
 | LOC          |    0.708829 | 0.755656 |   0.731494 |      1105 |
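
Reviewer note: the f1-score column can be sanity-checked against the precision and recall columns. The following is a minimal sketch, not part of the model card or the authors' evaluation code. It assumes the standard definition of F1 as the harmonic mean of precision and recall; the names `rows`, `f1_score`, and `macro_average` are hypothetical helpers introduced here for illustration. Only the GRP and LOC rows visible in this diff are used.

```python
# Precision, recall and F1 copied from the GRP and LOC rows of the table above.
rows = {
    "GRP": {"precision": 0.778515, "recall": 0.848266, "f1": 0.811895},
    "LOC": {"precision": 0.708829, "recall": 0.755656, "f1": 0.731494},
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(scores) -> float:
    """Unweighted mean of per-class scores (support is ignored)."""
    scores = list(scores)
    return sum(scores) / len(scores)


# Recompute F1 per class and compare it with the value reported in the table.
recomputed = {
    label: f1_score(row["precision"], row["recall"]) for label, row in rows.items()
}
for label, value in recomputed.items():
    reported = rows[label]["f1"]
    print(f"{label}: recomputed={value:.6f} reported={reported:.6f}")
```

The recomputed values agree with the reported column up to rounding. A macro-F1 for the model would pass the F1 of every entity class to `macro_average`, including the classes whose rows are not shown in this hunk.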
+If you use this work, please cite the following paper:
+### **BibTeX**
+```bibtex
+@inproceedings{Beersmans_Keersmaekers_de Graaf_Van de Cruys_Depauw_Fantoli_2024,
+  address = {Hybrid in Bangkok, Thailand and online},
+  title = {“Gotta catch `em all!”: Retrieving people in Ancient Greek texts combining transformer models and domain knowledge},
+  url = {https://aclanthology.org/2024.ml4al-1.16},
+  DOI = {10.18653/v1/2024.ml4al-1.16},
+  abstractNote = {In this paper, we present a study of transformer-based Named Entity Recognition (NER) as applied to Ancient Greek texts, with an emphasis on retrieving personal names. Recent research shows that, while the task remains difficult, the use of transformer models results in significant improvements. We, therefore, compare the performance of four transformer models on the task of NER for the categories of people, locations and groups, and add an out-of-domain test set to the existing datasets. Results on this set highlight the shortcomings of the models when confronted with a random sample of sentences. To be able to more straightforwardly integrate domain and linguistic knowledge to improve performance, we narrow down our approach to the category of people. The task is simplified to a binary PERS/MISC classification on the token level, starting from capitalised words. Next, we test the use of domain and linguistic knowledge to improve the results. We find that including simple gazetteer information as a binary mask has a marginally positive effect on newly annotated data and that treebanks can be used to help identify multi-word individuals if they are scarcely or inconsistently annotated in the available training data. The qualitative error analysis identifies the potential for improvement in both manual annotation and the inclusion of domain and linguistic knowledge in the transformer models.},
+  booktitle = {Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)},
+  publisher = {Association for Computational Linguistics},
+  author = {Beersmans, Marijke and Keersmaekers, Alek and de Graaf, Evelien and Van de Cruys, Tim and Depauw, Mark and Fantoli, Margherita},
+  editor = {Pavlopoulos, John and Sommerschield, Thea and Assael, Yannis and Gordin, Shai and Cho, Kyunghyun and Passarotti, Marco and Sprugnoli, Rachele and Liu, Yudong and Li, Bin and Anderson, Adam},
+  year = {2024},
+  month = aug,
+  pages = {152--164}
+}
 **APA:**