psresearch/augmented_dataset_llm_generated_NER
psresearch/deberta-v3-large-NER-Scholarly-text is a fine-tuned microsoft/deberta-v3-large model for Named Entity Recognition (NER), specifically tailored to extract software-related entities from scholarly articles.
This model is optimized for extracting mentions of software tools, libraries, citations, versions, URLs, and other related metadata from academic papers and technical documentation, particularly in software engineering domains. To load and run this model, see submission_recreate.ipynb.
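For a quick check outside the notebook, the model can probably be loaded through the generic Hugging Face `transformers` token-classification pipeline. The snippet below is a minimal sketch under that assumption and is not taken from submission_recreate.ipynb; the helper names `load_ner_pipeline`, `summarize`, and `demo`, as well as the sample sentence, are hypothetical.

```python
from typing import Dict, List, Tuple

MODEL_ID = "psresearch/deberta-v3-large-NER-Scholarly-text"


def load_ner_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads the weights on first use)."""
    # Imported lazily so that summarize() works without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges B-/I- sub-word predictions into spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def summarize(entities: List[Dict]) -> List[Tuple[str, str, float]]:
    """Reduce aggregated pipeline output to (label, text, rounded score) tuples."""
    return [
        (e["entity_group"], e["word"].strip(), round(float(e["score"]), 2))
        for e in entities
    ]


def demo() -> None:
    """Hypothetical usage: tag one sentence and print the detected spans."""
    ner = load_ner_pipeline()
    text = "We analysed the data with SPSS version 25 (IBM)."  # illustrative input
    for label, span, score in summarize(ner(text)):
        print(f"{label:<24} {span!r} {score}")
```

Calling `demo()` requires network access and enough memory for a DeBERTa-v3-large checkpoint; the labels it prints depend on the model's actual predictions.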
| Label | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Abbreviation | 0.6667 | 0.5000 | 0.5714 | 12 |
| AlternativeName | 0.5833 | 0.8235 | 0.6829 | 17 |
| Application | 0.6560 | 0.6198 | 0.6374 | 363 |
| Citation | 0.7245 | 0.7594 | 0.7415 | 187 |
| Developer | 0.3261 | 0.7500 | 0.4545 | 20 |
| Extension | 0.5000 | 0.1667 | 0.2500 | 6 |
| OperatingSystem | 0.5000 | 0.5000 | 0.5000 | 2 |
| PlugIn | 0.2449 | 0.6000 | 0.3478 | 20 |
| ProgrammingEnvironment | 0.8261 | 0.7917 | 0.8085 | 24 |
| Release | 1.0000 | 1.0000 | 1.0000 | 10 |
| SoftwareCoreference | 1.0000 | 1.0000 | 1.0000 | 3 |
| URL | 0.7746 | 0.7857 | 0.7801 | 70 |
| Version | 0.6250 | 0.7292 | 0.6731 | 96 |
| Micro Avg | 0.6438 | 0.6904 | 0.6663 | 830 |
| Macro Avg | 0.6482 | 0.6943 | 0.6498 | 830 |
| Weighted Avg | 0.6675 | 0.6904 | 0.6731 | 830 |
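The averaged rows in the table above can be re-derived from the per-label rows, which also makes the difference between macro and weighted averaging concrete. The sketch below copies the per-label values from the table; the function names are illustrative, and the micro average is left out because it needs raw true-positive counts that the table does not report.

```python
from typing import Dict, Tuple

# (precision, recall, f1, support) per label, copied from the table above.
ROWS: Dict[str, Tuple[float, float, float, int]] = {
    "Abbreviation": (0.6667, 0.5000, 0.5714, 12),
    "AlternativeName": (0.5833, 0.8235, 0.6829, 17),
    "Application": (0.6560, 0.6198, 0.6374, 363),
    "Citation": (0.7245, 0.7594, 0.7415, 187),
    "Developer": (0.3261, 0.7500, 0.4545, 20),
    "Extension": (0.5000, 0.1667, 0.2500, 6),
    "OperatingSystem": (0.5000, 0.5000, 0.5000, 2),
    "PlugIn": (0.2449, 0.6000, 0.3478, 20),
    "ProgrammingEnvironment": (0.8261, 0.7917, 0.8085, 24),
    "Release": (1.0000, 1.0000, 1.0000, 10),
    "SoftwareCoreference": (1.0000, 1.0000, 1.0000, 3),
    "URL": (0.7746, 0.7857, 0.7801, 70),
    "Version": (0.6250, 0.7292, 0.6731, 96),
}


def macro_average(rows) -> Tuple[float, float, float]:
    """Unweighted mean over labels: every label counts equally."""
    n = len(rows)
    return tuple(sum(r[i] for r in rows.values()) / n for i in range(3))


def weighted_average(rows) -> Tuple[float, float, float]:
    """Mean over labels weighted by support: frequent labels dominate."""
    total = sum(r[3] for r in rows.values())
    return tuple(sum(r[i] * r[3] for r in rows.values()) / total for i in range(3))


macro_p, macro_r, macro_f1 = macro_average(ROWS)
weighted_p, weighted_r, weighted_f1 = weighted_average(ROWS)
total_support = sum(r[3] for r in ROWS.values())
print(f"macro F1={macro_f1:.4f}  weighted F1={weighted_f1:.4f}  support={total_support}")
```

Because Application and Citation account for 550 of the 830 test entities, the weighted average mostly reflects those two labels, while the macro average gives labels with very few test examples, such as OperatingSystem (support 2), the same influence as Application.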
This model was trained on a combination of two annotated datasets focused on software mentions in academic text:
Application, Citation, URL) and may underperform on rarer entities like Extension or OperatingSystem.

| Task | Model / Setup | Precision | Recall | F1 |
|---|---|---|---|---|
| NER | DeBERTa-V3-Large | 0.5734 | 0.6612 | 0.5993 |
| NER | DeBERTa-V3-Large (Full Fit + Mistral-7B) | 0.6482 | 0.6943 | 0.6498 |
| NER | DeBERTa-V3-Large (Full Fit + Gemma2-9B) | 0.5875 | 0.6808 | 0.6199 |
| NER | DeBERTa-V3-Large (Full Fit + Qwen2.5) | 0.6657 | 0.6531 | 0.6215 |
| NER | XLM-RoBERTa (Full Fit + Gemma2-9B) | 0.2775 | 0.3104 | 0.2871 |
Label mapping (id2label)
{
"0": "B-Extension", "1": "I-Extension",
"2": "B-Application", "3": "I-Application",
"4": "B-Abbreviation",
"5": "B-Citation", "6": "I-Citation",
"7": "B-SoftwareCoreference", "8": "I-SoftwareCoreference",
"9": "B-URL", "10": "I-URL",
"11": "B-AlternativeName", "12": "I-AlternativeName",
"13": "B-OperatingSystem", "14": "I-OperatingSystem",
"15": "B-Developer", "16": "I-Developer",
"17": "O",
"18": "B-License", "19": "I-License",
"20": "B-PlugIn", "21": "I-PlugIn",
"22": "B-Release", "23": "I-Release",
"24": "B-ProgrammingEnvironment", "25": "I-ProgrammingEnvironment",
"26": "B-Version", "27": "I-Version"
}
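The mapping above follows the BIO scheme: a `B-` tag opens an entity, `I-` tags continue it, and `O` marks tokens outside any entity. As a rough illustration of how predicted label ids turn into entity spans, the sketch below decodes a hand-written sequence using a subset of the mapping. The function `bio_to_spans`, the tokens, and the ids are made up for the example and are not model output.

```python
from typing import Dict, List, Tuple

# Subset of the id2label mapping shown above (string keys, as in the JSON).
ID2LABEL: Dict[str, str] = {
    "2": "B-Application", "3": "I-Application",
    "17": "O",
    "26": "B-Version", "27": "I-Version",
}


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_label, current_tokens = None, []

    def flush() -> None:
        # Close the open span, if any.
        if current_tokens:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A B- tag, or an I- tag with no matching open span, starts a new span.
            flush()
            current_label, current_tokens = label, [token]
        elif prefix == "I":
            current_tokens.append(token)
        else:
            # "O": outside any entity.
            flush()
            current_label, current_tokens = None, []
    flush()
    return spans


# Hypothetical word-level predictions for illustration only.
tokens = ["We", "used", "scikit", "learn", "0.24", "."]
label_ids = [17, 17, 2, 3, 26, 17]
tags = [ID2LABEL[str(i)] for i in label_ids]
print(bio_to_spans(tokens, tags))
# → [('Application', 'scikit learn'), ('Version', '0.24')]
```

Treating a stray `I-` tag as the start of a new span is one lenient convention among several; stricter decoders discard such tokens instead.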