---
title: README
emoji: 🐠
colorFrom: pink
colorTo: purple
sdk: static
pinned: false
---

# OTAR3088 NLP-model collection

## Work Package 1 - Knowledge Extraction (NLP)

_**Background**_

This working group is part of the wider _OTAR3088, 'Automating Knowledge Management'_ project. We aim to modernise and extend the current named entity recognition (NER) workflows of EuropePMC / Open Targets to cover a range of entity types relevant to drug discovery, such as variants, biomarkers, tissues/cell types, adverse events, and assay conditions. Recognising these additional entities will provide higher confidence in the relevance of a target-disease association.

Because NLP models are constantly updated and fine-tuned, we have built a modular, flexible framework that makes it straightforward to create new NLP models.

_**OTAR3088 HuggingFace**_

This organisation space documents the project's data development and model generation. Datasets are grouped by the broad entity type the group is studying at a given time, and their sources are described in the corresponding data cards. The resulting models are also shared here.

_**Learn more about our project and resources:**_

* [OTAR3088 - The project](https://home.opentargets.org/OTAR3088)
* [Our flexible NLP-model production pipeline](https://github.com/ML4LitS/OTAR3088)
* [Published papers](https://www.tandfonline.com/doi/full/10.1080/17460441.2025.2490835)