---
license: gpl-3.0
language:
- en
metrics:
- accuracy
base_model: dmis-lab/ANGEL_pretrained
---

# Model Card for ANGEL_bc5cdr
This model card provides detailed information about the ANGEL_bc5cdr model, designed for biomedical entity linking.

# Model Details

#### Model Description
- **Developed by:** Chanhwi Kim, Hyunjae Kim, Sihyeon Park, Jiwoo Lee, Mujeen Sung, Jaewoo Kang
- **Model type:** Generative Biomedical Entity Linking Model
- **Language(s):** English
- **License:** GPL-3.0
- **Finetuned from model:** dmis-lab/ANGEL_pretrained (BART-large architecture)

#### Model Sources

- **GitHub Repository:** https://github.com/dmis-lab/ANGEL
- **Paper:** https://arxiv.org/pdf/2408.16493

# Direct Use
ANGEL_bc5cdr is a biomedical entity linking model fine-tuned to link chemical and disease mentions in the BC5CDR dataset to their corresponding entities.
To use this model, set up a virtual environment and obtain the inference code.
Start by cloning our [ANGEL GitHub repository](https://github.com/dmis-lab/ANGEL).
Then, run the following script from the repository root to set up the environment:
```bash
bash script/environment/set_environment.sh
```

To run the model on a single sample, no preprocessing is required.
Simply execute the `run_sample.sh` script:

```bash
bash script/inference/run_sample.sh bc5cdr
```

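You can chain the clone, setup, and inference steps above into a single shell function. This is a minimal sketch, not a script shipped with the repository. The function name `run_angel_bc5cdr_sample` and the default `ANGEL` checkout directory are assumptions. The sketch also assumes that `set_environment.sh` leaves the environment usable by the next command. If it instead creates an environment that must be activated (for example, a conda environment), activate it before the inference step.

```bash
# Hypothetical helper (not part of the ANGEL repository): clone the repo if needed,
# set up the environment, and run the bundled single-sample inference for BC5CDR.
run_angel_bc5cdr_sample() {
  local repo_dir="${1:-ANGEL}"   # assumed checkout location

  # Clone only if the checkout does not exist yet.
  if [ ! -d "$repo_dir" ]; then
    git clone https://github.com/dmis-lab/ANGEL.git "$repo_dir" || return 1
  fi

  # Run the repository scripts from the repository root inside a subshell,
  # so the caller's working directory is left unchanged.
  (
    cd "$repo_dir" || exit 1
    bash script/environment/set_environment.sh || exit 1
    bash script/inference/run_sample.sh bc5cdr
  )
}
```

Call it as `run_angel_bc5cdr_sample`, or pass an existing checkout path such as `run_angel_bc5cdr_sample /path/to/ANGEL`. The function returns a non-zero status if any step fails.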
To replace the sample with your own example, refer to the [Direct Use](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#direct-use) section of our GitHub repository.
If you're interested in training or evaluating the model, see the [Fine-tuning](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#fine-tuning) and [Evaluation](https://github.com/dmis-lab/ANGEL?tab=readme-ov-file#evaluation) sections.

# Training

#### Training Data
The model was trained on the BC5CDR dataset, which contains annotated chemical and disease entities.

#### Training Procedure
- **Positive-only Pre-training:** initial training using only positive examples (the correct entity for each mention), following the standard approach.
- **Negative-aware Training:** subsequent training that also incorporates negative examples (incorrect entities) to improve the model's ability to distinguish correct entities from incorrect ones.

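The sketch below makes the difference between the two stages concrete. For each mention, it separates candidate entities into positive and negative examples. It is an illustration only, not the data pipeline used in the ANGEL repository. The tab-separated input format, the function name `split_examples`, and the assumption that negatives are drawn from a list of candidate entities are all hypothetical.

```bash
# Hypothetical illustration (not ANGEL's actual data pipeline).
# Input TSV, one candidate per line:
#   mention<TAB>gold_id<TAB>candidate_id<TAB>candidate_name
# Writes mention<TAB>candidate_name pairs to two files:
#   $2: positive examples (candidate is the gold entity), used in both stages
#   $3: negative examples (candidate is wrong), used only in negative-aware training
split_examples() {
  awk -F '\t' -v pos="$2" -v neg="$3" '
    BEGIN { printf "" > pos; printf "" > neg }   # create/truncate both output files
    NF >= 4 {
      # Appending "" forces a string comparison, so IDs like "001" and "1" differ.
      if ($2 "" == $3 "") print $1 "\t" $4 > pos
      else                print $1 "\t" $4 > neg
    }
  ' "$1"
}
```

For example, running `split_examples candidates.tsv positives.tsv negatives.tsv` writes one file with positive pairs and one file with negative pairs. A positive-only stage would use only the first file. A negative-aware stage would use both.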
# Evaluation

### Testing Data
The model was evaluated on the BC5CDR dataset.

### Metrics
Accuracy at top-1 (Acc@1): the percentage of mentions for which the model's top-ranked prediction matches the correct (gold) entity.

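The function below illustrates the metric by computing Acc@1 from a file of predictions. It is a minimal sketch, not the repository's evaluation script. The two-column TSV format (gold entity ID, top-1 predicted ID) and the function name `acc_at_1` are assumptions. It also uses plain exact string matching, so it does not handle special cases, such as mentions annotated with more than one gold ID, that the official evaluation may treat differently.

```bash
# Hypothetical helper: Acc@1 from a TSV file with one mention per line:
#   gold_id<TAB>top1_predicted_id
# Prints the percentage of lines where the two IDs are identical, to two decimals.
acc_at_1() {
  awk -F '\t' '
    NF >= 2 {
      total++
      # Appending "" forces a string comparison instead of a numeric one.
      if ($1 "" == $2 "") correct++
    }
    END {
      if (total == 0) { print "no predictions found" > "/dev/stderr"; exit 1 }
      printf "%.2f\n", 100 * correct / total
    }
  ' "$1"
}
```

For example, a file with four mentions, three of which are predicted correctly, yields `75.00`.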
### Scores

<table border="1" cellspacing="0" cellpadding="5" style="width: 100%; text-align: center; border-collapse: collapse; margin-left: 0;">
<thead>
<tr>
<th><b>Dataset</b></th>
<th><b>BioSYN</b><br>(Sung et al., 2020)</th>
<th><b>SapBERT</b><br>(Liu et al., 2021)</th>
<th><b>GenBioEL</b><br>(Yuan et al., 2022b)</th>
<th><b>ANGEL<br>(Ours)</b></th>
</tr>
</thead>
<tbody>
<tr>
<td><b>BC5CDR</b></td>
<td>-</td>
<td>-</td>
<td>93.1</td>
<td><b>94.5</b></td>
</tr>
</tbody>
</table>

The GenBioEL score was reproduced by us.

BioSYN and SapBERT scores are omitted (marked "-") because those models were evaluated separately on the chemical and disease subsets, which differs from our evaluation setting.

# Citation
If you use the ANGEL_bc5cdr model, please cite:

```bibtex
@article{kim2024learning,
  title={Learning from Negative Samples in Generative Biomedical Entity Linking},
  author={Kim, Chanhwi and Kim, Hyunjae and Park, Sihyeon and Lee, Jiwoo and Sung, Mujeen and Kang, Jaewoo},
  journal={arXiv preprint arXiv:2408.16493},
  year={2024}
}
```

# Contact
For questions or issues, please contact chanhwi_kim@korea.ac.kr.