| --- |
| license: mit |
| tags: |
| - generated_from_trainer |
| datasets: |
| - renet |
| metrics: |
| - precision |
| - recall |
| - f1 |
| - accuracy |
| model_index: |
| - name: BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-renet |
| results: |
| - task: |
| name: Text Classification |
| type: text-classification |
| dataset: |
| name: renet |
| type: renet |
| metric: |
| name: Accuracy |
| type: accuracy |
| value: 0.8640646029609691 |
| --- |
| |
| # BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-renet |
|
|
| A model for detecting gene disease associations from abstracts. The model classifies as 0 for no association, or 1 for some association. |
|
|
| This model is a fine-tuned version of [microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext) on the [RENET2](https://github.com/sujunhao/RENET2) dataset. Note that this considers only the abstract data, and not the full text information, from RENET2. |
|
|
| It achieves the following results on the evaluation set: |
| - Loss: 0.7226 |
| - Precision: 0.7799 |
| - Recall: 0.8211 |
| - F1: 0.8 |
| - Accuracy: 0.8641 |
| - Auc: 0.9325 |
|
|
| ## Training procedure |
| The abstract dataset from RENET2 was split into 85% train, 15% evaluation being grouped by PMIDs and stratified by labels. That is, no data from the same PMID was seen in multiple both the training and the evaluation set. |
|
|
| ### Training hyperparameters |
|
|
| The following hyperparameters were used during training: |
| - learning_rate: 5e-05 |
| - train_batch_size: 16 |
| - eval_batch_size: 16 |
| - seed: 1 |
| - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
| - lr_scheduler_type: linear |
| - num_epochs: 5 |
|
|
|
|
| ### Framework versions |
|
|
| - Transformers 4.9.0.dev0 |
| - Pytorch 1.10.0.dev20210630+cu113 |
| - Datasets 1.8.0 |
| - Tokenizers 0.10.3 |
|
|