license: mit
tags:
  - generated_from_trainer
datasets:
  - renet
metrics:
  - precision
  - recall
  - f1
  - accuracy
model_index:
  - name: BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-renet
    results:
      - task:
          name: Text Classification
          type: text-classification
        dataset:
          name: renet
          type: renet
        metric:
          name: Accuracy
          type: accuracy
          value: 0.8640646029609691

BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-renet

A model for detecting gene-disease associations in abstracts. It is a binary classifier: label 0 means no association and label 1 means an association.

This model is a fine-tuned version of microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext on the RENET2 dataset. Note that only the abstract data from RENET2 was used, not the full-text data.
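The label convention above can be applied at inference time roughly as follows. This is a minimal sketch, not the author's inference code: `classify` and `interpret_logits` are hypothetical helper names, `model_id` must be the full Hub repository ID of this model (the namespace is not stated in this card), and the sketch assumes the input is passed as plain text, since the card does not describe how gene and disease mentions are marked in the input.

```python
import math

# Label meanings as stated in this card.
LABELS = {0: "no association", 1: "association"}


def interpret_logits(logits):
    """Turn raw two-class logits into (label_id, label_text, probability)."""
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    label_id = probs.index(max(probs))
    return label_id, LABELS[label_id], probs[label_id]


def classify(text, model_id):
    """Run the classifier on one input string (assumption: plain-text input)."""
    # Imported lazily so interpret_logits can be used without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return interpret_logits(logits)
```

Calling `classify(abstract_text, model_id)` returns the predicted label ID, its meaning, and the softmax probability of that label.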

It achieves the following results on the evaluation set:

Loss: 0.7226
Precision: 0.7799
Recall: 0.8211
F1: 0.8
Accuracy: 0.8641
AUC: 0.9325
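As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The helper name `f1_from_precision_recall` is made up for this sketch.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in this card.
reported_precision = 0.7799
reported_recall = 0.8211
reported_f1 = 0.8

recomputed_f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"recomputed F1: {recomputed_f1:.4f}")
```

The recomputed value agrees with the reported F1 to within rounding.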

Training procedure

The abstract dataset from RENET2 was split into 85% training and 15% evaluation data, grouped by PMID and stratified by label. That is, no PMID appears in both the training set and the evaluation set.
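One way to produce a split with these properties is sketched below. It is not the procedure actually used for this model, which the card does not specify: `grouped_stratified_split` is a hypothetical helper that keeps all rows of a PMID together and greedily assigns whole PMIDs to the evaluation set while that moves the per-label counts closer to the 15% target.

```python
import random
from collections import defaultdict


def grouped_stratified_split(records, eval_fraction=0.15, seed=1):
    """Split (pmid, label) records into train/eval index lists.

    All rows sharing a PMID land on the same side; label proportions in the
    evaluation set are kept close to eval_fraction of each label's total.
    """
    groups = defaultdict(list)
    label_totals = defaultdict(int)
    for idx, (pmid, label) in enumerate(records):
        groups[pmid].append(idx)
        label_totals[label] += 1

    # Target number of evaluation rows per label.
    target = {label: eval_fraction * n for label, n in label_totals.items()}

    pmids = sorted(groups)
    random.Random(seed).shuffle(pmids)

    eval_counts = {label: 0 for label in target}
    train_idx, eval_idx = [], []
    for pmid in pmids:
        members = groups[pmid]
        added = {label: 0 for label in target}
        for i in members:
            added[records[i][1]] += 1
        # Distance to the target with and without this PMID in the eval set.
        before = sum(abs(target[l] - eval_counts[l]) for l in target)
        after = sum(abs(target[l] - eval_counts[l] - added[l]) for l in target)
        if after < before:
            eval_idx.extend(members)
            for label in target:
                eval_counts[label] += added[label]
        else:
            train_idx.extend(members)
    return train_idx, eval_idx
```

Because assignment happens per PMID rather than per row, the resulting fractions are approximate when PMIDs contribute very different numbers of rows.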

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 1
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 5
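For anyone reproducing the run with the Hugging Face `Trainer`, the values above map onto `TrainingArguments` roughly as follows. This is a sketch under assumptions: the card's batch sizes are treated as per-device values (that is, single-device training), `build_training_args` and its `output_dir` argument are hypothetical, and any setting not listed above is left at the library default.

```python
# Hyperparameters listed in this card, keyed by TrainingArguments field name.
HYPERPARAMS = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 16,  # assumption: train_batch_size is per device
    "per_device_eval_batch_size": 16,   # assumption: eval_batch_size is per device
    "seed": 1,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
}


def build_training_args(output_dir):
    """Create TrainingArguments from the listed values (output_dir is hypothetical)."""
    # Imported lazily so HYPERPARAMS can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

The returned object can be passed as the `args` parameter of `Trainer` together with the model and the tokenized train and evaluation datasets.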

Framework versions

Transformers 4.9.0.dev0
PyTorch 1.10.0.dev20210630+cu113
Datasets 1.8.0
Tokenizers 0.10.3