---
license: mit
tags:
- biology
---
# Model description

**MHC-I-EpiPred** (MHC class I epitope prediction) is a protein language model fine-tuned from the [**ESM2**](https://github.com/facebookresearch/esm) pretrained model [(***facebook/esm2_t33_650M_UR50D***)](https://huggingface.co/facebook/esm2_t33_650M_UR50D) on a T-cell MHC class I epitope dataset.

**MHC-I-EpiPred** is a classification model that classifies peptide sequences as MHC class I epitopes or non-epitopes.
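As a classifier, the model outputs one raw score (logit) per class. The following minimal pure-Python sketch, which is an illustration and not the project's own code, shows how such logits are usually turned into a predicted label and probability with a softmax. It assumes two classes and a hypothetical label order (index 0 = non-epitope, index 1 = epitope); the actual label mapping of MHC-I-EpiPred is not documented in this card.

```python
import math

# Hypothetical label order; check the released model config for the real one.
LABELS = ("non-epitope", "epitope")


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


if __name__ == "__main__":
    # Made-up logits, as a two-class head might emit for one peptide.
    label, p = predict([-1.2, 2.3])
    print(label, round(p, 3))  # prints: epitope 0.971
```

Subtracting the maximum logit before exponentiating does not change the probabilities but avoids overflow for large scores.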
# Dataset

The original data were downloaded from the Immune Epitope Database (IEDB) at https://www.iedb.org/home_v3.php.

The full T-cell dataset can be downloaded from https://www.iedb.org/downloader.php?file_name=doc/tcell_full_v3.zip

This dataset comprises 543,717 T-cell epitope entries spanning a variety of species and infections caused by diverse viruses. The epitopes come from a broad range of sources, including data relevant to disease immunotherapy.

The final dataset used to train the model contains 41,060 samples, including both positive and negative examples, and is stored at https://github.com/pengsihua2023/MHC-I-EpiPred/tree/main/data.
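The card does not describe the file layout of the `data` folder. As a hedged sketch, assuming a CSV file with hypothetical `sequence` and `label` columns (1 = epitope, 0 = non-epitope), the samples could be loaded, restricted to the 20 standard amino acids, and checked for class balance like this:

```python
import csv
import io
from collections import Counter

# Assumption: hypothetical columns "sequence" and "label" (1 = epitope,
# 0 = non-epitope). Verify the real column names in the repository first.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def load_samples(handle):
    """Read (sequence, label) pairs, skipping rows with non-standard residues."""
    samples = []
    for row in csv.DictReader(handle):
        seq = row["sequence"].strip().upper()
        if seq and set(seq) <= AMINO_ACIDS:
            samples.append((seq, int(row["label"])))
    return samples


def class_counts(samples):
    """Count how many samples of each label were loaded."""
    return Counter(label for _, label in samples)


if __name__ == "__main__":
    # Toy rows for illustration only, not taken from the real dataset.
    demo = io.StringIO(
        "sequence,label\n"
        "SIINFEKL,1\n"
        "GILGFVFTL,1\n"
        "AAAXAAAA,0\n"  # skipped: contains the non-standard residue X
        "KLVALGINA,0\n"
    )
    data = load_samples(demo)
    print(len(data), dict(class_counts(data)))
```

For a real file, pass `open(path, newline="")` instead of the in-memory `io.StringIO` buffer.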
# Results

**MHC-I-EpiPred** achieved the following results:

| Metric | Value |
|---|---|
| Training loss (cross-entropy loss, CEL) | 0.1044 |
| Training accuracy | 98.99% |
| Evaluation loss (cross-entropy loss, CEL) | 0.1576 |
| Evaluation accuracy | 97.04% |
| Epochs | 492 |
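The losses above are cross-entropy losses and the accuracies are the fraction of correctly classified samples. A minimal pure-Python sketch of how these two metrics are commonly computed from per-sample logits follows; it assumes mean reduction over samples (the default of PyTorch's `CrossEntropyLoss`) and is not taken from the project's training code.

```python
import math


def log_softmax(logits):
    """Stable log-softmax over one row of raw class scores."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def cross_entropy(batch_logits, labels):
    """Mean cross-entropy: the average of -log p(true class) over samples."""
    losses = [-log_softmax(row)[y] for row, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


def accuracy(batch_logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    hits = sum(
        max(range(len(row)), key=row.__getitem__) == y
        for row, y in zip(batch_logits, labels)
    )
    return hits / len(labels)


if __name__ == "__main__":
    # Made-up logits for three samples and their true labels.
    logits = [[2.0, -1.0], [0.5, 1.5], [1.0, 0.0]]
    labels = [0, 1, 1]
    print(round(cross_entropy(logits, labels), 4), accuracy(logits, labels))
```

A low loss together with high accuracy means the model is both correct and confident; the loss also penalizes correct predictions made with low probability, which accuracy ignores.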
# Model training code at GitHub

https://github.com/pengsihua2023/MHC-I-EpiPred-ESM2
# How to use **MHC-I-EpiPred**

### An example

PyTorch and the Hugging Face Transformers library must be installed.

### Install PyTorch

```
pip install torch torchvision torchaudio
```

### Install Transformers

```
pip install transformers
```

### Run the following code

```
Coming soon!
```
## Funding

This project was funded by the CDC to Justin Bahl (BAA 75D301-21-R-71738).

### Model architecture, coding and implementation

Sihua Peng

## Group, Department and Institution

### Lab: [Justin Bahl](https://bahl-lab.github.io/)

### Department: [College of Veterinary Medicine, Department of Infectious Diseases](https://vet.uga.edu/education/academic-departments/infectious-diseases/)

### Institution: [The University of Georgia](https://www.uga.edu/)