---
license: mit
tags:
- biology
---
# Model description
**MHC-I-EpiPred** (MHC I molecular epitope prediction) is a protein language model fine-tuned from the pretrained [**ESM2**](https://github.com/facebookresearch/esm) model [(***facebook/esm2_t33_650M_UR50D***)](https://huggingface.co/facebook/esm2_t33_650M_UR50D) on a T cell MHC I epitope dataset.

**MHC-I-EpiPred** is a classification model that predicts the class of an MHC I epitope.
   

# Dataset
The original data were downloaded from the IEDB database at https://www.iedb.org/home_v3.php.
The full data can be downloaded at https://www.iedb.org/downloader.php?file_name=doc/tcell_full_v3.zip  
This dataset comprises 543,717 T-cell epitope entries, spanning a variety of species and infections caused by diverse viruses. The epitope information included encompasses a broad range of potential sources, including data relevant to disease immunotherapy.  

The final dataset used to train the model contains 41,060 positive and negative samples and is stored at https://github.com/pengsihua2023/MHC-I-EpiPred/tree/main/data.

# Results
**MHC-I-EpiPred** achieved the following results:

Training Loss (cross-entropy loss, CEL): 0.1044  
Training Accuracy: 98.99%   
Evaluation Loss (cross-entropy loss, CEL): 0.1576   
Evaluation Accuracy: 97.04%  
Epochs: 492

# Model training code at GitHub
https://github.com/pengsihua2023/MHC-I-EpiPred-ESM2

# How to use **MHC-I-EpiPred**
### An example
The PyTorch and transformers libraries must be installed on your system.  
### Install PyTorch
```
pip install torch torchvision torchaudio
```
### Install transformers
```
pip install transformers
```
### Run the following code
```
Coming soon!

```
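
Until the official example is published, the following is a minimal sketch of what inference might look like. It assumes the fine-tuned weights can be loaded as a standard Hugging Face sequence-classification checkpoint through `AutoTokenizer` and `AutoModelForSequenceClassification`. The `MODEL_ID` value is a hypothetical placeholder, and the helper names `clean_peptide`, `softmax`, and `predict` are illustrative only; they are not part of the released code. The meaning of each output class index is not stated in this card and must be checked against the training code.

```python
import math

# Hypothetical placeholder: replace with the actual repo ID or local path of
# the fine-tuned checkpoint. It is not specified in this model card.
MODEL_ID = "path/to/MHC-I-EpiPred"

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_peptide(seq):
    """Uppercase a peptide, drop whitespace, and reject non-standard residues."""
    seq = "".join(seq.split()).upper()
    if not seq:
        raise ValueError("empty peptide sequence")
    bad = sorted(set(seq) - AMINO_ACIDS)
    if bad:
        raise ValueError(f"non-standard residues: {bad}")
    return seq


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(peptide, model_id=MODEL_ID):
    """Return class probabilities for one peptide.

    Assumes the checkpoint loads with AutoModelForSequenceClassification;
    the meaning of each class index depends on how the model was trained.
    """
    # Imported here so the helpers above work without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(clean_peptide(peptide), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return softmax(logits)
```

Once `MODEL_ID` points at a real checkpoint, calling `predict("SIINFEKL")` (an example peptide string) would return one probability per class.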

## Funding
This project was supported by CDC funding to Justin Bahl (BAA 75D301-21-R-71738).  
### Model architecture, coding and implementation
Sihua Peng  
## Group, Department and Institution  
### Lab: [Justin Bahl](https://bahl-lab.github.io/)  
### Department: [College of Veterinary Medicine, Department of Infectious Diseases](https://vet.uga.edu/education/academic-departments/infectious-diseases/)  
### Institution: [The University of Georgia](https://www.uga.edu/)  

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c56e2d2d07296c7e35994f/2rlokZM1FBTxibqrM8ERs.png)