---
license: mit
tags:
- biology
---
# Model description
**MHC-II-EpiPred** (MHC II molecular epitope prediction) is a protein language model fine-tuned from the pretrained [**ESM2**](https://github.com/facebookresearch/esm) model [(***facebook/esm2_t33_650M_UR50D***)](https://huggingface.co/facebook/esm2_t33_650M_UR50D) on a T-cell MHC II epitope dataset.

**MHC-II-EpiPred** is a sequence classification model that predicts the class (positive or negative) of a candidate MHC II epitope.

# Dataset
The original data was downloaded from the IEDB database at https://www.iedb.org/home_v3.php.
The full data can be downloaded at https://www.iedb.org/downloader.php?file_name=doc/tcell_full_v3.zip  
The full dataset comprises 543,717 T-cell epitope entries spanning a variety of species and infections caused by diverse viruses. The epitope information covers a broad range of sources, including data relevant to disease immunotherapy.

The final dataset used to train the model contains 60,256 positive and negative samples and is stored at https://github.com/pengsihua2023/MHC-II-EpiPred/tree/main/data.  

# Results
**MHC-II-EpiPred** achieved the following results:  
Training Loss (cross-entropy loss, CEL): 0.0355  
Training Accuracy: 0.9916  
Training F1: 0.9916   
Evaluation Loss (cross-entropy loss, CEL): 0.0537   
Evaluation Accuracy: 0.9824   
Evaluation F1: 0.9824  
Epochs: 39     
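
For readers unfamiliar with the metrics listed above, the following is a minimal sketch of how cross-entropy loss, accuracy, and F1 can be computed for a two-class problem. It is not the training code of this project: the toy probabilities and labels are made up for illustration, and the F1 shown is the plain binary F1 for the positive class, which is an assumption (the averaging mode used for the reported F1 is not stated here).

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def f1_binary(preds, labels, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(p == positive and y == positive for p, y in zip(preds, labels))
    fp = sum(p == positive and y != positive for p, y in zip(preds, labels))
    fn = sum(p != positive and y == positive for p, y in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (illustrative values only, not project data):
# each row holds the predicted probabilities for class 0 and class 1.
probs = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [1, 0, 0, 1]

# The predicted class is the index with the highest probability.
preds = [max(range(len(p)), key=p.__getitem__) for p in probs]

loss = cross_entropy(probs, labels)
acc = accuracy(preds, labels)
f1 = f1_binary(preds, labels)
```

A lower cross-entropy means the model assigns higher probability to the correct class, while accuracy and F1 only look at the final predicted class.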

# Model training code at GitHub
https://github.com/pengsihua2023/MHC-II-EpiPred  

# How to use **MHC-II-EpiPred**
### An example
The PyTorch and transformers libraries must be installed on your system.  
### Install PyTorch
```
pip install torch torchvision torchaudio
```
### Install transformers
```
pip install transformers
```
### Run the following code
```
Coming soon!

```
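
Until the official example is published, the following is a minimal sketch of how a fine-tuned sequence classification checkpoint is typically loaded with the transformers `Auto*` classes. Several things here are assumptions rather than facts about this project: `MODEL_ID` is a hypothetical placeholder (replace it with the actual Hub repository or a local checkpoint path), the checkpoint is assumed to be loadable with `AutoModelForSequenceClassification`, and the label names are read from whatever `id2label` mapping the checkpoint's config provides.

```python
import math

# Hypothetical placeholder, not a confirmed repository name.
MODEL_ID = "your-namespace/MHC-II-EpiPred"


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(peptides, model_id=MODEL_ID):
    """Return (peptide, predicted label, probability) for each peptide."""
    # Imported lazily so softmax() can be used without the heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Pad the batch so peptides of different lengths fit in one tensor.
    inputs = tokenizer(peptides, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits

    results = []
    for peptide, row in zip(peptides, logits.tolist()):
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        # Fall back to the class index if the config has no label name.
        label = model.config.id2label.get(best, str(best))
        results.append((peptide, label, probs[best]))
    return results
```

Once `MODEL_ID` points to a real checkpoint, calling `predict(["PKYVKQNTLKLAT"])` would return one tuple per input peptide; the peptide string here is only an illustrative input.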

## Funding
This project was funded by a CDC award to Justin Bahl (BAA 75D301-21-R-71738).  
### Model architecture, coding and implementation
Sihua Peng  
## Group, Department and Institution  
### Lab: [Justin Bahl](https://bahl-lab.github.io/)  
### Department: [College of Veterinary Medicine Department of Infectious Diseases](https://vet.uga.edu/education/academic-departments/infectious-diseases/)  
### Institution: [The University of Georgia](https://www.uga.edu/)  

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c56e2d2d07296c7e35994f/2rlokZM1FBTxibqrM8ERs.png)