sihuapeng committed 3ae1f84 (verified) · 1 parent: 88cebb2

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -13,9 +13,11 @@ Average Validation Loss (Mse): 0.0836
  Epoch: 3
 
  # The dataset for training **MHC-II-EpiPred**
- The original data comes from the paper by [Lee CH et al.](https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-023-01225-z). The data is a CSV file with 9 columns and 100,097 samples. We used the first column (amino acid sequences), the second column (immunogenicity, positive or negative), and the ninth column (immunogenicity score). We used these three columns as input to fine-tune the pre-trained ESM2 model and built a regression model. Given a potential epitope's amino acid sequence, this regression model predicts its immunogenicity score, and a set threshold then determines whether it is an epitope.
+ The original data was downloaded from the IEDB database at https://www.iedb.org/home_v3.php.
+ The full data can be downloaded at https://www.iedb.org/downloader.php?file_name=doc/tcell_full_v3.zip
+ This dataset comprises 543,717 T-cell epitope entries, spanning a variety of species and infections caused by diverse viruses. The epitope information covers a broad range of potential sources, including data relevant to disease immunotherapy.
 
- The dataset was downloaded from GitHub at [**TRAP**](https://github.com/ChloeHJ/TRAP/blob/main/data/pathogenic_db.csv).
+ Finally, the dataset used to train the model contains 60,256 positive and negative samples and is stored at https://github.com/pengsihua2023/MHC-II-EpiPred/tree/main/data.
 
  # Model training code at GitHub
  https://github.com/pengsihua2023/MHC-II-TCEpiPred
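The column selection and thresholding described in the removed README paragraph can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's training code. The 0-based indices follow the stated columns 1, 2 and 9 of the 9-column CSV. Several details are hypothetical: the single header row, the `>=` comparison, the 0.5 threshold in the usage, and the function names `load_training_triples` and `call_epitopes`. The README does not state the actual threshold.

```python
import csv
import io
from typing import Iterable, List, Tuple

# 0-based positions of the columns the old README says were used:
# column 1 = amino acid sequence, column 2 = immunogenicity label
# (positive/negative), column 9 = immunogenicity score.
SEQ_COL, LABEL_COL, SCORE_COL = 0, 1, 8


def load_training_triples(lines: Iterable[str], has_header: bool = True) -> List[Tuple[str, str, float]]:
    """Keep only (sequence, label, score) from each 9-column CSV row (hypothetical helper)."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # assumption: the file starts with exactly one header row
    triples = []
    for row in reader:
        if len(row) < 9:
            continue  # skip blank or malformed rows instead of raising
        triples.append((row[SEQ_COL], row[LABEL_COL], float(row[SCORE_COL])))
    return triples


def call_epitopes(scores: Iterable[float], threshold: float) -> List[bool]:
    """Turn predicted immunogenicity scores into epitope / non-epitope calls.

    The threshold value and the >= comparison are assumptions for illustration.
    """
    return [score >= threshold for score in scores]


if __name__ == "__main__":
    # Tiny in-memory CSV with 9 columns; sequences and scores are made up.
    sample = io.StringIO(
        "seq,label,c3,c4,c5,c6,c7,c8,score\n"
        "PKYVKQNTLKLAT,Positive,,,,,,,0.91\n"
        "AAAAAAAAAAAAA,Negative,,,,,,,0.12\n"
    )
    rows = load_training_triples(sample)
    print(call_epitopes([score for _, _, score in rows], threshold=0.5))
```

In the real pipeline, the fine-tuned ESM2 regression model would produce the scores from sequences. Here they are read from the CSV only to show where the threshold step sits.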