jackkuo committed on
Commit 3c83555 · verified · 1 Parent(s): 759913b

Update README.md

  1. README.md +139 -3
README.md CHANGED
@@ -1,3 +1,139 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+
+ # BCE-Vir-Prediction
+
+ A virus epitope prediction tool based on ESM (Evolutionary Scale Modeling). This tool uses a pre-trained ESM classification model to perform sliding window predictions on protein sequences, identifying potential antigen epitopes and functional domains.
+
+ ## Features
+
+ - **Epitope Prediction** (`bcepre_predict_logits.py`): Uses a pre-trained ESM classification model to split protein sequences with sliding windows, performs classification predictions on each subsequence (e.g., whether it is an antigen epitope, functional domain, etc.), and saves prediction results along with corresponding logits values.
+ - **Amino Acid Probability Prediction** (`bcepre_predict_softmax.py`): Converts sliding window prediction results into probability values aggregated by amino acid position, outputting a results table containing amino acid types, epitope probabilities, and coverage counts.
+
+ ## Model
+
+ The pre-trained model can be downloaded from Hugging Face:
+
+ - **Model Repository**: [jackkuo/BCE-Vir-Prediction_model](https://huggingface.co/jackkuo/BCE-Vir-Prediction_model)
+
+ - **Code Repository**: [JackKuo666/BCE-Vir-Prediction](https://github.com/JackKuo666/BCE-Vir-Prediction)
+
+
+ # Model Download Instructions
+
+ This folder is used to store the trained ESM model files.
+
+ ## How to Download the Model
+
+ ### Method 1: Using Hugging Face Hub (Recommended)
+
+ Use the `huggingface_hub` library to download the model:
+
+ ```bash
+ pip install huggingface_hub
+ ```
+
+ Then run the following Python code:
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Download the model to the current folder
+ snapshot_download(
+     repo_id="jackkuo/BCE-Vir-Prediction_model",
+     local_dir="./",
+     local_dir_use_symlinks=False
+ )
+ ```
+
+ Or use `huggingface-cli` in the command line:
+
+ ```bash
+ huggingface-cli download jackkuo/BCE-Vir-Prediction_model --local-dir ./ --local-dir-use-symlinks False
+ ```
+
+ ### Method 2: Using Git LFS
+
+ If Git LFS is installed, you can clone directly:
+
+ ```bash
+ git lfs install
+ git clone https://huggingface.co/jackkuo/BCE-Vir-Prediction_model .
+ ```
+
+ ### Method 3: Manual Download
+
+ Visit the model page: https://huggingface.co/jackkuo/BCE-Vir-Prediction_model
+
+ Select the required files from the file list to download and save them to this folder.
+
+ ## Model File Structure
+
+ After downloading, this folder should contain the following files:
+ - `config.json` - Model configuration file
+ - `model.safetensors` - Model weights file (in safetensors format)
+ - `tokenizer_config.json` - Tokenizer configuration file
+ - `vocab.txt` - Vocabulary file
+ - `special_tokens_map.json` - Special tokens mapping file
+
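As a quick sanity check after any of the download methods, the file list above can be compared against the folder contents. The following is only a minimal sketch and is not part of the repository: the helper name `missing_model_files` is hypothetical, and the expected file names are copied from the list above.

```python
from pathlib import Path

# File names copied from the "Model File Structure" list.
EXPECTED_FILES = [
    "config.json",
    "model.safetensors",
    "tokenizer_config.json",
    "vocab.txt",
    "special_tokens_map.json",
]


def missing_model_files(model_dir):
    """Return the expected file names that are absent from model_dir.

    Hypothetical helper for illustration; the repository may not ship
    an equivalent check.
    """
    folder = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

Calling `missing_model_files("./")` from inside the model folder returns an empty list when every listed file is present, and otherwise the names that still need to be downloaded.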
+ ## Usage
+
+ ### Step 1: Download the Model
+
+ First, download the pre-trained model to the `trained_esm_model` folder.
+
+ ### Step 2: Prepare Input Files
+
+ Place the protein sequence file (FASTA format) to be predicted in the `example_data` folder, or modify the input file path in the script.
+
+ ### Step 3: Run Epitope Prediction
+
+ Run the `bcepre_predict_logits.py` script for epitope prediction:
+
+ ```bash
+ python bcepre_predict_logits.py
+ ```
+
+ This script will:
+ - Read the protein sequence file in FASTA format
+ - Split the sequence using sliding windows (default minimum window size is 5)
+ - Perform classification predictions on each subsequence
+ - Output a CSV file containing the following fields:
+   - `sequence`: Subsequence
+   - `window_size`: Window size
+   - `prediction`: Predicted class
+   - `logit_0`, `logit_1`, ...: Logits values for each class
+
+ Output files are saved in the `predictions/` folder by default.
+
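The sliding-window split described in Step 3 can be sketched as follows. This is an illustration under stated assumptions, not the code from `bcepre_predict_logits.py`: the function name `sliding_windows`, the `max_size` upper bound, and the stride of 1 are all assumptions; only the minimum window size of 5 comes from the README.

```python
def sliding_windows(sequence, min_size=5, max_size=None, step=1):
    """Yield (start, window_size, subsequence) for every window.

    Assumptions (not confirmed by the README): windows of every size from
    min_size up to max_size are generated, and each window advances by
    `step` residues. `start` is a 0-based offset into `sequence`.
    """
    if min_size < 1 or step < 1:
        raise ValueError("min_size and step must be positive")
    if max_size is None:
        max_size = len(sequence)
    # A window can never be longer than the sequence itself.
    max_size = min(max_size, len(sequence))
    for size in range(min_size, max_size + 1):
        for start in range(0, len(sequence) - size + 1, step):
            yield start, size, sequence[start:start + size]
```

Each yielded subsequence would then be passed to the classifier, and its `window_size` and logits written as one CSV row. A sequence shorter than `min_size` produces no windows at all.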
+ ### Step 4: Calculate Amino Acid Position Probabilities
+
+ Run the `bcepre_predict_softmax.py` script to convert prediction results into aggregated probabilities by amino acid position:
+
+ ```bash
+ python bcepre_predict_softmax.py
+ ```
+
+ This script will:
+ - Read the CSV file generated by `bcepre_predict_logits.py`
+ - Calculate epitope probability for each subsequence (using softmax function)
+ - Aggregate probability values by amino acid position
+ - Output a CSV file containing the following fields:
+   - `position`: Amino acid position (starting from 1)
+   - `amino_acid`: Amino acid type
+   - `probability`: Epitope probability at this position (average of all window predictions covering this position)
+   - `coverage`: Number of windows covering this position
+
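The aggregation in Step 4 can be sketched as follows: apply softmax to each window's logits, then average the epitope probability over all windows that cover a position. This is a hedged illustration, not the code from `bcepre_predict_softmax.py`. It assumes that class index 1 is the epitope class and that each window's 0-based start offset is known; the README does not state either, and the names `softmax` and `aggregate_by_position` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_by_position(sequence, windows, epitope_class=1):
    """Average per-window epitope probabilities over each residue.

    `windows` is an iterable of (start, window_size, logits) with a
    0-based start (an assumption; the README's CSV lists no start column).
    Returns one dict per residue with the README's output fields.
    """
    sums = [0.0] * len(sequence)
    coverage = [0] * len(sequence)
    for start, size, logits in windows:
        prob = softmax(logits)[epitope_class]
        for idx in range(start, min(start + size, len(sequence))):
            sums[idx] += prob
            coverage[idx] += 1
    return [
        {
            "position": idx + 1,  # positions start from 1
            "amino_acid": residue,
            # None marks residues that no window covers
            "probability": sums[idx] / coverage[idx] if coverage[idx] else None,
            "coverage": coverage[idx],
        }
        for idx, residue in enumerate(sequence)
    ]
```

Averaging means a residue covered by many windows is not scored higher simply for being covered more often; the `coverage` column keeps that count available separately.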
+ ## License
+
+ This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
+
+ ## Citation
+
+ If you use this tool for research, please cite the relevant models and code repositories.
+
+ ## Contact
+
+ For questions or suggestions, please contact us through GitHub Issues.
+