README: add HF download snippet, document 2-column CSV input, swap example paths to released checkpoint
README.md
CHANGED
|
@@ -122,14 +122,53 @@ python train_ppiBTPE3b.py \
|
|
| 122 |
|
| 123 |
**Important:** When training from scratch, use `--freeze_layers 0` to ensure all layers (including embeddings) remain trainable. The default is 20, which would freeze most layers.
|
| 124 |
|
| 125 |
### Inference
|
| 126 |
|
| 127 |
```bash
|
| 128 |
python inference_ppiBTPE_2GPU.py \
|
| 129 |
-
--model_path
|
| 130 |
--model_config facebook/esm1b_t33_650M_UR50S \
|
| 131 |
--num_layers 12 \
|
| 132 |
-
--input_file
|
| 133 |
--output_file predictions.csv \
|
| 134 |
--batch_size 4 \
|
| 135 |
--max_length 1024 \
|
|
@@ -139,10 +178,10 @@ python inference_ppiBTPE_2GPU.py \
|
|
| 139 |
Multi-GPU inference:
|
| 140 |
```bash
|
| 141 |
python inference_ppiBTPE_2GPU.py \
|
| 142 |
-
--model_path
|
| 143 |
--model_config facebook/esm1b_t33_650M_UR50S \
|
| 144 |
--num_layers 12 \
|
| 145 |
-
--input_file
|
| 146 |
--output_file predictions.csv \
|
| 147 |
--device cuda:0,1
|
| 148 |
```
|
|
|
|
| 122 |
|
| 123 |
**Important:** When training from scratch, use `--freeze_layers 0` to ensure all layers (including embeddings) remain trainable. The default is 20, which would freeze most layers.
|
| 124 |
|
| 125 |
+
### Quick start: fetch the checkpoint from Hugging Face
|
| 126 |
+
|
| 127 |
+
The released MED4 checkpoint (`checkpoints/ppiBTPE_epoch_4.pth`, 12-layer)
|
| 128 |
+
lives on this Hugging Face repo. Pull it without cloning the GitHub mirror:
|
| 129 |
+
|
| 130 |
+
```python
|
| 131 |
+
from huggingface_hub import hf_hub_download
|
| 132 |
+
|
| 133 |
+
ckpt_path = hf_hub_download(
|
| 134 |
+
repo_id="kouroshSA/ppiBTEP",
|
| 135 |
+
filename="checkpoints/ppiBTPE_epoch_4.pth",
|
| 136 |
+
)
|
| 137 |
+
print(ckpt_path) # pass this string to --model_path
|
| 138 |
+
```
|
| 139 |
+
|
| 140 |
+
`inference_ppiBTPE_2GPU.py` takes the checkpoint path as a direct
|
| 141 |
+
`--model_path` argument, so no rename or specific directory layout is
|
| 142 |
+
required — point it straight at the file you just downloaded. Use
|
| 143 |
+
`--num_layers 12` to match the architecture this checkpoint was trained
|
| 144 |
+
with.
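
For scripted runs, the download and the inference call can be chained from Python. The following is a minimal sketch, not part of the repository: `build_inference_command` and `download_checkpoint` are hypothetical helper names, and only flags that already appear in this README are used. The download helper imports `huggingface_hub` lazily, so the command builder works even where that package is not installed.

```python
import sys


def build_inference_command(ckpt_path, input_file,
                            output_file="predictions.csv", num_layers=12):
    """Assemble the argv list for inference_ppiBTPE_2GPU.py (hypothetical helper).

    Only flags documented in this README are emitted; num_layers defaults
    to 12 to match the released checkpoint.
    """
    return [
        sys.executable, "inference_ppiBTPE_2GPU.py",
        "--model_path", str(ckpt_path),
        "--model_config", "facebook/esm1b_t33_650M_UR50S",
        "--num_layers", str(num_layers),
        "--input_file", str(input_file),
        "--output_file", str(output_file),
    ]


def download_checkpoint():
    """Fetch the released checkpoint and return its local cache path."""
    # Imported here so the rest of the module has no hard dependency on it.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="kouroshSA/ppiBTEP",
        filename="checkpoints/ppiBTPE_epoch_4.pth",
    )


# Typical use (requires network access and the repository checked out):
#   import subprocess
#   cmd = build_inference_command(
#       download_checkpoint(),
#       "MED4-PPIs-low-confidence_ppiTEPM_prompts.csv",
#   )
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the Hugging Face cache path contains spaces.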
|
| 145 |
+
|
| 146 |
+
### Input file format
|
| 147 |
+
|
| 148 |
+
The inference script expects a CSV with two columns of plain amino-acid
|
| 149 |
+
sequences (one protein pair per row — no delimiter tokens, no length
|
| 150 |
+
markers, no chevrons):
|
| 151 |
+
|
| 152 |
+
```
|
| 153 |
+
seq1,seq2
|
| 154 |
+
MKLR...QSH,MSEDF...VKN
|
| 155 |
+
MQAG...PIA,MTRRL...EEP
|
| 156 |
+
```
|
| 157 |
+
|
| 158 |
+
A ready-made example is shipped with the repo:
|
| 159 |
+
[`MED4-PPIs-low-confidence_ppiTEPM_prompts.csv`](MED4-PPIs-low-confidence_ppiTEPM_prompts.csv).
|
| 160 |
+
The labeled PRS/RRS reference sets (`MED4_PRS_100.csv`, `MED4_RRS_100.csv`)
|
| 161 |
+
include a third label column, which the inference script ignores — only
|
| 162 |
+
the first two columns are read.
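
Before a long inference run it can help to check that a CSV matches the layout described above. The sketch below is an illustration, not code from the repository: `check_pairs` is a hypothetical helper, the header row is assumed to be present as in the example, and the accepted residue alphabet (the 20 standard amino acids plus the codes B, X, Z, U, O) is an assumption about what the model tolerates.

```python
import csv
import io

# Assumption: standard residues plus common ambiguity/rare codes are allowed.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBXZUO")


def check_pairs(handle, has_header=True):
    """Return a list of (row_number, message) problems found in a pairs CSV.

    Only the first two columns are inspected; any further columns
    (for example a label column) are ignored.
    """
    problems = []
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the "seq1,seq2" header row
    first_row = 2 if has_header else 1
    for row_no, row in enumerate(reader, start=first_row):
        if len(row) < 2:
            problems.append((row_no, "fewer than two columns"))
            continue
        for col, seq in enumerate(row[:2], start=1):
            seq = seq.strip()
            if not seq:
                problems.append((row_no, f"column {col} is empty"))
            elif not set(seq.upper()) <= VALID_RESIDUES:
                problems.append(
                    (row_no, f"column {col} has non-residue characters")
                )
    return problems


# Example with an in-memory file; row 3 contains chevrons, row 4 lacks a partner.
sample = io.StringIO("seq1,seq2\nMKLR,MSEDF\nMQ<AG>,MTRRL\nMKK\n")
for row_no, message in check_pairs(sample):
    print(f"row {row_no}: {message}")
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to `check_pairs`; an empty result means no problems were detected under the assumptions above.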
|
| 163 |
+
|
| 164 |
### Inference
|
| 165 |
|
| 166 |
```bash
|
| 167 |
python inference_ppiBTPE_2GPU.py \
|
| 168 |
+
--model_path checkpoints/ppiBTPE_epoch_4.pth \
|
| 169 |
--model_config facebook/esm1b_t33_650M_UR50S \
|
| 170 |
--num_layers 12 \
|
| 171 |
+
--input_file MED4-PPIs-low-confidence_ppiTEPM_prompts.csv \
|
| 172 |
--output_file predictions.csv \
|
| 173 |
--batch_size 4 \
|
| 174 |
--max_length 1024 \
|
|
|
|
| 178 |
Multi-GPU inference:
|
| 179 |
```bash
|
| 180 |
python inference_ppiBTPE_2GPU.py \
|
| 181 |
+
--model_path checkpoints/ppiBTPE_epoch_4.pth \
|
| 182 |
--model_config facebook/esm1b_t33_650M_UR50S \
|
| 183 |
--num_layers 12 \
|
| 184 |
+
--input_file MED4-PPIs-low-confidence_ppiTEPM_prompts.csv \
|
| 185 |
--output_file predictions.csv \
|
| 186 |
--device cuda:0,1
|
| 187 |
```
|