Commit 13bb44c (verified) · committed by kouroshSA · 1 parent: 87f2ab6

README: add HF download snippet, document 2-column CSV input, swap example paths to released checkpoint

  1. README.md +43 -4
README.md CHANGED
@@ -122,14 +122,53 @@ python train_ppiBTPE3b.py \
122
 
123
  **Important:** When training from scratch, use `--freeze_layers 0` to ensure all layers (including embeddings) remain trainable. The default is 20, which would freeze most layers.
124
 
125
  ### Inference
126
 
127
  ```bash
128
  python inference_ppiBTPE_2GPU.py \
129
- --model_path out/ppiBTPE_epoch_17.pth \
130
  --model_config facebook/esm1b_t33_650M_UR50S \
131
  --num_layers 12 \
132
- --input_file test_pairs.csv \
133
  --output_file predictions.csv \
134
  --batch_size 4 \
135
  --max_length 1024 \
@@ -139,10 +178,10 @@ python inference_ppiBTPE_2GPU.py \
139
  Multi-GPU inference:
140
  ```bash
141
  python inference_ppiBTPE_2GPU.py \
142
- --model_path out/ppiBTPE_final.pth \
143
  --model_config facebook/esm1b_t33_650M_UR50S \
144
  --num_layers 12 \
145
- --input_file test_pairs.csv \
146
  --output_file predictions.csv \
147
  --device cuda:0,1
148
  ```
 
122
 
123
  **Important:** When training from scratch, use `--freeze_layers 0` to ensure all layers (including embeddings) remain trainable. The default is 20, which would freeze most layers.
124
 
125
+ ### Quick start: fetch the checkpoint from Hugging Face
126
+
127
+ The released MED4 checkpoint (`checkpoints/ppiBTPE_epoch_4.pth`, 12-layer)
128
+ lives on this Hugging Face repo. Pull it without cloning the GitHub mirror:
129
+
130
+ ```python
131
+ from huggingface_hub import hf_hub_download
132
+
133
+ ckpt_path = hf_hub_download(
134
+ repo_id="kouroshSA/ppiBTEP",
135
+ filename="checkpoints/ppiBTPE_epoch_4.pth",
136
+ )
137
+ print(ckpt_path) # pass this string to --model_path
138
+ ```
139
+
140
+ `inference_ppiBTPE_2GPU.py` takes the checkpoint path as a direct
141
+ `--model_path` argument, so no rename or specific directory layout is
142
+ required — point it straight at the file you just downloaded. Use
143
+ `--num_layers 12` to match the architecture this checkpoint was trained
144
+ with.
145
+
146
+ ### Input file format
147
+
148
+ The inference script expects a CSV with two columns of plain amino-acid
149
+ sequences (one protein pair per row — no delimiter tokens, no length
150
+ markers, no chevrons):
151
+
152
+ ```
153
+ seq1,seq2
154
+ MKLR...QSH,MSEDF...VKN
155
+ MQAG...PIA,MTRRL...EEP
156
+ ```
157
+
158
+ A ready-made example is shipped with the repo:
159
+ [`MED4-PPIs-low-confidence_ppiTEPM_prompts.csv`](MED4-PPIs-low-confidence_ppiTEPM_prompts.csv).
160
+ The labeled PRS/RRS reference sets (`MED4_PRS_100.csv`, `MED4_RRS_100.csv`)
161
+ include a third label column, which the inference script ignores — only
162
+ the first two columns are read.
163
+
164
  ### Inference
165
 
166
  ```bash
167
  python inference_ppiBTPE_2GPU.py \
168
+ --model_path checkpoints/ppiBTPE_epoch_4.pth \
169
  --model_config facebook/esm1b_t33_650M_UR50S \
170
  --num_layers 12 \
171
+ --input_file MED4-PPIs-low-confidence_ppiTEPM_prompts.csv \
172
  --output_file predictions.csv \
173
  --batch_size 4 \
174
  --max_length 1024 \
 
178
  Multi-GPU inference:
179
  ```bash
180
  python inference_ppiBTPE_2GPU.py \
181
+ --model_path checkpoints/ppiBTPE_epoch_4.pth \
182
  --model_config facebook/esm1b_t33_650M_UR50S \
183
  --num_layers 12 \
184
+ --input_file MED4-PPIs-low-confidence_ppiTEPM_prompts.csv \
185
  --output_file predictions.csv \
186
  --device cuda:0,1
187
  ```
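
The "Input file format" section added in this commit describes a CSV with two columns of plain amino-acid sequences and an optional third label column that is ignored. Below is a minimal sketch of a pre-flight check for such a file. It is not part of the repository: `check_pairs_csv` and `VALID_RESIDUES` are hypothetical names, the accepted residue alphabet is an assumption, and the check assumes the first row is a header as in the README example (`seq1,seq2`).

```python
import csv
import io

# Assumption: the 20 standard residues plus common ambiguity/rare codes.
# The alphabet the inference script actually accepts may differ.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBXZUO")


def check_pairs_csv(handle):
    """Return a list of (row_number, message) problems found in a pairs CSV.

    Only the first two columns are inspected; extra columns (such as a
    label column) are ignored, mirroring the behaviour the README describes.
    """
    problems = []
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row (assumed present)
    for row_number, row in enumerate(reader, start=2):
        if len(row) < 2:
            problems.append((row_number, "fewer than two columns"))
            continue
        for name, seq in zip(("seq1", "seq2"), row[:2]):
            seq = seq.strip()
            if not seq:
                problems.append((row_number, f"{name} is empty"))
            elif not set(seq.upper()) <= VALID_RESIDUES:
                problems.append((row_number, f"{name} has non-residue characters"))
    return problems


if __name__ == "__main__":
    # Second data row contains a chevron, which the README says is not allowed.
    sample = io.StringIO("seq1,seq2\nMKLRQSH,MSEDFVKN\nMQAG<PIA,MTRRLEEP,1\n")
    for row_number, message in check_pairs_csv(sample):
        print(f"row {row_number}: {message}")
```

To check a real file, open it with `open(path, newline="")` and pass the handle to `check_pairs_csv`; an empty list means no problems were found under these assumptions.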