Add MED4 epoch-8 checkpoint used for low-confidence Y2H screening, with sidecar note
- checkpoints/ppiDCE_epoch8.md +35 -0
checkpoints/ppiDCE_epoch8.md
ADDED
|
@@ -0,0 +1,35 @@
| 1 |
+
# ppiDCE_epoch8.pth
|
| 2 |
+
|
| 3 |
+
**Checkpoint used for screening low-confidence Y2H pairs in the *Prochlorococcus marinus* MED4 interactome.**
|
| 4 |
+
|
| 5 |
+
## Provenance
|
| 6 |
+
|
| 7 |
+
| Field | Value |
|
| 8 |
+
|---|---|
|
| 9 |
+
| Model | ppiDCE (dual cross-encoder, ESM-1b-inspired transformer, trained from scratch) |
|
| 10 |
+
| Architecture | 12 transformer layers |
|
| 11 |
+
| Epoch | 8 |
|
| 12 |
+
| File size | ~913 MB |
|
| 13 |
+
| Training run | `out_MED4_12L` |
|
| 14 |
+
| Training set | `train_MED4_ppiBTEPM-pseudo_Int_combo1-2-3.csv` (≈13,008 pairs, pre-clean — see note below) |
|
| 15 |
+
| Validation set | `val_MED4_100_Y2H-RND_ppiBRTPM.csv` |
|
| 16 |
+
|
| 17 |
+
## Intended use
|
| 18 |
+
|
| 19 |
+
Inference / screening of candidate MED4 protein–protein interactions that
|
| 20 |
+
were originally flagged as **low-confidence Y2H hits**. The model is run on
|
| 21 |
+
each candidate pair (sequences encoded jointly as
|
| 22 |
+
`[CLS] Seq_A [SEP] Seq_B [EOS]`) and its softmax probability is used (in
|
| 23 |
+
concert with the other tri-model components, ppiBTEP and ppiGPLM) to retain
|
| 24 |
+
or discard the pair.
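
The joint encoding and the softmax step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `encode_pair` and `softmax` are hypothetical helper names, per-residue tokenization is assumed, and treating index 1 of a two-way output as the "interacts" class is also an assumption. None of this is taken from `inference_ppiDCE.py`, and how the score is combined with ppiBTEP and ppiGPLM is not shown because the note does not specify it.

```python
import math


def encode_pair(seq_a: str, seq_b: str) -> list[str]:
    """Join two sequences as [CLS] Seq_A [SEP] Seq_B [EOS].

    Assumption: one token per residue; the real tokenizer may differ.
    """
    return ["[CLS]", *seq_a, "[SEP]", *seq_b, "[EOS]"]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy sequences and made-up logits, for illustration only.
tokens = encode_pair("MKT", "AG")
probs = softmax([0.0, 0.0])
p_interact = probs[1]  # assumption: index 1 is the "interacts" class
```

Subtracting the maximum logit before exponentiating leaves the result unchanged but avoids overflow for large logits.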
|
| 25 |
+
|
| 26 |
+
## Notes
|
| 27 |
+
|
| 28 |
+
- This checkpoint was produced **before** the PRS/RRS de-overlapping pass on
|
| 29 |
+
`train.csv` (see [`MED4-PPIs-low-confidence_ppiTEPM_prompts.csv`](../MED4-PPIs-low-confidence_ppiTEPM_prompts.csv) and the cleaned
|
| 30 |
+
`train.clean.csv` / `train.clean2x.csv` companions). Approximately 608 of
|
| 31 |
+
the 13,008 training rows (4.67 %) overlap with the PRS+RRS evaluation pairs
|
| 32 |
+
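in either orientation. Because those rows were seen during training, metrics on the overlapping PRS/RRS pairs are likely optimistic and should be read with caution.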
|
| 33 |
+
- Loading: use `train_ppiDCE.py` / `inference_ppiDCE.py` from the parent repo
|
| 34 |
+
with `--model_config facebook/esm1b_t33_650M_UR50S` (config-only — weights
|
| 35 |
+
are loaded from this checkpoint, not from the HF ESM-1b release).
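
The "overlap in either orientation" check from the first note can be sketched as follows. This is a minimal illustration, not the de-overlapping script actually used: `canonical` and `overlapping_rows` are hypothetical names, the protein identifiers are invented, and the real CSV column layout is not assumed here. The last line only re-derives the percentage quoted above from the stated counts.

```python
def canonical(a: str, b: str) -> tuple[str, str]:
    """Order a pair so that (A, B) and (B, A) map to the same key."""
    return (a, b) if a <= b else (b, a)


def overlapping_rows(train_pairs, eval_pairs):
    """Return training pairs that also occur in eval_pairs, in either orientation."""
    eval_keys = {canonical(a, b) for a, b in eval_pairs}
    return [(a, b) for a, b in train_pairs if canonical(a, b) in eval_keys]


# Invented identifiers, for illustration only.
train = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]
evaluation = [("P2", "P1"), ("P7", "P8")]
hits = overlapping_rows(train, evaluation)  # ("P1", "P2") matches the reversed pair

# Counts quoted in the note: 608 overlapping rows out of 13,008.
overlap_percent = round(100 * 608 / 13008, 2)
```

Dropping the rows returned by `overlapping_rows` from the training list would give a de-overlapped set in the spirit of the `train.clean.csv` companion, although the exact procedure used to build that file is not described here.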
|