Update README.md
README.md
CHANGED
|
@@ -6,6 +6,23 @@ base_model:
|
|
| 6 |
[]
|
| 7 |
(https://colab.research.google.com/github/MuthuS97/PIPES-M/blob/main/PIPES-M.ipynb)
|
| 8 |
| 9 |
---
|
| 10 |
license: creativeml-openrail-m
|
| 11 |
---
|
|
|
|
| 6 |
[]
|
| 7 |
(https://colab.research.google.com/github/MuthuS97/PIPES-M/blob/main/PIPES-M.ipynb)
|
| 8 |
|
| 9 |
+
|
| 10 |
+
**PIPES-M** is a deep learning-based binary classifier that predicts protease inhibitor (PI) activity from primary protein sequences.
|
| 11 |
+
|
| 12 |
+
|
| 13 |
+
PIPES-M is a fine-tuned sequence classification model built on the **ESM-2** protein language model:
|
| 14 |
+
- Base model: `facebook/esm2_t30_150M_UR50D` (150 million parameters, 30 layers)
|
| 15 |
+
- Pre-trained on UniRef50 via masked language modeling
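As a rough illustration of the setup described above, the base checkpoint can be loaded with a two-class head through Hugging Face `transformers`. This is a minimal sketch, not the authors' training code: the helper name `load_base_classifier` and the choice of `num_labels=2` are assumptions, and the head it creates is freshly initialized rather than the fine-tuned PIPES-M weights.

```python
BASE_MODEL = "facebook/esm2_t30_150M_UR50D"  # base checkpoint named in the README


def load_base_classifier(model_name: str = BASE_MODEL, num_labels: int = 2):
    """Return (tokenizer, model) for the base checkpoint with a new classification head.

    Hypothetical helper: the head is randomly initialized and must be
    fine-tuned before its predictions mean anything.
    """
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels
    )
    return tokenizer, model
```

Calling `load_base_classifier()` downloads the base weights on first use, so it needs network access and the `transformers` and `torch` packages.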
|
| 16 |
+
|
| 17 |
+
Fine-tuning was performed on a high-quality curated dataset comprising:
|
| 18 |
+
- Positive examples: known protease inhibitors (<250 AA) from the MEROPS database
|
| 19 |
+
- Negative examples: non-inhibitors selected from UniProt using sequence similarity and Pfam domain analysis
|
| 20 |
+
|
| 21 |
+
Training used sequence-only input, requiring no structural data. The classification head leverages evolutionary and physicochemical features encoded by ESM-2.
|
| 22 |
+
|
| 23 |
+
The maximum sequence length is fixed at 250 residues, which suits the typical size range of small secreted inhibitors; longer sequences are truncated from the N-terminus.
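The 250-residue limit can be sketched as a small preprocessing helper. The text does not make explicit which end of an over-long sequence is kept, so this sketch exposes both readings through a `keep` argument. The function name, the argument, and the default of keeping the N-terminal side are all assumptions, not the model's documented behavior.

```python
MAX_LEN = 250  # residue limit stated in the README


def truncate_sequence(seq: str, max_len: int = MAX_LEN, keep: str = "n_term") -> str:
    """Clip a protein sequence to at most max_len residues.

    keep="n_term" keeps the first max_len residues (drops the C-terminal tail).
    keep="c_term" keeps the last max_len residues (drops the N-terminal head).
    The default is an assumption made for illustration only.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if keep not in ("n_term", "c_term"):
        raise ValueError("keep must be 'n_term' or 'c_term'")

    # Normalize whitespace and case before measuring length.
    seq = seq.strip().upper()
    if len(seq) <= max_len:
        return seq
    return seq[:max_len] if keep == "n_term" else seq[-max_len:]
```

Sequences at or under the limit pass through unchanged apart from whitespace and case normalization, so the helper can be applied to every input before tokenization.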
|
| 24 |
+
|
| 25 |
+
|
| 26 |
---
|
| 27 |
license: creativeml-openrail-m
|
| 28 |
---
|