---
license: creativeml-openrail-m
base_model:
- facebook/esm2_t30_150M_UR50D
---
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1OoX9zDwdSD88UGXxlFctnq_UPcnkkWdp?usp=sharing)


**PIPES-M** is a deep learning-based binary classifier that predicts protease inhibitor (PI) activity from primary protein sequences.


PIPES-M is a fine-tuned sequence classification model built on the **ESM-2** protein language model (EsmForSequenceClassification):  
- Base model: `facebook/esm2_t30_150M_UR50D` (150 million parameters, 30 layers)  
- Pre-trained on UniRef50 via masked language modeling  

Fine-tuning was performed on a high-quality curated dataset comprising:  
- Positive examples: known protease inhibitors (<250 AA) from the MEROPS and UniProt databases  
- Negative examples: non-inhibitors selected from UniProt using sequence similarity and Pfam domain analysis  

Training used sequence-only input, requiring no structural data. The classification head leverages evolutionary and physicochemical features encoded by ESM-2.  
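
As a rough illustration of sequence-only inference, the sketch below scores a single sequence with `EsmForSequenceClassification`. Several details are assumptions rather than facts from this card: `MODEL_ID` is a hypothetical placeholder for wherever the fine-tuned weights are hosted, the tokenizer is taken from the base model, class index 1 is assumed to mean "inhibitor", and the input is assumed to be at most 250 residues long.

```python
import math

BASE_MODEL = "facebook/esm2_t30_150M_UR50D"  # base model named in this card
MODEL_ID = "your-namespace/PIPES-M"          # hypothetical placeholder, replace with the real repo


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_pi_probability(sequence, model_id=MODEL_ID):
    """Return the probability assigned to class index 1 (assumed: inhibitor)."""
    # Heavy dependencies are imported lazily so the helpers above stay usable without them.
    import torch
    from transformers import AutoTokenizer, EsmForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = EsmForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # The only input is the amino-acid string; no structural data is needed.
    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return softmax(logits)[1]
```

Calling `predict_pi_probability("MKT...")` with a real repository ID would download the weights on first use; check the model's `id2label` mapping before relying on the assumed label order.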

The maximum sequence length is fixed at 250 residues. Longer sequences are truncated to their first 250 residues, counted from the N-terminus, which suits the typical size range of small secreted inhibitors.
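
The truncation rule can be sketched as a small preprocessing helper. This is a minimal illustration, not the exact preprocessing used for PIPES-M: the function name `prepare_sequence` is hypothetical, and the whitespace removal and upper-casing are assumptions added for convenience.

```python
MAX_LEN = 250  # maximum number of residues, as stated in this card


def prepare_sequence(raw, max_len=MAX_LEN):
    """Clean a raw sequence string and keep only its first `max_len` residues."""
    # Assumption: drop whitespace and line breaks (e.g. from FASTA bodies) and upper-case.
    cleaned = "".join(raw.split()).upper()
    # Python strings start at the N-terminus, so slicing keeps the N-terminal residues.
    return cleaned[:max_len]
```

For example, a 300-residue input comes back as its first 250 residues, while shorter inputs pass through unchanged apart from the cleanup.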

