---
license: creativeml-openrail-m
base_model:
- facebook/esm2_t30_150M_UR50D
---

[Open in Colab](https://colab.research.google.com/drive/1OoX9zDwdSD88UGXxlFctnq_UPcnkkWdp?usp=sharing)

**PIPES-M** is a deep learning-based binary classifier that predicts protease inhibitor (PI) activity from primary protein sequences.

PIPES-M is a fine-tuned sequence classification model built on the **ESM-2** protein language model (`EsmForSequenceClassification`):

- Base model: `facebook/esm2_t30_150M_UR50D` (150 million parameters, 30 layers)
- Pre-trained on UniRef50 via masked language modeling
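As a rough illustration of the architecture described above, the following sketch loads an ESM-2 checkpoint into `EsmForSequenceClassification` with two output classes and returns class probabilities for one sequence. It assumes `transformers` and `torch` are installed. The helper names `load_classifier` and `predict_proba` are hypothetical. `BASE_MODEL` is the pre-trained base checkpoint, so its classification head would be freshly initialized; the repository id of the fine-tuned PIPES-M weights is not stated here and would need to be substituted. Which output index corresponds to the inhibitor class is likewise not stated and should be checked against the model's `id2label` configuration.

```python
BASE_MODEL = "facebook/esm2_t30_150M_UR50D"  # base checkpoint named above
NUM_LABELS = 2  # binary task: inhibitor vs. non-inhibitor


def load_classifier(model_id: str = BASE_MODEL):
    """Load a tokenizer and an ESM-2 sequence classifier (downloads weights).

    With the base checkpoint the classification head is randomly initialized;
    pass the fine-tuned PIPES-M repository id to get meaningful predictions.
    """
    from transformers import AutoTokenizer, EsmForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = EsmForSequenceClassification.from_pretrained(
        model_id, num_labels=NUM_LABELS
    )
    model.eval()  # disable dropout for inference
    return tokenizer, model


def predict_proba(sequence: str, tokenizer, model) -> list:
    """Return softmax class probabilities for one protein sequence."""
    import torch

    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, NUM_LABELS)
    return torch.softmax(logits, dim=-1)[0].tolist()
```

A typical call would be `tokenizer, model = load_classifier(...)` followed by `predict_proba(sequence, tokenizer, model)`, which yields one probability per class.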

Fine-tuning was performed on a high-quality curated dataset comprising:

- Positive examples: known protease inhibitors (<250 AA) from the MEROPS and UniProt databases
- Negative examples: non-inhibitors selected from UniProt using sequence similarity and Pfam domain analysis
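The length cut-off on the positive set can be illustrated with a small filter over FASTA text. This is only a sketch of that single step, not the authors' curation pipeline: the function names `parse_fasta` and `filter_short` are hypothetical, and the similarity- and Pfam-based selection of negatives is not reproduced.

```python
from typing import Dict, Iterator, Tuple

MAX_LEN = 250  # positives are described as shorter than 250 AA


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_short(text: str, max_len: int = MAX_LEN) -> Dict[str, str]:
    """Keep only records whose sequence is strictly shorter than max_len."""
    return {h: s for h, s in parse_fasta(text) if len(s) < max_len}
```

The strict `<` comparison mirrors the "<250 AA" wording above; a pipeline that accepts sequences of exactly 250 residues would use `<=` instead.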

Training used sequence-only input, requiring no structural data. The classification head leverages evolutionary and physicochemical features encoded by ESM-2.

The maximum sequence length is fixed at 250 residues; longer sequences are truncated to their first 250 residues, counted from the N-terminus. This limit suits the typical size range of small secreted inhibitors.
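The truncation rule amounts to a slice from the N-terminus. The sketch below is an assumption about preprocessing rather than the authors' code, and `truncate_sequence` is a hypothetical helper. Note that the ESM tokenizer adds `<cls>` and `<eos>` tokens, so a tokenizer-level `max_length` of 250 would keep fewer than 250 residues; whether the 250 limit counts those special tokens is not stated here.

```python
MAX_RESIDUES = 250  # limit described above


def truncate_sequence(seq: str, max_residues: int = MAX_RESIDUES) -> str:
    """Normalize a protein sequence and keep its first max_residues residues."""
    # Remove whitespace/newlines (e.g. from wrapped FASTA) and upper-case.
    cleaned = "".join(seq.split()).upper()
    # Python slicing keeps the N-terminal end and drops the remainder.
    return cleaned[:max_residues]
```

Sequences at or below the limit pass through unchanged apart from whitespace removal and upper-casing.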