[Open in Colab](https://colab.research.google.com/drive/1OoX9zDwdSD88UGXxlFctnq_UPcnkkWdp?usp=sharing)
PIPES-M is a deep learning-based binary classifier that predicts protease inhibitor (PI) activity from primary protein sequences.
PIPES-M is a fine-tuned sequence classification model built on the ESM-2 protein language model (EsmForSequenceClassification):
- Base model: facebook/esm2_t30_150M_UR50D (150 million parameters, 30 layers)
- Pre-trained on UniRef50 via masked language modeling
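As a rough illustration of how a model of this kind could be queried, the sketch below loads a checkpoint with `EsmForSequenceClassification` and converts its logits into a class probability. Several things here are assumptions rather than documented behavior: that the fine-tuned weights and tokenizer are hosted under the repo id `MuthuS97/PIPES-M`, that the classification head emits two logits, and that index 1 is the inhibitor class (check `model.config.id2label` before relying on it). The function names are hypothetical.

```python
import math

# Assumption: the fine-tuned weights are published under this repo id.
MODEL_ID = "MuthuS97/PIPES-M"


def softmax(logits):
    """Convert raw class logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_pi_probability(sequence, model_id=MODEL_ID, positive_index=1):
    """Return the probability of the assumed inhibitor class for one sequence."""
    # Imported lazily so softmax() is usable without torch/transformers installed.
    import torch
    from transformers import AutoTokenizer, EsmForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = EsmForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    # Assumption: a two-logit head where positive_index marks the inhibitor class.
    return softmax(logits)[positive_index]
```

Calling `predict_pi_probability("MKT...")` downloads the checkpoint on first use, so in practice the tokenizer and model would be loaded once and reused across sequences.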
Fine-tuning was performed on a high-quality curated dataset comprising:
- Positive examples: known protease inhibitors (<250 AA) from the MEROPS and UniProt databases
- Negative examples: non-inhibitors selected from UniProt using sequence similarity and Pfam domain analysis
Training used sequence-only input, requiring no structural data. The classification head leverages evolutionary and physicochemical features encoded by ESM-2.
Maximum sequence length is fixed at 250 residues; longer sequences are truncated from the N-terminus, which suits the typical size range of small secreted inhibitors.
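The length limit can be mirrored in a small preprocessing step before tokenization. The helper below is a minimal sketch, not the authors' pipeline: `truncate_sequence` and its `keep` parameter are hypothetical names, and because "truncated from the N-terminus" can be read either way, the default (keeping the first 250 residues, counted from the N-terminus) is an assumption that can be switched.

```python
MAX_RESIDUES = 250  # maximum sequence length stated in this card


def truncate_sequence(sequence, max_len=MAX_RESIDUES, keep="n_terminal"):
    """Normalize a protein sequence and cap it at max_len residues.

    keep="n_terminal" keeps the first max_len residues (assumed default);
    keep="c_terminal" keeps the last max_len residues instead.
    """
    if keep not in ("n_terminal", "c_terminal"):
        raise ValueError("keep must be 'n_terminal' or 'c_terminal'")

    # Remove whitespace/newlines (e.g. from FASTA bodies) and uppercase.
    seq = "".join(sequence.split()).upper()

    if len(seq) <= max_len:
        return seq
    if keep == "n_terminal":
        return seq[:max_len]
    return seq[-max_len:]
```

Sequences at or below the limit pass through unchanged apart from whitespace removal and uppercasing, which matches the stated size range of the positive training examples.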
license: creativeml-openrail-m
Model tree for MuthuS97/PIPES-M
Base model
facebook/esm2_t30_150M_UR50D