ProtBERT-PI enables rapid screening of potential small secreted protease inhibitors in large-scale genomic, transcriptomic, or proteomic datasets.

The model assigns each input sequence to one of two classes:

Positive (Potential PI): Predicted to exhibit protease inhibitor activity
Negative (Non-PI): Predicted to lack protease inhibitor activity

Output includes:

Probability of the positive class (prob_class_1): ranges from 0 (low likelihood) to 1 (high likelihood of PI activity)
Confidence score: probability of the predicted class (equal to prob_class_1 for a positive prediction and 1 - prob_class_1 for a negative one)
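
As a minimal sketch of how the two output fields relate, the snippet below converts a pair of raw class logits into prob_class_1, a label, and a confidence score. The function name summarize_prediction, the logit order [negative, positive], and the 0.5 decision cutoff are assumptions made for illustration; the model card does not state the threshold actually used.

```python
import math


def summarize_prediction(logits):
    """Convert two raw logits [negative, positive] into the reported fields.

    Hypothetical helper: the name, the logit order, and the 0.5 cutoff
    are assumptions, not taken from the ProtBERT-PI code.
    """
    # Numerically stable softmax over the two classes.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    prob_class_1 = probs[1]
    predicted_class = 1 if prob_class_1 >= 0.5 else 0  # assumed cutoff
    return {
        "prob_class_1": prob_class_1,
        "label": "Potential PI" if predicted_class == 1 else "Non-PI",
        # Confidence is the probability of whichever class was predicted.
        "confidence": probs[predicted_class],
    }


print(summarize_prediction([-1.2, 2.3]))
```

With this definition the confidence score never falls below 0.5, so when ranking candidates from a large screen it is usually prob_class_1 that carries the useful ordering.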

Model Architecture and Training

ProtBERT-PI is a fine-tuned sequence classification model built on ProtBERT (BertForSequenceClassification):

Base model: Rostlab/prot_bert
Pre-trained on large corpora of protein sequences using masked language modeling

Fine-tuning was performed on a curated dataset of known protease inhibitors and a negative set of non-inhibitor proteins. Sequences are tokenized by inserting spaces between amino acids, the standard input format for ProtBERT, so that each residue becomes one token. The maximum sequence length is configurable (default: 250 AA); longer sequences are truncated.

Positive examples: known protease inhibitors (<250 AA) from the MEROPS database
Negative examples: non-inhibitors selected from UniProt using sequence similarity and Pfam domain analysis
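
The preprocessing and classification steps described above could be sketched as follows with the Hugging Face transformers library. Several details are assumptions rather than facts from this card: the repository ID MuthuS97/PIP-BERT, that the tokenizer is published alongside the weights, that class index 1 is the positive class (suggested by the prob_class_1 field name), and the mapping of rare residues (U, Z, O, B) to X, which is the convention documented for the base ProtBERT model. The helper names prepare_sequence and predict are hypothetical. Note also that max_length counts tokens including the special [CLS] and [SEP] tokens, so it may not correspond exactly to 250 residues.

```python
import re

MAX_LENGTH = 250                # default length stated in the model card
MODEL_ID = "MuthuS97/PIP-BERT"  # assumed Hub repository ID


def prepare_sequence(sequence):
    """Upper-case a protein sequence and put a space between residues."""
    cleaned = re.sub(r"\s+", "", sequence).upper()
    # Base ProtBERT convention; assumed to apply to the fine-tuned model too.
    cleaned = re.sub(r"[UZOB]", "X", cleaned)
    return " ".join(cleaned)


def predict(sequences, model_id=MODEL_ID, max_length=MAX_LENGTH):
    """Return the positive-class probability for each input sequence."""
    # Imported lazily so prepare_sequence works without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(
        [prepare_sequence(s) for s in sequences],
        padding=True,
        truncation=True,        # longer sequences are cut off
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits
    probs = torch.softmax(logits, dim=-1)
    return probs[:, 1].tolist()  # index 1 assumed to be "Potential PI"


print(prepare_sequence("mkt ayiU"))
```

Calling predict with a list of amino acid strings downloads the model weights on first use and returns one probability per sequence. For large datasets, splitting the input into batches keeps memory use bounded.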

license: creativeml-openrail-m


Model repository: MuthuS97/PIP-BERT (fine-tuned from Rostlab/prot_bert)