
	---
	license: creativeml-openrail-m
	base_model:
	- Rostlab/prot_bert
	---
	[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zccF8lGrF5rNQaSFPTd4wI-xvIJr-A78?usp=sharing)

	ProtBERT-PI enables rapid screening of potential small secreted protease inhibitors in large-scale genomic, transcriptomic, or proteomic datasets.

	The model assigns each input sequence to one of two classes:

	- Positive (Potential PI): predicted to exhibit protease inhibitor activity
	- Negative (Non-PI): predicted to lack protease inhibitor activity

	Output includes:

	- Probability of the positive class (prob_class_1): ranges from 0 (low likelihood) to 1 (high likelihood of PI activity)
	- Confidence score: probability of the predicted class
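The two output fields can be illustrated with a minimal sketch. It assumes the classifier emits two raw scores (logits) ordered as [negative, positive], that probabilities come from a softmax over them, and that the predicted class is the more probable one; the function name `summarize_prediction` and the returned keys other than `prob_class_1` are hypothetical and not taken from the model's own code.

```python
import math


def summarize_prediction(logits):
    """Turn two raw class scores [negative, positive] into the reported fields."""
    # Numerically stable softmax over the two logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Assumption: the predicted class is the one with the higher probability.
    predicted = 1 if probs[1] >= probs[0] else 0
    return {
        "prob_class_1": probs[1],          # probability of the positive class
        "predicted_class": predicted,
        "label": "Potential PI" if predicted == 1 else "Non-PI",
        "confidence": probs[predicted],    # probability of the predicted class
    }


result = summarize_prediction([-1.0, 2.0])
print(result["label"], round(result["prob_class_1"], 3), round(result["confidence"], 3))
```

Under these assumptions, the confidence score equals prob_class_1 for a positive prediction and 1 - prob_class_1 for a negative one, so it never falls below 0.5.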

	Model Architecture and Training

	ProtBERT-PI is a fine-tuned sequence classification model built on ProtBERT (BertForSequenceClassification):

	- Base model: Rostlab/prot_bert
	- Pre-trained on large corpora of protein sequences using masked language modeling

	Fine-tuning was performed on a curated dataset of known protease inhibitors and a negative set of non-inhibitor proteins.
	Sequences are tokenized by inserting spaces between amino acids (the standard input format for ProtBERT), so that each residue is treated as a separate token.
	Maximum sequence length is configurable (default: 250 AA); longer sequences are truncated.
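The preprocessing described above can be sketched as follows. This is an illustration only: the helper name `prepare_sequence` is hypothetical, truncation is applied here to residues before spacing (the actual pipeline may instead truncate at the tokenizer, where special tokens also count toward the limit), and the mapping of rare residues U, Z, O and B to X follows the usage example on the Rostlab/prot_bert card rather than anything stated for this model.

```python
import re

MAX_LEN = 250  # default maximum length in residues, per the model card


def prepare_sequence(seq: str, max_len: int = MAX_LEN) -> str:
    """Return a space-separated, truncated amino-acid string for ProtBERT."""
    # Normalise: drop whitespace and use upper-case one-letter codes.
    seq = re.sub(r"\s+", "", seq).upper()
    # Assumption: map rare residues to X, as in the Rostlab/prot_bert example.
    seq = re.sub(r"[UZOB]", "X", seq)
    # Truncate longer sequences, then insert a space between residues.
    return " ".join(seq[:max_len])


print(prepare_sequence("mkTLLvauc"))
```

The resulting string is what would then be passed to the ProtBERT tokenizer; raising `max_len` keeps more of a long sequence at the cost of memory and compute.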

	- Positive examples: known protease inhibitors (<250 AA) from the MEROPS database
	- Negative examples: non-inhibitors selected from UniProt using sequence similarity and Pfam domain analysis
