MuthuS97/PIP-BERT

MuthuS97 committed on Jan 7

Commit 1555623 (verified)

1 Parent(s): d947944

Update README.md

Files changed (1)

README.md CHANGED

@@ -1,3 +1,32 @@
----
-license: creativeml-openrail-m
----

+---
+license: creativeml-openrail-m
+base_model:
+- facebook/esm2_t30_150M_UR50D
+---
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zccF8lGrF5rNQaSFPTd4wI-xvIJr-A78?usp=sharing)
+ProtBERT-PI enables rapid screening of potential small secreted protease inhibitors in large-scale genomic, transcriptomic, or proteomic datasets.
+The model assigns each input sequence to one of two classes:
+    Positive (Potential PI): Predicted to exhibit protease inhibitor activity
+    Negative (Non-PI): Predicted to lack protease inhibitor activity
+Output includes:
+    Probability of the positive class (prob_class_1): ranges from 0 (low likelihood) to 1 (high likelihood of PI activity)
+    Confidence score: probability of the predicted class
+Model Architecture and Training
+ProtBERT-PI is a fine-tuned sequence classification model built on ProtBERT (BertForSequenceClassification):
+    Base model: Rostlab/prot_bert
+    Pre-trained on large corpora of protein sequences using masked language modeling
+Fine-tuning was performed on a curated dataset of known protease inhibitors and a negative set of non-inhibitor proteins. Sequences are tokenized by inserting a space between adjacent amino acids, the standard input format for ProtBERT, so that each residue becomes one token. The maximum sequence length is configurable (default: 250 amino acids); longer sequences are truncated.
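
The README text in this hunk describes two mechanical steps: space-separating and truncating the input sequence, and reporting `prob_class_1` together with a confidence score. The following is a minimal sketch of both steps, not the author's implementation. The helper names `preprocess` and `summarize` are hypothetical, and the sketch assumes that sequences are uppercased, that truncation is applied to residues before tokenization, that the classifier emits two logits with index 1 as the positive class, and that the predicted class is the one with the higher softmax probability.

```python
import math

MAX_LEN = 250  # default maximum length in amino acids, as stated in the README


def preprocess(sequence: str, max_len: int = MAX_LEN) -> str:
    """Strip whitespace, uppercase, truncate, and space-separate residues.

    Uppercasing and truncating before tokenization are assumptions of this
    sketch; the README only states that spaces are inserted and that longer
    sequences are truncated.
    """
    residues = "".join(sequence.split()).upper()[:max_len]
    return " ".join(residues)


def summarize(logits):
    """Convert two-class logits into the outputs the README describes.

    Assumes index 0 is Non-PI and index 1 is Potential PI.
    """
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    predicted = 1 if probs[1] >= probs[0] else 0
    return {
        "label": "Potential PI" if predicted == 1 else "Non-PI",
        "prob_class_1": probs[1],         # probability of the positive class
        "confidence": probs[predicted],   # probability of the predicted class
    }


if __name__ == "__main__":
    print(preprocess("mkvlaa"))
    print(summarize([0.0, 2.0]))
```

Under these assumptions the confidence score equals `prob_class_1` for positive predictions and `1 - prob_class_1` for negative ones, so in the two-class case it never falls below 0.5.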