MuthuS97 committed on
Commit 1555623 · verified · 1 Parent(s): d947944

Update README.md

  1. README.md +32 -3
README.md CHANGED
@@ -1,3 +1,32 @@
- ---
- license: creativeml-openrail-m
- ---
+ ---
+ license: creativeml-openrail-m
+ base_model:
+ - facebook/esm2_t30_150M_UR50D
+ ---
+ [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)]
+ (https://colab.research.google.com/drive/1zccF8lGrF5rNQaSFPTd4wI-xvIJr-A78?usp=sharing)
+
+ ProtBERT-PI enables rapid screening of potential small secreted protease inhibitors in large-scale genomic, transcriptomic, or proteomic datasets.
+
+ The model assigns each input sequence to one of two classes:
+
+ Positive (Potential PI): Predicted to exhibit protease inhibitor activity
+ Negative (Non-PI): Predicted to lack protease inhibitor activity
+
+ Output includes:
+
+ Probability of the positive class (prob_class_1): ranges from 0 (low likelihood) to 1 (high likelihood of PI activity)
+ Confidence score: probability of the predicted class
+
+ Model Architecture and Training
+
+ ProtBERT-PI is a fine-tuned sequence classification model built on ProtBERT (BertForSequenceClassification):
+
+ Base model: Rostlab/prot_bert
+ Pre-trained on large corpora of protein sequences using masked language modeling
+
+ Fine-tuning was performed on a curated dataset of known protease inhibitors and a negative set of non-inhibitor proteins. Sequences are tokenized by inserting spaces between amino acids (standard for ProtBERT), so that each residue maps to one token. The maximum sequence length is configurable (default: 250 AA); longer sequences are truncated.
+
+ ---
+ license: creativeml-openrail-m
+ ---
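
The preprocessing and output fields described in the added README text can be illustrated with a minimal sketch. This is not the author's code. The helper names `prepare_sequence` and `summarize_logits` are hypothetical, and the sketch assumes that class index 1 is the positive class (suggested by the name prob_class_1), that truncation is applied per residue before spacing, and that the two probabilities come from a softmax over two logits.

```python
import math

MAX_LEN = 250  # default maximum length in residues, as stated in the README


def prepare_sequence(seq: str, max_len: int = MAX_LEN) -> str:
    """Remove whitespace, truncate to max_len residues, then space-separate them.

    Assumption: truncation happens per residue before spacing; the real
    pipeline may instead truncate at the tokenizer level.
    """
    cleaned = "".join(seq.split())
    return " ".join(cleaned[:max_len])


def summarize_logits(logits):
    """Convert two raw class scores [negative, positive] into README-style outputs.

    Assumption: a softmax over two logits, with index 1 as the positive class.
    """
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    predicted = 1 if probs[1] >= probs[0] else 0
    return {
        "prob_class_1": probs[1],         # probability of the positive class
        "predicted_class": predicted,     # 1 = potential PI, 0 = non-PI
        "confidence": probs[predicted],   # probability of the predicted class
    }


if __name__ == "__main__":
    print(prepare_sequence("MKTAYIAKQR"))
    print(summarize_logits([2.0, -1.0]))
```

Under these assumptions, the confidence score equals prob_class_1 when the positive class is predicted and 1 - prob_class_1 otherwise, so for a two-class model it never falls below 0.5.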