Update README.md
README.md
CHANGED
|
@@ -1,3 +1,32 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: creativeml-openrail-m
|
| 3 |
-
|
| 1 |
+
---
|
| 2 |
+
license: creativeml-openrail-m
|
| 3 |
+
base_model:
|
| 4 |
+
- facebook/esm2_t30_150M_UR50D
|
| 5 |
+
---
|
| 6 |
+
[]
|
| 7 |
+
(https://colab.research.google.com/drive/1zccF8lGrF5rNQaSFPTd4wI-xvIJr-A78?usp=sharing)
|
| 8 |
+
|
| 9 |
+
ProtBERT-PI enables rapid screening of potential small secreted protease inhibitors in large-scale genomic, transcriptomic, or proteomic datasets.
|
| 10 |
+
|
| 11 |
+
The model assigns each input sequence to one of two classes:
|
| 12 |
+
|
| 13 |
+
Positive (Potential PI): Predicted to exhibit protease inhibitor activity
|
| 14 |
+
Negative (Non-PI): Predicted to lack protease inhibitor activity
|
| 15 |
+
|
| 16 |
+
Output includes:
|
| 17 |
+
|
| 18 |
+
Probability of the positive class (prob_class_1): ranges from 0 (low likelihood) to 1 (high likelihood of PI activity)
|
| 19 |
+
Confidence score: probability of the predicted class
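As a minimal sketch of how the two output fields relate, the snippet below converts a pair of raw class scores into `prob_class_1` and a confidence score. It assumes the classifier emits two logits ordered as (Non-PI, Potential PI) and that probabilities come from a softmax; neither detail is stated in this README, and the function name `summarize_prediction` is hypothetical.

```python
import math


def summarize_prediction(logits):
    """Map two raw class scores [non_pi, pi] to the reported output fields.

    Assumption: index 0 is the negative class and index 1 the positive class.
    """
    # Numerically stable softmax over the two class logits.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # The predicted class is the one with the higher probability.
    predicted = 1 if probs[1] >= probs[0] else 0
    return {
        "label": "Potential PI" if predicted == 1 else "Non-PI",
        "prob_class_1": probs[1],         # probability of the positive class
        "confidence": probs[predicted],   # probability of the predicted class
    }


print(summarize_prediction([2.0, 0.0]))
```

With two classes, the confidence equals `prob_class_1` for a positive prediction and `1 - prob_class_1` for a negative one, so it never falls below 0.5.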
|
| 20 |
+
|
| 21 |
+
Model Architecture and Training
|
| 22 |
+
|
| 23 |
+
ProtBERT-PI is a fine-tuned sequence classification model built on ProtBERT (BertForSequenceClassification):
|
| 24 |
+
|
| 25 |
+
Base model: Rostlab/prot_bert
|
| 26 |
+
Pre-trained on large corpora of protein sequences using masked language modeling
|
| 27 |
+
|
| 28 |
+
Fine-tuning was performed on a curated dataset of known protease inhibitors and a negative set of non-inhibitor proteins. Each sequence is formatted by inserting spaces between amino acids, the standard input format for ProtBERT, so that every residue is tokenized individually. The maximum sequence length is configurable (default: 250 AA); longer sequences are truncated.
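The input formatting described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the project's actual preprocessing code: the helper name `format_for_protbert` is hypothetical, and it truncates by residue count before tokenization, whereas a tokenizer-level limit may also count special tokens.

```python
def format_for_protbert(sequence, max_length=250):
    """Return a space-separated amino acid string, truncated to max_length residues.

    Assumption: truncation keeps the first max_length residues (N-terminal end).
    """
    # Drop any whitespace or line breaks (e.g. from FASTA wrapping) and uppercase.
    cleaned = "".join(sequence.split()).upper()
    # Keep at most max_length residues; longer sequences are cut off.
    truncated = cleaned[:max_length]
    # ProtBERT expects one residue per whitespace-separated token.
    return " ".join(truncated)


print(format_for_protbert("mktay\niakq"))
```

The resulting string is what would be passed to the tokenizer; a sequence longer than the limit loses its C-terminal residues under this sketch.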
|
| 29 |
+
|
| 30 |
+
---
|
| 31 |
+
license: creativeml-openrail-m
|
| 32 |
+
---
|