Update README.md
README.md
@@ -25,7 +25,12 @@ ProtBERT-PI is a fine-tuned sequence classification model built on ProtBERT (Ber
Base model: Rostlab/prot_bert

Pre-trained on large corpora of protein sequences using masked language modeling.
Fine-tuning was performed on a curated dataset of known protease inhibitors and a negative set of non-inhibitor proteins.
Sequences are tokenized by inserting spaces between amino acids (standard for ProtBERT), enabling effective representation learning.
Maximum sequence length is configurable (default: 250 AA); longer sequences are truncated.
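The two preprocessing rules above (space-separated residues, truncation at 250 AA) can be sketched in plain Python; the helper name and example sequence are illustrative, not part of the released code.

```python
def preprocess_sequence(seq: str, max_len: int = 250) -> str:
    """Prepare a raw amino-acid string for ProtBERT-style tokenization.

    ProtBERT expects single-letter residues separated by spaces;
    sequences longer than max_len residues are truncated first.
    """
    seq = seq[:max_len]      # enforce the maximum sequence length
    return " ".join(seq)     # "MKVL" -> "M K V L"

print(preprocess_sequence("MKVLAAGIVTR"))  # M K V L A A G I V T R
```

The spaced string can then be fed to the ProtBERT tokenizer (e.g. `BertTokenizer.from_pretrained("Rostlab/prot_bert")`) as usual.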
Positive examples: known protease inhibitors (<250 AA) from the MEROPS database
Negative examples: non-inhibitors selected from UniProt using sequence similarity and Pfam domain analysis
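As a minimal sketch of how the positive/negative split above could be assembled into a labeled fine-tuning set (the toy sequences and the 0/1 label convention are assumptions, not the actual curated data):

```python
# Toy records standing in for already-retrieved sequences:
positives = ["MKVLAAGIVTR"]   # e.g. inhibitor entries from MEROPS
negatives = ["GAVLIPFWMST"]   # e.g. filtered non-inhibitors from UniProt

# Keep only sequences under 250 AA and attach binary labels
# (1 = protease inhibitor, 0 = non-inhibitor).
dataset = [(seq, 1) for seq in positives if len(seq) < 250] \
        + [(seq, 0) for seq in negatives if len(seq) < 250]

print(dataset)  # [('MKVLAAGIVTR', 1), ('GAVLIPFWMST', 0)]
```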
---
license: creativeml-openrail-m