Update README.md
README.md
CHANGED
@@ -16,8 +16,13 @@ Validation Loss: 0.0319
 Validation Accuracy: 0.9826
 Epoch: 20
 # The dataset for training TarPepSubLoc-ESM2
-The full dataset contains 13,005 protein sequences, including
+The full dataset contains 13,005 protein sequences, including SP (2,697), MT (499), CH (227), TH (45), and Other (9,537).
 The highly imbalanced sample sizes across the six categories in this dataset pose a significant challenge for classification.
+- "SP" for signal peptide,
+- "MT" for mitochondrial transit peptide (mTP),
+- "CH" for chloroplast transit peptide (cTP),
+- "TH" for thylakoidal lumen composite transit peptide (lTP),
+- "Other" for no targeting peptide (in this case, the length is given as 0).
 # How to use
 
 ### An example
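The class counts added in this commit make the imbalance concrete. The five listed labels sum to 13,005 sequences, and TH (45) is over 200 times rarer than Other (9,537). One common way to compensate during training is inverse-frequency class weighting. The sketch below is an illustrative assumption, not the authors' training setup. It uses the "balanced" heuristic `n_samples / (n_classes * count)`. The names `LABELS`, `CLASS_COUNTS`, and `balanced_class_weights` are hypothetical; only the label codes, descriptions, and counts come from the README.

```python
# Hypothetical sketch: per-class weights for the imbalanced TarPepSubLoc-ESM2 dataset.
# Label codes, descriptions, and counts are copied from the README diff; the
# weighting formula is the common "balanced" heuristic, assumed here for illustration.

LABELS = {
    "SP": "signal peptide",
    "MT": "mitochondrial transit peptide (mTP)",
    "CH": "chloroplast transit peptide (cTP)",
    "TH": "thylakoidal lumen composite transit peptide (lTP)",
    "Other": "no targeting peptide",
}

CLASS_COUNTS = {"SP": 2697, "MT": 499, "CH": 227, "TH": 45, "Other": 9537}


def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Return n_samples / (n_classes * count_c) for each class c.

    Rare classes get large weights and frequent classes small ones, so that
    weight * count is the same (n_samples / n_classes) for every class.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * count) for label, count in counts.items()}


if __name__ == "__main__":
    total = sum(CLASS_COUNTS.values())
    weights = balanced_class_weights(CLASS_COUNTS)
    for label, count in CLASS_COUNTS.items():
        share = count / total
        # Print count, share of the dataset, and the assumed loss weight per label.
        print(f"{label:>5}: {count:>5} ({share:6.2%})  weight={weights[label]:.3f}  - {LABELS[label]}")
```

If training were done in PyTorch, weights like these could be passed as a tensor to the `weight` argument of `torch.nn.CrossEntropyLoss`, ordered to match the model's label indices. The README does not state whether TarPepSubLoc-ESM2 uses any such weighting.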