Update README.md
README.md
CHANGED
|
@@ -13,24 +13,22 @@ tags:
|
|
| 13 |
|
| 14 |
<!-- Provide a quick summary of what the model is/does. -->
|
| 15 |
|
| 16 |
Post-Translational Modifications (PTMs) are a fundamental mechanism for regulating cellular functions and
|
| 17 |
increasing the functional diversity of the proteome. Despite the identification of hundreds of unique PTMs
|
| 18 |
through mass-spectrometry (MS) studies, accurately predicting many PTM types based on sequence data alone
|
| 19 |
-
remains a significant challenge.
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
multiple PTM types with a single model. To address this limitation, we present the Contrastively Learned
|
| 23 |
-
Attention-Based Stratified PTM Predictor (CLASPP), a unified PTM prediction model. CLASPP overcomes
|
| 24 |
-
data imbalance challenges by leveraging unsupervised clustering-based under-sampling and incorporating a novel
|
| 25 |
-
contrastive learning framework tailored to PTM data. Drawing inspiration from advancements in image and
|
| 26 |
-
natural language processing, the CLASPP model employs a multi-stage training strategy and utilizes a
|
| 27 |
-
high-quality curated training dataset to improve PTM prediction accuracy compared to existing multi-PTM prediction
|
| 28 |
-
models. Existing PTM prediction models predominantly focus on either single PTM types or employ ensemble methods
|
| 29 |
that combine multiple models to predict different PTM types. This fragmentation is largely driven by the
|
| 30 |
vast imbalance in data availability across PTM types, making it difficult to predict multiple PTM types
|
| 31 |
-
with a single model. To address this limitation, we present the
|
| 32 |
-
|
| 33 |
-
|
| 34 |
|
| 35 |
|
| 36 |
<p align="center">
|
|
|
|
| 13 |
|
| 14 |
<!-- Provide a quick summary of what the model is/does. -->
|
| 15 |
|
| 16 |
+
|
| 17 |
+
CLASPP is an ESM2-150M protein language model that predicts PTM events occurring on a substrate based
|
| 18 |
+
on its primary protein sequence. It covers 12 different PTM types, framed as multi-label
|
| 19 |
+
classification. The encoder is first trained on a supervised contrastive learning task, then the classification
|
| 20 |
+
head is fine-tuned on the multi-label classification task.
|
| 21 |
+
|
| 22 |
Post-Translational Modifications (PTMs) are a fundamental mechanism for regulating cellular functions and
|
| 23 |
increasing the functional diversity of the proteome. Despite the identification of hundreds of unique PTMs
|
| 24 |
through mass-spectrometry (MS) studies, accurately predicting many PTM types based on sequence data alone
|
| 25 |
+
remains a significant challenge.
|
| 26 |
+
|
| 27 |
+
Existing PTM prediction models predominantly focus on either single PTM types or employ ensemble methods
|
| 28 |
that combine multiple models to predict different PTM types. This fragmentation is largely driven by the
|
| 29 |
vast imbalance in data availability across PTM types, making it difficult to predict multiple PTM types
|
| 30 |
+
with a single model. To address this limitation, we present the Contrastively Learned Attention-Based
|
| 31 |
+
Stratified PTM Predictor (CLASPP), a unified PTM prediction model.
|
| 32 |
|
| 33 |
|
| 34 |
<p align="center">
|
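
The summary added in this change describes the output as multi-label classification over 12 PTM types. As a rough illustration of what that means at prediction time, the sketch below turns 12 per-type logits into a set of predicted PTM types by applying an independent sigmoid to each and thresholding. This is not taken from the CLASPP code: the label names (`PTM_0` ... `PTM_11`), the `decode_multilabel` helper, the 0.5 threshold, and the example logits are all assumptions made for illustration.

```python
import math

# The README states 12 PTM types; the names below are hypothetical placeholders,
# not the labels actually used by CLASPP.
NUM_PTM_TYPES = 12
LABELS = [f"PTM_{i}" for i in range(NUM_PTM_TYPES)]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits, threshold=0.5):
    """Return every label whose independent probability meets the threshold.

    Unlike softmax (single-label) decoding, each PTM type is scored on its
    own, so a site can receive zero, one, or several labels.
    The 0.5 threshold is an assumed default, not a documented CLASPP setting.
    """
    if len(logits) != NUM_PTM_TYPES:
        raise ValueError(f"expected {NUM_PTM_TYPES} logits, got {len(logits)}")
    return [
        label
        for label, logit in zip(LABELS, logits)
        if sigmoid(logit) >= threshold
    ]


# Made-up logits for one candidate site: two types score above the threshold.
example_logits = [2.0, -1.5, 0.3] + [-3.0] * 9
predicted = decode_multilabel(example_logits)
print(predicted)
```

Because each label is decided independently, the same residue can be reported for more than one PTM type, which is the practical difference from a single-label softmax head.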