Commit 875361a (verified) by gravelcompbio · 1 parent: 40e9d2b

Update README.md

Files changed (1): README.md (+72 -70)
@@ -9,26 +9,9 @@ tags:
  - biology
  - protein
  ---
- # Contrastively Learned Attention-Based Stratified PTM Predictor (CLASPP): a unified PTM prediction model
-
- <!-- Provide a quick summary of what the model is/does. -->
-
-
- CLASPP is an ESM2-150M protein language model that predicts PTM events occurring on a substrate based
- on its primary protein sequence. It does this for multiple different PTM types (12) as a form of multi-label
- classification. The encoder is trained on a supervised contrastive learning task, then the classification
- head is fine-tuned on the multi-label classification task.

- Post-translational modifications (PTMs) are a fundamental mechanism for regulating cellular functions and
- increasing the functional diversity of the proteome. Despite the identification of hundreds of unique PTMs
- through mass-spectrometry (MS) studies, accurately predicting many PTM types based on sequence data alone
- remains a significant challenge.

- Existing PTM prediction models either focus predominantly on single PTM types or employ ensemble methods
- that combine multiple models to predict different PTM types. This fragmentation is largely driven by the
- vast imbalance in data availability across PTM types, making it difficult to predict multiple PTM types
- with a single model. To address this limitation, we present the Contrastively Learned Attention-Based
- Stratified PTM Predictor (CLASPP), a unified PTM prediction model.



  <p align="center">
@@ -36,6 +19,17 @@ Stratified PTM Predictor (CLASPP), a unified PTM prediction model.
  </p>





@@ -66,64 +60,15 @@ Follow along with its recommendation

  Installing torch can be the most complex part

-
-
-
-
- ## Model Details
-
-
-
- <p align="center">
- <img width="100%" src="figures/Screenshot%20from%202025-07-11%2014-19-21.png">
- </p>
-
- | PTM type | Residues trained on | Number of clusters allocated | Output indexes | Input indexes (training) |
- | --- | --- | --- | --- | --- |
- | ST_Phosphorylation | S,T | 5 | 0 or 1 | 0-4 |
- | Y_Phosphorylation | Y | 1 | 3 | 25 |
- | K_Ubiquitination | K | 20 | 2 | 5-24 |
- | K_Acetylation | K | 10 | 4 | 26-35 |
- | AM_Acetylation | A,M | 1 | 13 or 14 | 49 |
- | N_N-linked-Glycosylation | N | 1 | 5 | 36 |
- | ST_O-linked-Glycosylation | S,T | 5 | 6 or 7 | 37-41 |
- | RK_Methylation | R,K | 4 | 8 or 9 | 42-45 |
- | K_Sumoylation | K | 1 | 10 | 46 |
- | K_Malonylation | K | 1 | 11 | 53 |
- | M_Sulfoxidation | M | 1 | 12 | 48 |
- | C_Glutathionylation | C | 1 | 15 | 50 |
- | C_S-palmitoylation | C | 1 | 16 | 51 |
- | PK_Hydroxylation | P,K | 1 | 17 or 18 | 52 |
- | negative | all residues | N/A | 19 | 53 |
-
-
-
- ### Model Sources [optional]
-
-
-
- | Repo | Link | Description |
- | --- | --- | --- |
- | GitHub | [github version Data_cur](https://github.com/gravelCompBio/Claspp_data_cur/tree/main) | This version contains code but no data. You need to run the code to generate all the helper files (this will take some time). |
- | Zenodo | [zenodo version Data_cur](https://github.com/gravelCompBio/Claspp_data_cur/tree/main) | This version contains code and already-generated helper files. Mostly for proof of concept and for inspecting all the intermediate data states. |
- | GitHub | [github version Forward](https://github.com/gravelCompBio/Claspp_data_cur/tree/main) | This version contains code but NOT any weights (files too big for GitHub). |
- | Huggingface | [huggingface version Forward](https://huggingface.co/gravelcompbio/Claspp) | This version contains code and trained weights. |
- | Zenodo | [zenodo version training_data](https://github.com/gravelCompBio/Claspp_data_cur/tree/main) | Zenodo version of the training/testing/validation data. |
- | Webtool | [website version of webtool](https://github.com/gravelCompBio/Claspp_data_cur/tree/main) | Webtool hosted on a server. |
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
-
-
- ## How to Get Started with the Model


  ### Downloading this repository

  Make sure [git lfs](https://git-lfs.com/) is installed.

  ```
  git clone https://huggingface.co/esbglab/Claspp_forward
  ```
@@ -189,6 +134,61 @@ Use the code below to get started with the model.



  ## Uses

  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
@@ -229,6 +229,8 @@ Pattern selection and interpretation:



  - **Developed by:** Nathan Gravel (primary author of most of the code). Fine-tuning code inspired by Zhongliang Zhou,
  - Contrastive learning code inspired by Ruili Fang, codebase testing and version control by Austin Downes,
  - Webtool development by Saber Soleymani
 
  - biology
  - protein
  ---

+ # Contrastively Learned Attention-Based Stratified PTM Predictor (CLASPP): a unified PTM prediction model



  <p align="center">

  </p>


+ CLASPP is an ESM2-150M protein language model that predicts PTM events occurring on a substrate based
+ on its primary protein sequence. It does this for multiple different PTM types (12) as a form of multi-label
+ classification. The encoder is trained on a supervised contrastive learning task, then the classification
+ head is fine-tuned on the multi-label classification task. Existing PTM prediction models either focus
+ predominantly on single PTM types or employ ensemble methods that combine multiple models to predict different
+ PTM types. This fragmentation is largely driven by the vast imbalance in data availability across PTM
+ types, making it difficult to predict multiple PTM types with a single model. To address this
+ limitation, we present the Contrastively Learned Attention-Based Stratified PTM Predictor (CLASPP),
+ a unified PTM prediction model.
+
+
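The description above says the encoder is trained on a supervised contrastive task but does not spell out the objective. As a rough illustration only, the sketch below implements a generic supervised contrastive (SupCon-style) loss in plain Python. The function name, the temperature value, and the toy embeddings are assumptions made for this example and are not taken from the CLASPP codebase.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic SupCon-style loss (illustrative; not the CLASPP implementation).

    embeddings: list of equal-length float vectors, one per sample.
    labels: list of class labels, one per sample.
    Samples that share a label are treated as positives for each other.
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    z = [normalize(v) for v in embeddings]
    n = len(z)
    per_anchor = []
    for i in range(n):
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive pair contributes nothing
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of the positives.
        per_anchor.append(
            -sum(sims[j] - log_denom for j in positives) / len(positives)
        )
    return sum(per_anchor) / len(per_anchor) if per_anchor else 0.0


# Toy check: two tight clusters in 2-D.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supervised_contrastive_loss(toy, [0, 0, 1, 1]))  # labels match the clusters
print(supervised_contrastive_loss(toy, [0, 1, 0, 1]))  # labels cut across the clusters
```

The loss is lower when samples with the same label sit close together in embedding space, which is the property a contrastively pretrained encoder is meant to provide before the classification head is fine-tuned.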





  Installing torch can be the most complex part

+ # How to Get Started with the Model


  ### Downloading this repository

  Make sure [git lfs](https://git-lfs.com/) is installed.

+ Weight files cannot be stored here (too big).
+
  ```
  git clone https://huggingface.co/esbglab/Claspp_forward
  ```
 



+ ## Model Details
+
+
+
+ <p align="center">
+ <img width="100%" src="https://huggingface.co/esbglab/Claspp_forward/blob/main/figures/Screenshot%20from%202025-08-05%2011-49-49.png">
+ </p>
+
+
+ | PTM type | Residues trained on | Number of clusters allocated | Output indexes | Input label indexes (training) |
+ | --- | --- | --- | --- | --- |
+ | ST_Phosphorylation | S,T | 5 | 0 or 1 | 0-4 |
+ | Y_Phosphorylation | Y | 1 | 3 | 25 |
+ | K_Ubiquitination | K | 20 | 2 | 5-24 |
+ | K_Acetylation | K | 10 | 4 | 26-35 |
+ | AM_Acetylation | A,M | 1 | 13 or 14 | 49 |
+ | N_N-linked-Glycosylation | N | 1 | 5 | 36 |
+ | ST_O-linked-Glycosylation | S,T | 5 | 6 or 7 | 37-41 |
+ | RK_Methylation | R,K | 4 | 8 or 9 | 42-45 |
+ | K_Sumoylation | K | 1 | 10 | 46 |
+ | K_Malonylation | K | 1 | 11 | 53 |
+ | M_Sulfoxidation | M | 1 | 12 | 48 |
+ | C_Glutathionylation | C | 1 | 15 | 50 |
+ | C_S-palmitoylation | C | 1 | 16 | 51 |
+ | PK_Hydroxylation | P,K | 1 | 17 or 18 | 52 |
+ | negative | all residues | N/A | 19 | 53 |
+
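To make the "output indexes" column of the table above easier to use, here is a minimal sketch that maps a 20-element score vector for one residue back to PTM type names. The mapping is transcribed from the table; the function name `call_ptms`, the 0.5 threshold, and the assumption that the model emits one score in [0, 1] per output index are hypothetical and not confirmed by the README.

```python
# Output index -> PTM type, transcribed from the "output indexes" column.
# Where the table lists two indexes (e.g. "0 or 1"), both map to the same
# PTM type; the table does not say which index belongs to which residue.
OUTPUT_INDEX_TO_PTM = {
    0: "ST_Phosphorylation", 1: "ST_Phosphorylation",
    2: "K_Ubiquitination",
    3: "Y_Phosphorylation",
    4: "K_Acetylation",
    5: "N_N-linked-Glycosylation",
    6: "ST_O-linked-Glycosylation", 7: "ST_O-linked-Glycosylation",
    8: "RK_Methylation", 9: "RK_Methylation",
    10: "K_Sumoylation",
    11: "K_Malonylation",
    12: "M_Sulfoxidation",
    13: "AM_Acetylation", 14: "AM_Acetylation",
    15: "C_Glutathionylation",
    16: "C_S-palmitoylation",
    17: "PK_Hydroxylation", 18: "PK_Hydroxylation",
    19: "negative",
}


def call_ptms(scores, threshold=0.5):
    """Return (index, PTM type) pairs whose score meets the threshold.

    scores: 20 per-label scores for a single residue (assumed to be in [0, 1]).
    threshold: hypothetical cut-off; tune it for your own use case.
    The "negative" label is never reported as a PTM call.
    """
    if len(scores) != len(OUTPUT_INDEX_TO_PTM):
        raise ValueError(
            f"expected {len(OUTPUT_INDEX_TO_PTM)} scores, got {len(scores)}"
        )
    return [
        (idx, OUTPUT_INDEX_TO_PTM[idx])
        for idx, score in enumerate(scores)
        if score >= threshold and OUTPUT_INDEX_TO_PTM[idx] != "negative"
    ]


# Made-up scores for a lysine residue, for illustration only.
example_scores = [0.0] * 20
example_scores[2] = 0.91   # K_Ubiquitination
example_scores[4] = 0.62   # K_Acetylation
print(call_ptms(example_scores))
```

Because the task is multi-label, more than one PTM type can pass the threshold for the same residue, as in the lysine example.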
+
+ ## Data organization and number of clusters
+
+ <p align="center">
+ <img width="100%" src="https://huggingface.co/esbglab/Claspp_forward/blob/main/figures/Screenshot%20from%202025-08-05%2011-48-48.png">
+ </p>
+
+
+ | Repo | Link | Description |
+ | --- | --- | --- |
+ | GitHub | [github version Data_cur](https://github.com/gravelCompBio/Claspp_data_cur/tree/main) | This version contains code but no data. You need to run the code to generate all the helper files (this will take some time). |
+ | Zenodo | [zenodo version Data_cur](https://github.com/gravelCompBio/Claspp_data_cur/tree/main) | This version contains code and already-generated helper files. Mostly for proof of concept and for inspecting all the intermediate data states. |
+ | GitHub | [github version Forward](https://github.com/gravelCompBio/Claspp_data_cur/tree/main) | This version contains code but NOT any weights (files too big for GitHub). |
+ | Huggingface | [huggingface version Forward](https://huggingface.co/gravelcompbio/Claspp) | This version contains code and trained weights. |
+ | Zenodo | [zenodo version training_data](https://github.com/gravelCompBio/Claspp_data_cur/tree/main) | Zenodo version of the training/testing/validation data. |
+ | Webtool | [website version of webtool](https://github.com/gravelCompBio/Claspp_data_cur/tree/main) | Webtool hosted on a server. |
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+
+
+
+
+
+
+
  ## Uses

  <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 



+
+
  - **Developed by:** Nathan Gravel (primary author of most of the code). Fine-tuning code inspired by Zhongliang Zhou,
  - Contrastive learning code inspired by Ruili Fang, codebase testing and version control by Austin Downes,
  - Webtool development by Saber Soleymani