Update README.md
README.md
base_model: westlake-repl/SaProt_35M_AF2

Model Description

This model is fine-tuned from westlake-repl/SaProt_35M_AF2 to predict the mutation effects of *Bacillus subtilis* α-amylase. It takes an amino acid sequence as input and performs protein-level regression. For each variant it outputs a quantitative enzyme activity value, which can be used to assess how mutations alter enzyme function.

Task type: protein-level regression

Model input type: amino acid sequence
Dataset

The dataset is sourced from van der Flier et al. (2024), available at https://www.sciencedirect.com/science/article/pii/S2001037024002940. It contains 3706 rows in total, and each sample carries 1 to 8 mutation sites. The full dataset is randomly split into training, validation, and test sets in an 8:1:1 ratio. The target label is absorbance, with values ranging from (). A higher absorbance value indicates greater starch degradation and therefore stronger detergent activity of the amylase enzyme.
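The 8:1:1 random split can be sketched as follows. The README does not state the seed or how fractional sizes are rounded, so both are assumptions here: the training and validation sizes are floored and the remainder goes to the test set. `split_8_1_1` is a hypothetical helper, not part of any released code.

```python
import random
from typing import List, Tuple


def split_8_1_1(n_rows: int, seed: int = 42) -> Tuple[List[int], List[int], List[int]]:
    """Randomly split row indices 0..n_rows-1 into train/val/test at roughly 8:1:1.

    Assumptions (not from the README): the seed value, and the rounding rule
    (floor for train and validation, remainder to test).
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded shuffle so the split is reproducible
    n_train = n_rows * 8 // 10  # integer arithmetic avoids float rounding surprises
    n_val = n_rows // 10
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever is left goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_8_1_1(3706)
print(len(train_idx), len(val_idx), len(test_idx))  # 2964 370 372
```

Splitting indices rather than rows keeps the sketch independent of how the sequences and absorbance labels are stored.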
Performance (on test set)

Spearman correlation: 0.76

Coefficient of determination (R²): 0.69
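For reference, the two test-set metrics can be computed as in the pure-Python sketch below. Spearman correlation is taken as the Pearson correlation of average ranks, which is the tie handling used by `scipy.stats.spearmanr`. R² is computed as 1 − SS_res/SS_tot, the definition used by scikit-learn's `r2_score`. The README does not say which R² definition was used, so that choice is an assumption. The toy values are illustrative only and are not taken from the dataset.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(y_true), average_ranks(y_pred))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values for illustration only (not from the dataset).
y_true = [0.2, 0.5, 0.9, 1.4]
y_pred = [0.3, 0.6, 0.5, 1.2]
print(round(spearman(y_true, y_pred), 3), round(r2_score(y_true, y_pred), 3))  # 0.8 0.728
```

The two metrics answer different questions. Spearman only checks whether variants are ranked in the right order, which matters for prioritizing mutants. R² also penalizes errors in the predicted absorbance values themselves.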
LoRA config

r: 8

lora_dropout: 0.1

lora_alpha: 16

target_modules: ["query", "intermediate.dense", "value", "output.dense", "key"]

modules_to_save: ["classifier"]
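The pure-Python sketch below makes these LoRA hyperparameters concrete for a single adapted linear layer. It follows the standard LoRA formulation h = W0·x + (lora_alpha / r)·B·A·x, where W0 is frozen. With r = 8 and lora_alpha = 16, the low-rank update is scaled by 2.0. During training, lora_dropout = 0.1 is applied to the input of the low-rank branch.

The class name `LoRALinear` and the Gaussian initialization of A are illustrative assumptions. B is zero-initialized, so the adapted layer starts out identical to the base layer.

In an actual setup, the same parameter names (`r`, `lora_alpha`, `lora_dropout`, `target_modules`, `modules_to_save`) are fields of Hugging Face PEFT's `LoraConfig`. There, `target_modules` selects which linear layers receive adapters, and `modules_to_save` keeps the listed modules (here the classifier head) fully trainable.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA-adapted linear layer: h = W0 x + (lora_alpha / r) * B A dropout(x)."""

    def __init__(self, w0: Matrix, r: int = 8, lora_alpha: int = 16,
                 lora_dropout: float = 0.1, seed: int = 0) -> None:
        self.w0 = w0  # frozen pretrained weight, shape d_out x d_in
        d_out, d_in = len(w0), len(w0[0])
        self.rng = random.Random(seed)
        # A (r x d_in): small random init (an assumption; PEFT uses its own default init).
        self.a = [[self.rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]
        # B (d_out x r): zero init, so the layer initially matches W0 exactly.
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scaling = lora_alpha / r  # 16 / 8 = 2.0
        self.p = lora_dropout

    def forward(self, x: Vector, training: bool = False) -> Vector:
        base = matvec(self.w0, x)  # frozen path always sees the undropped input
        if training and self.p > 0:
            # Inverted dropout applied to the LoRA branch input only.
            keep = 1.0 - self.p
            x = [xi / keep if self.rng.random() >= self.p else 0.0 for xi in x]
        delta = matvec(self.b, matvec(self.a, x))  # rank-r update B(Ax)
        return [h + self.scaling * d for h, d in zip(base, delta)]


layer = LoRALinear([[1.0, 0.0, 2.0], [0.5, -1.0, 0.0]])
print(layer.scaling)                   # 2.0
print(layer.forward([1.0, 2.0, 3.0]))  # [7.0, -1.5]
```

At initialization the output equals the base projection. Only A and B, plus any module listed in modules_to_save, would be updated during fine-tuning. This is why LoRA trains far fewer parameters than full fine-tuning.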
Training config

optimizer:

class: AdamW

betas: (0.9, 0.98)

weight_decay: 0.01

learning rate: 0.0005

epoch: 30

batch size: 64

precision: 16-mixed
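The optimizer settings above describe AdamW, which is Adam with decoupled weight decay. The pure-Python sketch below shows what a single update does with these values, using a list of scalar parameters. `ScalarAdamW` is a hypothetical name. eps = 1e-8 is PyTorch's default and is an assumption, since the README does not list it. No learning-rate schedule is modeled.

```python
import math
from typing import List, Tuple


class ScalarAdamW:
    """Minimal AdamW (decoupled weight decay) over a list of scalar parameters."""

    def __init__(self, params: List[float], lr: float = 5e-4,
                 betas: Tuple[float, float] = (0.9, 0.98),
                 weight_decay: float = 0.01, eps: float = 1e-8) -> None:
        self.params = params  # updated in place
        self.lr = lr
        self.beta1, self.beta2 = betas
        self.weight_decay = weight_decay
        self.eps = eps  # assumed: PyTorch's default, not stated in the README
        self.m = [0.0] * len(params)  # first-moment (mean) estimates
        self.v = [0.0] * len(params)  # second-moment (uncentered variance) estimates
        self.t = 0  # step counter used for bias correction

    def step(self, grads: List[float]) -> None:
        self.t += 1
        bias1 = 1 - self.beta1 ** self.t
        bias2 = 1 - self.beta2 ** self.t
        for i, g in enumerate(grads):
            # Decoupled weight decay: shrink the weight directly instead of adding it to g.
            self.params[i] -= self.lr * self.weight_decay * self.params[i]
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / bias1  # bias-corrected moments
            v_hat = self.v[i] / bias2
            self.params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


# Toy problem: minimize (w - 3)^2 starting from w = 0.
w = [0.0]
opt = ScalarAdamW(w)  # lr=5e-4, betas=(0.9, 0.98), weight_decay=0.01
for _ in range(3):
    opt.step([2 * (w[0] - 3.0)])  # gradient of (w - 3)^2
print(w[0])  # roughly 0.0015: with a steady gradient, each early step moves about lr
```

In PyTorch the corresponding optimizer would be `torch.optim.AdamW(model.parameters(), lr=5e-4, betas=(0.9, 0.98), weight_decay=0.01)`. The value `16-mixed` is the spelling PyTorch Lightning's `Trainer(precision=...)` accepts for float16 mixed precision.

As rough arithmetic, an 80% training split of 3706 rows is about 2,964 rows. At batch size 64 that gives 47 batches per epoch, or about 1,410 optimizer steps over 30 epochs. This estimate assumes the last partial batch is kept and no gradient accumulation is used.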