license: mit
base_model: westlake-repl/SaProt_35M_AF2
Model Description
This model is fine-tuned to predict mutation effects in *Bacillus subtilis* α-amylase. It takes an amino acid sequence as input and performs protein-level regression, predicting a quantitative enzyme activity value for each variant so that the functional impact of its mutations can be compared.
Task type: protein-level regression
Model input type: Amino acid sequence
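The base model family (SaProt) uses a structure-aware vocabulary in which each residue is paired with a structure token. When only an amino acid sequence is available, a common convention is to mask the structure position of every residue with `#`. The sketch below illustrates that convention; whether the inference pipeline for this model applies it automatically is an assumption, and the helper name `aa_to_masked_tokens` is hypothetical.

```python
def aa_to_masked_tokens(aa_sequence: str) -> str:
    """Pair every residue with '#', a masked structure token.

    Hypothetical helper: it only illustrates the amino-acid-only input
    convention and is not part of any library.
    """
    cleaned = aa_sequence.strip().upper()
    if not cleaned.isalpha():
        raise ValueError("expected a plain one-letter amino acid sequence")
    return "".join(residue + "#" for residue in cleaned)


if __name__ == "__main__":
    # Toy fragment, not the real alpha-amylase sequence.
    print(aa_to_masked_tokens("MKV"))
```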
Dataset
The dataset is sourced from van der Flier et al. (2024), available at https://www.sciencedirect.com/science/article/pii/S2001037024002940. It contains a total of 3706 rows, with each sample carrying 1 to 8 mutation sites. The full dataset is randomly split into training, validation, and test sets following an 8:1:1 ratio. The target label is absorbance, ranging from -0.001 to 0.211. A higher absorbance value indicates greater starch degradation, corresponding to stronger detergent activity of the amylase enzyme.
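A random 8:1:1 split of 3706 rows can be sketched as follows. The seed, the rounding rule (floor for training and validation, remainder to test), and the function name `split_indices` are assumptions made for illustration; the exact procedure used for this model is not stated.

```python
import random


def split_indices(n_rows, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle row indices and cut them into train/validation/test lists.

    Assumption: train and validation sizes are floored and the test set
    receives whatever rows remain.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * ratios[0])
    n_valid = int(n_rows * ratios[1])
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


if __name__ == "__main__":
    train_idx, valid_idx, test_idx = split_indices(3706)
    print(len(train_idx), len(valid_idx), len(test_idx))
```

Under these rounding assumptions the three subsets are disjoint and together cover every row exactly once.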
Performance (on test set)
Spearman correlation: 0.76
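Spearman correlation measures how well the predicted ranking of variants matches the measured ranking, independent of the absolute absorbance scale. A minimal pure-Python sketch, assuming tied values receive their average rank, is shown here; the numbers in the usage example are made up and are not taken from the dataset.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted, measured):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(measured))


if __name__ == "__main__":
    # Illustrative values only.
    measured = [0.010, 0.050, 0.100, 0.200]
    predicted = [0.020, 0.040, 0.150, 0.120]
    print(round(spearman(predicted, measured), 3))
```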
LoRA config
r: 8
lora_dropout: 0.1
lora_alpha: 16
target_modules: ["query", "intermediate.dense", "value", "output.dense", "key"]
modules_to_save: ["classifier"]
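To make the LoRA settings concrete: in the standard LoRA formulation the low-rank update is scaled by `lora_alpha / r`, and each adapted linear layer gains `r * (d_in + d_out)` trainable parameters. The sketch below mirrors the configuration above as a plain dictionary and computes both quantities. The layer width of 480 is an assumption used only for illustration, and the helper names are hypothetical.

```python
# The values below are copied from the LoRA config in this card.
lora_config = {
    "r": 8,
    "lora_dropout": 0.1,
    "lora_alpha": 16,
    "target_modules": ["query", "intermediate.dense", "value", "output.dense", "key"],
    "modules_to_save": ["classifier"],
}


def lora_scaling(config):
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return config["lora_alpha"] / config["r"]


def lora_params_per_linear(d_in, d_out, r):
    """Trainable parameters added to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    hidden_size = 480  # assumed width for illustration, not stated in this card
    print("scaling:", lora_scaling(lora_config))
    print(
        "extra parameters per square projection:",
        lora_params_per_linear(hidden_size, hidden_size, lora_config["r"]),
    )
```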
Training config
optimizer:
  class: AdamW
  betas: (0.9, 0.98)
  weight_decay: 0.01
learning rate: 0.0005
epoch: 30
batch size: 64
precision: 16-mixed
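As a rough sanity check on the training budget, the number of optimizer steps follows from the training-set size, batch size, and epoch count. The sketch below assumes about 2964 training rows (80% of 3706, rounded down), no gradient accumulation, and that the final partial batch of each epoch is kept; none of these details are stated in the config, so the result is an estimate.

```python
import math


def total_optimizer_steps(n_train, batch_size, epochs):
    """Steps = batches per epoch (partial last batch kept) times epochs."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    n_train = 2964  # assumed: floor(0.8 * 3706)
    steps = total_optimizer_steps(n_train, batch_size=64, epochs=30)
    print("steps per epoch:", math.ceil(n_train / 64))
    print("total steps:", steps)
```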