---
base_model: westlake-repl/SaProt_35M_AF2
library_name: peft
---
# Base model: [westlake-repl/SaProt_35M_AF2](https://huggingface.co/westlake-repl/SaProt_35M_AF2)

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->
This model is trained on a single-site deep mutational scanning dataset and
can be used to predict the fitness score of mutant amino acid sequences of the protein [UBC9_HUMAN](https://www.uniprot.org/uniprotkb/P63279/entry) (SUMO-conjugating enzyme UBC9).

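To illustrate what a single-site mutant sequence is, here is a minimal sketch that applies one substitution to a wild-type sequence. The helper name `apply_mutation`, the `K4R`-style notation (reference residue, 1-based position, new residue), and the toy sequence are all assumptions made for illustration; they are not taken from the dataset or from the UBC9 sequence.

```python
def apply_mutation(wild_type: str, mutation: str) -> str:
    """Return the mutant sequence for a substitution such as 'K4R'.

    Hypothetical helper: assumes <ref residue><1-based position><new residue>.
    """
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if not 1 <= pos <= len(wild_type):
        raise ValueError(f"position {pos} is outside the sequence")
    if wild_type[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {wild_type[pos - 1]}")
    # Replace exactly one residue and leave the rest of the sequence unchanged.
    return wild_type[: pos - 1] + alt + wild_type[pos:]


toy_wild_type = "MSGKAL"  # placeholder sequence, NOT the real UBC9 sequence
mutant = apply_mutation(toy_wild_type, "K4R")
print(mutant)  # MSGRAL
```

Each sequence built this way differs from the wild type at exactly one position, which is the kind of input the fitness prediction is meant for.
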
## Protein Function
This protein accepts the ubiquitin-like proteins SUMO1, SUMO2, SUMO3, SUMO4 and SUMO1P1/SUMO5 from the UBLE1A-UBLE1B E1 complex and
catalyzes their covalent attachment to other proteins with the help of an E3 ligase such as RANBP2, CBX4 and ZNF451.

### Task type
Protein-level regression

### Dataset description
The dataset is from [Deep generative models of genetic variation capture the effects of mutations](https://www.nature.com/articles/s41592-018-0138-4).
It can also be found in the [SaprotHub dataset](https://huggingface.co/datasets/SaProtHub/DMS_UBC9_HUMAN).

Each label is the fitness score of a mutant amino acid sequence.
The wild-type sequence receives a score of one, and larger values represent higher fitness.

### Model input type
Amino acid sequence

### Performance
Spearman's ρ: 0.60

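For readers unfamiliar with the metric, the sketch below shows one way Spearman's ρ between measured and predicted fitness scores can be computed: rank both lists (ties share their average rank) and take the Pearson correlation of the ranks. The function names and the four toy score pairs are illustrative assumptions; they are not values from the dataset, and this is not necessarily the evaluation code used for the number above.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)


measured = [1.0, 0.2, 0.8, 0.5]   # toy fitness labels (made up)
predicted = [0.9, 0.1, 0.4, 0.6]  # toy model outputs (made up)
rho = spearman_rho(measured, predicted)
print(round(rho, 3))  # 0.8
```

Because only the ranks matter, ρ rewards a model for ordering mutants correctly even when its predicted scores are on a different scale from the labels.
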
### LoRA config
lora_dropout: 0.0

lora_alpha: 16

target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]

modules_to_save: ["classifier"]

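As a rough sketch, the LoRA settings listed above could be passed to `peft.LoraConfig` as shown here. The names `lora_kwargs` and `build_lora_config` are hypothetical. The card does not state the LoRA rank `r`, so this sketch leaves it out and `peft` would fall back to its own default, which may differ from the value used to train this adapter.

```python
# LoRA settings copied from this model card.
lora_kwargs = {
    "lora_dropout": 0.0,
    "lora_alpha": 16,
    "target_modules": ["query", "key", "value", "intermediate.dense", "output.dense"],
    "modules_to_save": ["classifier"],
}


def build_lora_config():
    """Hypothetical helper: build a peft LoraConfig from the settings above.

    The rank `r` is not given in the card, so it is deliberately not set here.
    """
    from peft import LoraConfig  # lazy import: only needed when the config is built

    return LoraConfig(**lora_kwargs)
```

Calling `build_lora_config()` requires `peft` to be installed; the dictionary itself can be inspected without it.
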
### Training config
class: AdamW

betas: (0.9, 0.98)

weight_decay: 0.01

learning rate: 1e-4

epoch: 100

batch size: 2

precision: 16-mixed
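
The optimizer settings above map onto `torch.optim.AdamW` roughly as in the sketch below. The names `optimizer_kwargs`, `schedule`, and `build_optimizer` are hypothetical. The card does not name the training framework; the string `16-mixed` matches the precision option accepted by the PyTorch Lightning `Trainer`, but treating it that way is an assumption.

```python
# Optimizer hyperparameters copied from this model card.
optimizer_kwargs = {"lr": 1e-4, "betas": (0.9, 0.98), "weight_decay": 0.01}

# Remaining training settings from the card, kept as plain values.
schedule = {"max_epochs": 100, "batch_size": 2, "precision": "16-mixed"}


def build_optimizer(model):
    """Hypothetical helper: AdamW over the trainable parameters of `model`."""
    import torch  # lazy import: only needed when the optimizer is built

    # With LoRA, only the adapter weights and saved modules require gradients.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, **optimizer_kwargs)
```

Filtering on `requires_grad` keeps the frozen base model weights out of the optimizer state, which is the usual reason for pairing LoRA with a small batch size like the one listed.
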