SaProtHub/Model-DMS_UBC9_HUMAN-35M


	---
	base_model: westlake-repl/SaProt_35M_AF2
	library_name: peft
	---
	# Base model: [westlake-repl/SaProt_35M_AF2](https://huggingface.co/westlake-repl/SaProt_35M_AF2)

	# Model Card for Model-DMS_UBC9_HUMAN-35M

	<!-- Provide a quick summary of what the model is/does. -->
	This model is trained on a single-site deep mutational scanning (DMS) dataset and
	can be used to predict the fitness score of mutant amino acid sequences of the protein [UBC9_HUMAN](https://www.uniprot.org/uniprotkb/P63279/entry) (SUMO-conjugating enzyme UBC9).
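As a rough illustration of what "single-site" means here, the sketch below enumerates every single amino acid substitution of a sequence and applies a mutation written in the common `K2A` notation (reference residue, 1-based position, new residue). The function names and the toy sequence are hypothetical and are not taken from the SaProtHub code or from UBC9 itself.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def apply_mutation(wild_type: str, mutation: str) -> str:
    """Apply one substitution such as 'K2A' (1-based position) to a sequence."""
    ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
    if not 1 <= pos <= len(wild_type):
        raise ValueError(f"position {pos} is outside the sequence")
    if wild_type[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {wild_type[pos - 1]}")
    if alt not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid {alt!r}")
    return wild_type[: pos - 1] + alt + wild_type[pos:]


def single_site_mutants(wild_type: str):
    """Yield (mutation, mutant_sequence) for every single substitution."""
    for pos, ref in enumerate(wild_type, start=1):
        for alt in AMINO_ACIDS:
            if alt != ref:
                yield f"{ref}{pos}{alt}", wild_type[: pos - 1] + alt + wild_type[pos:]


if __name__ == "__main__":
    toy_sequence = "MKTAY"  # hypothetical toy sequence, not UBC9
    print(apply_mutation(toy_sequence, "K2A"))
    print(sum(1 for _ in single_site_mutants(toy_sequence)))
```

A sequence of length L has 19 × L single-site mutants, so the toy sequence of five residues yields 95 candidates to score.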

	## Protein Function
	This protein accepts the ubiquitin-like proteins SUMO1, SUMO2, SUMO3, SUMO4 and SUMO1P1/SUMO5 from the UBLE1A-UBLE1B E1 complex and
	catalyzes their covalent attachment to other proteins with the help of an E3 ligase such as RANBP2, CBX4 or ZNF451.

	### Task type
	Protein-level regression

	### Dataset description
	The dataset is from [Deep generative models of genetic variation capture the effects of mutations](https://www.nature.com/articles/s41592-018-0138-4).
	It is also available as the [SaProtHub dataset](https://huggingface.co/datasets/SaProtHub/DMS_UBC9_HUMAN).

	Each label is the fitness score of a mutant amino acid sequence.
	The wild-type sequence receives a score of one, and larger values represent higher fitness.

	### Model input type
	Amino acid sequence
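The base model, SaProt, uses a structure-aware vocabulary in which each residue is paired with a structure token. The sketch below assumes that a plain amino acid sequence is passed by pairing every residue with the `#` placeholder in place of a structure token. That convention is an assumption here and should be checked against the base model's documentation. The helper name `to_saprot_input` is hypothetical.

```python
def to_saprot_input(aa_sequence: str, placeholder: str = "#") -> str:
    """Interleave each residue with a placeholder structure token.

    Assumption: '#' marks a missing structure token for the base model.
    """
    cleaned = aa_sequence.strip().upper()
    if not cleaned.isalpha():
        raise ValueError("sequence must contain only one-letter amino acid codes")
    return "".join(residue + placeholder for residue in cleaned)


if __name__ == "__main__":
    print(to_saprot_input("MKTAY"))  # toy sequence, not UBC9
```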

	### Performance
	0.60 Spearman's ρ
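Spearman's ρ is the Pearson correlation of the ranks of the predicted and measured scores, so it rewards getting the ordering of mutants right rather than the exact values. The following is a minimal pure-Python sketch of that computation, using average ranks for ties. It is only an illustration and is not the evaluation code used for this model.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(predicted, measured):
    """Pearson correlation computed on the ranks of the two score lists."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equally long lists with at least two items")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(spearman_rho([0.1, 0.4, 0.35, 0.8], [0.2, 0.5, 0.6, 0.9]))
```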

	### LoRA config
	lora_dropout: 0.0

	lora_alpha: 16

	target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]

	modules_to_save: ["classifier"]
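The values above could be passed to `peft.LoraConfig` roughly as follows. This is a sketch, not the original training script. The rank `r` is not stated in this card, so the sketch leaves it at the library default, which may differ from the value actually used. The names `LORA_KWARGS` and `build_lora_config` are hypothetical.

```python
# Values copied from the "LoRA config" section of this card.
LORA_KWARGS = {
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "target_modules": ["query", "key", "value", "intermediate.dense", "output.dense"],
    "modules_to_save": ["classifier"],
}


def build_lora_config():
    """Build a peft.LoraConfig from the values listed in this card.

    The rank `r` is not given in the card, so the library default is used
    here as a placeholder (assumption).
    """
    from peft import LoraConfig  # imported lazily so the dict is usable without peft

    return LoraConfig(**LORA_KWARGS)


if __name__ == "__main__":
    print(build_lora_config())
```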

	### Training config
	class: AdamW

	betas: (0.9, 0.98)

	weight_decay: 0.01

	learning rate: 1e-4

	epoch: 100

	batch size: 2

	precision: 16-mixed
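For orientation, the settings above map onto `torch.optim.AdamW` as sketched below. The `16-mixed` string matches the format of the `precision` argument of the PyTorch Lightning `Trainer`, but whether Lightning was used for training is an assumption. The constant and function names are hypothetical, and the dataset size in the example is made up.

```python
import math

# Values copied from the "Training config" section of this card.
OPTIMIZER_KWARGS = {"lr": 1e-4, "betas": (0.9, 0.98), "weight_decay": 0.01}
MAX_EPOCHS = 100
BATCH_SIZE = 2
PRECISION = "16-mixed"  # assumed to be a Lightning-style precision string


def build_optimizer(parameters):
    """Create an AdamW optimizer with the hyperparameters listed in this card."""
    import torch  # imported lazily so the constants are usable without torch

    return torch.optim.AdamW(parameters, **OPTIMIZER_KWARGS)


def total_optimizer_steps(num_examples: int) -> int:
    """Number of optimizer steps, assuming one step per batch and no accumulation."""
    steps_per_epoch = math.ceil(num_examples / BATCH_SIZE)
    return steps_per_epoch * MAX_EPOCHS


if __name__ == "__main__":
    # 1000 is a made-up dataset size used only to show the arithmetic.
    print(total_optimizer_steps(1000))
```

With a batch size of 2, every epoch takes roughly half as many optimizer steps as there are training examples, provided no gradient accumulation is used.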