	---
	license: cc-by-nc-nd-4.0
	extra_gated_fields:
	  Name: text
	  Company: text
	  Country: country
	  Specific date: date_picker
	  I want to use this model for:
	    type: select
	    options:
	      - Research
	      - Education
	      - label: Other
	        value: other
	  I agree to share generated sequences and associated data with authors before publishing: checkbox
	  I agree not to file patents on any sequences generated by this model: checkbox
	  I agree to use this model for non-commercial use ONLY: checkbox
	base_model:
	- facebook/esm2_t30_150M_UR50D
	pipeline_tag: fill-mask
	---

	# MeMDLM: De Novo Membrane Protein Design with Masked Diffusion Language Models

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/65bbea9a26c639b000501321/uWW6xnJZwQFWDS1QZNQTm.png)

	Masked Diffusion Language Models (MDLMs), introduced by Sahoo et al. (arxiv.org/pdf/2406.07524), give BERT-style models strong generative capabilities. In this work, we pre-train and fine-tune ESM-2-150M on the MDLM objective so that it can scaffold functional motifs and unconditionally generate realistic, high-quality membrane protein sequences.
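
To give a feel for the masking process behind the MDLM objective, here is a minimal, standard-library-only sketch in which each residue is independently replaced by a mask token with a given probability. It is an illustration under stated assumptions, not the training code used for MeMDLM: the function name `mask_sequence`, the toy sequence, and the `<mask>` string are all hypothetical choices made for this example.

```python
import random

MASK = "<mask>"  # assumed mask token string; check tokenizer.mask_token in real use


def mask_sequence(sequence, mask_prob, rng):
    """Independently replace each residue with MASK with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else aa for aa in sequence]


rng = random.Random(0)   # fixed seed so repeated runs give the same masks
sequence = "MKTLLVAGA"   # hypothetical toy sequence, not taken from the model card

# Higher mask probabilities correspond to noisier (more heavily masked) inputs.
for mask_prob in (0.0, 0.5, 1.0):
    noised = mask_sequence(sequence, mask_prob, rng)
    print(mask_prob, "".join(noised))
```

At `mask_prob = 0.0` the sequence is returned unchanged, and at `mask_prob = 1.0` every position is masked. A masked language model such as the backbone described below is trained to recover the original residues at the masked positions.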

	## Model Usage

	MeMDLM is built on a backbone model, a fine-tuned version of ESM-2 (150M parameters). The backbone can be loaded directly from this repo:

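	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForMaskedLM

	tokenizer = AutoTokenizer.from_pretrained("ChatterjeeLab/MeMDLM")
	model = AutoModelForMaskedLM.from_pretrained("ChatterjeeLab/MeMDLM")
	model.eval()  # disable dropout for inference

	# To have residues filled in, replace them with tokenizer.mask_token before tokenizing.
	input_sequence = "QMMALTFITYIGCGLSSIFLSVTLVILIQLCAALLLLNLIFLLDSWIALYnTRGFCIAVAVFLHYFLLVSFTWMGLEAFHMYLKFCIVGWGIPAVVVSIVLTISPDNYGidFCWINSNVVFYITVVGYFCVIFLLNVSMFIVVLVQLCRIKKKKQLGDL"

	inputs = tokenizer(input_sequence, return_tensors="pt")
	with torch.no_grad():
	    output = model(**inputs)

	# output.logits has shape (batch, seq_len, vocab_size); take the most likely token at each position
	predicted_ids = output.logits.argmax(dim=-1).squeeze(0)
	filled_protein_seq = tokenizer.decode(predicted_ids) # contains the output protein sequence with filled mask tokens
	```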
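
For motif scaffolding, one plausible way to prepare an input for the snippet above is to keep the motif residues fixed and mask everything else. The helper below is a sketch only and not necessarily the procedure used by the authors: `mask_outside_motif`, its parameters, and the toy sequence are hypothetical, and the `<mask>` string is an assumption that should be checked against `tokenizer.mask_token`.

```python
MASK = "<mask>"  # assumed mask token string; prefer tokenizer.mask_token in real use


def mask_outside_motif(sequence, motif_start, motif_end, mask_token=MASK):
    """Keep sequence[motif_start:motif_end] fixed and mask every other residue."""
    if not 0 <= motif_start <= motif_end <= len(sequence):
        raise ValueError("motif bounds must lie within the sequence")
    return "".join(
        aa if motif_start <= i < motif_end else mask_token
        for i, aa in enumerate(sequence)
    )


# Hypothetical toy example: keep residues 3..5 ("LLV") and mask the flanks.
masked = mask_outside_motif("MKTLLVAGA", 3, 6)
print(masked)  # <mask><mask><mask>LLV<mask><mask><mask>
```

The resulting string can be passed to the tokenizer in place of `input_sequence`, so that the model predicts residues only at the masked flanking positions.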

	This backbone model can be integrated with the [MDLM formulation](https://github.com/kuleshov-group/mdlm) by setting the model backbone type to `hf_dit` and the Hugging Face model ID to `ChatterjeeLab/MeMDLM`.