---
license: cc-by-nc-nd-4.0
extra_gated_fields:
  Name: text
  Company: text
  Country: country
  Specific date: date_picker
  I want to use this model for:
    type: select
    options:
      - Research
      - Education
      - label: Other
        value: other
  I agree to share generated sequences and associated data with authors before publishing: checkbox
  I agree not to file patents on any sequences generated by this model: checkbox
  I agree to use this model for non-commercial use ONLY: checkbox
base_model:
- facebook/esm2_t30_150M_UR50D
pipeline_tag: fill-mask
---

# MeMDLM: De Novo Membrane Protein Design with Masked Diffusion Language Models

Masked Diffusion Language Models (MDLMs), introduced by [Sahoo et al.](https://arxiv.org/pdf/2406.07524), provide strong generative capabilities to BERT-style models. In this work, we pre-train and fine-tune ESM-2-150M on the MDLM objective to scaffold functional motifs and to unconditionally generate realistic, high-quality membrane protein sequences.
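
To make the masking idea concrete, here is a minimal sketch of the forward (noising) step that masked diffusion training relies on: each residue is independently replaced by a mask token with a probability tied to the diffusion time `t`. The `corrupt` helper, the toy sequence, and the choice of using `t` itself as the masking probability are illustrative assumptions, not necessarily the schedule MeMDLM was trained with.

```python
import random

MASK_TOKEN = "<mask>"  # assumed mask symbol, matching the ESM-2 tokenizer's mask token


def corrupt(sequence, t, rng):
    """Replace each residue with MASK_TOKEN independently with probability t.

    Hypothetical helper: t plays the role of the diffusion time, and the
    masking probability is assumed to equal t (a simplified schedule).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [MASK_TOKEN if rng.random() < t else residue for residue in sequence]


rng = random.Random(0)
sequence = "MKTLLILAVVAAALA"  # hypothetical toy sequence, not from the training data

for t in (0.0, 0.5, 1.0):
    noisy = corrupt(sequence, t, rng)
    # Show masked positions as "_" for readability.
    print(t, "".join("_" if tok == MASK_TOKEN else tok for tok in noisy))
```

At `t = 0.0` the sequence is untouched and at `t = 1.0` every position is masked; the denoising model is trained to recover the original residues at the masked positions.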

## Model Usage

MeMDLM uses a backbone model that is a fine-tuned version of ESM-2 (150M). This backbone can be loaded directly from this repo:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("ChatterjeeLab/MeMDLM")
model = AutoModelForMaskedLM.from_pretrained("ChatterjeeLab/MeMDLM")

input_sequence = "QMMALTFITYIGCGLSSIFLSVTLVILIQLCAALLLLNLIFLLDSWIALYnTRGFCIAVAVFLHYFLLVSFTWMGLEAFHMYLKFCIVGWGIPAVVVSIVLTISPDNYGidFCWINSNVVFYITVVGYFCVIFLLNVSMFIVVLVQLCRIKKKKQLGDL"

inputs = tokenizer(input_sequence, return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    output = model(**inputs)

# The model returns per-position logits; take the most likely token at each position.
predicted_ids = output.logits.argmax(dim=-1).squeeze(0)
filled_protein_seq = tokenizer.decode(predicted_ids)  # predicted sequence, with any mask tokens filled in
```
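
For fill-mask style use, the positions to be redesigned are usually replaced by the tokenizer's mask token before the sequence is passed to the model, and the predictions are kept only at those positions. The sketch below shows that bookkeeping on plain strings. `mask_positions` and `merge_predictions` are hypothetical helpers, not part of this repo or of `transformers`, and the toy sequences are made up for illustration.

```python
MASK_TOKEN = "<mask>"  # mask token string used by the ESM-2 tokenizer


def mask_positions(sequence, positions):
    """Return the sequence with residues at the given 0-based positions replaced by MASK_TOKEN."""
    to_mask = set(positions)
    return "".join(MASK_TOKEN if i in to_mask else aa for i, aa in enumerate(sequence))


def merge_predictions(original, predicted, positions):
    """Keep the original residues, taking predicted residues only at the masked positions."""
    if len(original) != len(predicted):
        raise ValueError("sequences must have the same length")
    to_fill = set(positions)
    return "".join(
        p if i in to_fill else o
        for i, (o, p) in enumerate(zip(original, predicted))
    )


masked = mask_positions("MKTLLILAVV", [2, 5])
print(masked)  # MK<mask>LL<mask>LAVV
```

The masked string can then be tokenized in place of `input_sequence`; once the predicted tokens are converted back to a residue string of the same length, `merge_predictions` restores the fixed positions so that only the masked residues change.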

This backbone model can be integrated with the [MDLM formulation](https://github.com/kuleshov-group/mdlm) by setting the model backbone type to `hf_dit` and the Hugging Face model ID to `ChatterjeeLab/MeMDLM`.