zjunlp/MolGen-large

text2text-generation

molecular language model

molecule generation

Yin Fang committed on Feb 1, 2023

Commit 17d47f2 · 1 Parent(s): 936e738

Update README.md

Files changed (1)

README.md +15 -1

@@ -13,9 +13,23 @@ With a training corpus of over 100 million molecules in SELFIES representation,
 Specifically, MolGen employs a bidirectional Transformer as its encoder and an autoregressive Transformer as its decoder.
 Through its carefully designed multi-task molecular prefix tuning (MPT), MolGen can generate molecules with desired properties, making it a valuable tool for molecular optimization.
-## Intended uses & limitations
+## Intended uses
 You can use the raw model for molecular generation or fine-tune it to a downstream task. See the [repository](https://github.com/zjunlp/MolGen) to look for fine-tune details on a task that interests you.
+### How to use
+Molecule generation example:
+```python
+from transformers import AutoTokenizer, BartForConditionalGeneration
+tokenizer = AutoTokenizer.from_pretrained("zjunlp/MolGen")
+model = BartForConditionalGeneration.from_pretrained("zjunlp/MolGen", use_auth_token=True)
+sf_input = tokenizer("[C][=C][C][=C][C][=C][Ring1][=Branch1]", return_tensors="pt")
+molecules = model.generate(input_ids=sf_input["input_ids"],attention_mask=sf_input["attention_mask"],max_length=20,min_length=5,num_return_sequences=5,num_beams=5,past_prompt=None)
+sf_output = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=True).replace(" ","") for g in molecules]
+```
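The example above feeds the model a SELFIES string, in which every symbol is a bracketed token such as `[C]` or `[=Branch1]`, and strips the spaces from each decoded sequence. The following is a minimal sketch of two standard-library helpers for inspecting such strings before and after generation. The names `split_selfies` and `clean_outputs` are hypothetical and are not part of MolGen or `transformers`. The sketch only checks bracket structure; it does not validate chemistry.

```python
import re
from typing import List

# One SELFIES symbol: an opening bracket, any non-bracket characters, a closing bracket.
SELFIES_TOKEN = re.compile(r"\[[^\[\]]+\]")


def split_selfies(selfies: str) -> List[str]:
    """Split a SELFIES string into its bracketed symbols.

    Raises ValueError if any characters fall outside the bracketed symbols.
    """
    tokens = SELFIES_TOKEN.findall(selfies)
    if "".join(tokens) != selfies:
        raise ValueError(f"not a well-formed SELFIES string: {selfies!r}")
    return tokens


def clean_outputs(decoded: List[str]) -> List[str]:
    """Remove spaces from decoded sequences, then drop empties and duplicates.

    The order of first occurrence is preserved.
    """
    seen = set()
    unique = []
    for text in decoded:
        compact = text.replace(" ", "")
        if compact and compact not in seen:
            seen.add(compact)
            unique.append(compact)
    return unique


if __name__ == "__main__":
    print(split_selfies("[C][=C][C][=C][C][=C][Ring1][=Branch1]"))
    print(clean_outputs(["[C] [=C]", "[C][=C]", "", "[O]"]))
```

Beam search with `num_return_sequences=5` can return sequences that become identical once spaces are removed, so deduplicating in this way may be useful before passing the results to downstream tooling.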
 ### BibTeX entry and citation info
 ```bibtex
 @article{fang2023molecular,