---
license: mit
---
## MorganGen
To use it, first clone the Hugging Face Git repository so that the helper module `utils` used in the examples below is available:
```commandline
git lfs install # Only once if not done already
git clone https://huggingface.co/lamthuy/MorganGen
```
A generative model trained on 120 million SMILES strings from the ZINC database. The model takes as input a sequence of indices representing the active bits in a 2048-bit Morgan fingerprint. Each index corresponds to a bit set to 1, while all other bits are 0. For example, the sequence
```
s = [12][184][1200]
```
represents a fingerprint where only bits 12, 184, and 1200 are set to 1, and the remaining bits are 0.
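As a minimal illustration of this format, the conversion can be sketched in plain Python (a hypothetical helper shown here for clarity; the repository's `morgan_fingerprint_to_text` may differ in its details):

```python
# Hypothetical helper mirroring the bracketed-index format described above.
def fingerprint_to_text(bits):
    """Render a fingerprint (an iterable of 0/1 values) as bracketed active-bit indices."""
    return "".join(f"[{i}]" for i, b in enumerate(bits) if b)

# A 2048-bit fingerprint with only bits 12, 184, and 1200 set
fp = [0] * 2048
for idx in (12, 184, 1200):
    fp[idx] = 1

print(fingerprint_to_text(fp))  # [12][184][1200]
```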
## Running example
The following snippet, also available in the repository notebook, demonstrates how to load the model from its checkpoint and generate a new SMILES string conditioned on a given input SMILES.
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from utils import MorganFingerprint, morgan_fingerprint_to_text
# Load the checkpoint and the tokenizer
checkpoint_path = "lamthuy/MorganGen"
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint_path)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)
# Given a SMILES string, compute its Morgan fingerprint
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"
m = MorganFingerprint()
mf = m.smiles_to_morgan(smiles)
# Convert it to the bracketed-index text format
s = morgan_fingerprint_to_text(mf)
# Encode the text into token IDs
input_ids = tokenizer.encode(s, return_tensors="pt")
# Generate output sequence
output_ids = model.generate(input_ids, max_length=64, num_beams=5)
# Decode the generated output
output_smiles = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output_smiles)
```
## Reference
```bibtex
@inproceedings{hoang2024morgangen,
  title={MorganGen: Generative Modeling of SMILES Using Morgan Fingerprint Features},
  author={Hoang, Lam Thanh and D{\'\i}az, Ra{\'u}l Fern{\'a}ndez and Lopez, Vanessa},
  booktitle={American Chemical Society (ACS) Fall Meeting},
  year={2024}
}
```