---
license: mit
---

## MorganGen

To use the model, clone the Hugging Face repository before running the examples below:

```commandline
git lfs install  # Only needed once, if not done already
git clone https://huggingface.co/lamthuy/MorganGen
```

MorganGen is a generative model trained on 120 million SMILES strings from the ZINC database. The model takes as input a sequence of indices representing the active bits in a 2048-bit Morgan fingerprint. Each index corresponds to a bit set to 1, while all other bits are 0. For example, the input

```
s = [12][184][1200]
```

represents a fingerprint in which only bits 12, 184, and 1200 are set to 1 and all remaining bits are 0.

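
The index format described above can be illustrated with a few lines of plain Python. This is only a minimal sketch: `bits_to_text` and `text_to_bits` are hypothetical helpers written for illustration, not the repository's `morgan_fingerprint_to_text`, whose exact output (for example, separators between indices) may differ. It assumes the bracketed form shown above and a fingerprint length of 2048.

```python
import re
from typing import List

FP_SIZE = 2048  # fingerprint length stated in the text above


def bits_to_text(bits: List[int]) -> str:
    """Render a 0/1 bit vector as bracketed indices of its active bits.

    Hypothetical helper; assumes the "[i][j][k]" format shown above.
    """
    return "".join(f"[{i}]" for i, b in enumerate(bits) if b)


def text_to_bits(text: str, size: int = FP_SIZE) -> List[int]:
    """Inverse of bits_to_text: rebuild the 0/1 vector from the index string."""
    bits = [0] * size
    for match in re.findall(r"\[(\d+)\]", text):
        bits[int(match)] = 1
    return bits


# Build the fingerprint from the example: only bits 12, 184, and 1200 are set
fingerprint = [0] * FP_SIZE
for idx in (12, 184, 1200):
    fingerprint[idx] = 1

s = bits_to_text(fingerprint)
print(s)  # [12][184][1200]
```

Because only the active indices are written out, the text stays short for sparse fingerprints, and `text_to_bits(s)` recovers the original bit vector.
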

# Running example

The following code snippet from the notebook demonstrates how to load the model from a checkpoint and generate a new SMILES string conditioned on the Morgan fingerprint of a given input SMILES.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from utils import MorganFingerprint, morgan_fingerprint_to_text

# Load the checkpoint and the tokenizer
checkpoint_path = "lamthuy/MorganGen"
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint_path)
tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)

# Given a SMILES string, compute its Morgan fingerprint
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"
m = MorganFingerprint()
mf = m.smiles_to_morgan(smiles)

# Convert the fingerprint to the index-text format
s = morgan_fingerprint_to_text(mf)

# Encode the input text
input_ids = tokenizer.encode(s, return_tensors="pt")

# Generate the output sequence
output_ids = model.generate(input_ids, max_length=64, num_beams=5)

# Decode the generated output
output_smiles = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output_smiles)
```


# Reference

```
@inproceedings{hoang2024morgangen,
  title={MorganGen: Generative Modeling of SMILES Using Morgan Fingerprint Features},
  author={Hoang, Lam Thanh and D{\'\i}az, Ra{\'u}l Fern{\'a}ndez and Lopez, Vanessa},
  booktitle={American Chemical Society (ACS) Fall Meeting},
  year={2024}
}
```