ACE-Mol-light
Adaptive Chemical Embedding Model (ACE-Mol) is a task-specific chemical embedding model trained on a large collection of programmatically generated chemical motifs.
ACE-Mol-light is a lighter pre-trained version of ACE-Mol. We recommend using ACE-Mol-light for embedding; it is the default model loaded if you use our API.
Usage
ACE-Mol can be used out of the box for embedding molecules or fine-tuned for a specific task. All of the instructions and a simple API can be found in our GitHub repo.
Clone the repo:
git clone https://github.com/lamalab-org/ACE-Mol.git
The PretrainedACEMol helper class enables easy use of the pre-trained models from the Hugging Face Hub or of a local fine-tuned model from a .ckpt file.
from src.pretrained import PretrainedACEMol
# Load pre-trained model (hf or local .ckpt)
acemol = PretrainedACEMol('jablonkagroup/ACEMol')
PretrainedACEMol accepts a list of SMILES, corresponding targets, and task descriptions (a single task description suffices if it is shared across all molecules).
molecules = [
'O=C(/C=C\\c1ccccc1)OCc1cncs1',
'CCC(C)C(CN(C)C)c1ccc(Cl)cc1Cl',
'CCOC(=O)CC(N)c1ccc(OC)cc1',
'CN(C)Cc1ccccc1O',
'COc1c(F)c(F)c(C(=O)Nc2ccccc2N2CCN(C(=O)C(C)C)CC2)c(F)c1F',
'O=C(COC(=O)c1ccccc1F)NCc1ccc2c(c1)OCO2'
]
task = 'is halogen group present'
targets = [0, 1, 0, 0, 1, 1]
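As a sanity check on the toy labels above, a naive substring test for halogen atoms reproduces the targets. This is only a sketch that happens to be safe for these particular SMILES (uppercase F and I always denote fluorine and iodine atoms, and Cl/Br are two-letter symbols); for real data, prefer proper substructure matching with RDKit.

```python
molecules = [
    'O=C(/C=C\\c1ccccc1)OCc1cncs1',
    'CCC(C)C(CN(C)C)c1ccc(Cl)cc1Cl',
    'CCOC(=O)CC(N)c1ccc(OC)cc1',
    'CN(C)Cc1ccccc1O',
    'COc1c(F)c(F)c(C(=O)Nc2ccccc2N2CCN(C(=O)C(C)C)CC2)c(F)c1F',
    'O=C(COC(=O)c1ccccc1F)NCc1ccc2c(c1)OCO2',
]

def has_halogen(smiles: str) -> int:
    # Naive check: look for the halogen atom symbols as substrings.
    # Fine for this toy set; use RDKit substructure search in practice.
    return int(any(h in smiles for h in ('Cl', 'Br', 'F', 'I')))

targets = [has_halogen(s) for s in molecules]
# reproduces the labels above: [0, 1, 0, 0, 1, 1]
```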
We recommend using ACE-Mol as an embedding model; the embed method creates embeddings that exclude the actual target and prepares a dataframe for classification or regression via logprobs.
embedded = acemol.embed(molecules, task, targets)
# split into train and test
train, test = embedded[:3], embedded[3:]
# use regress method for regression.
predictions = acemol.classify(train, test)
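Since embed produces per-molecule feature vectors, the embeddings can also be fed to any standard classifier instead of the logprob-based classify method. Below is a minimal sketch of that idea using scikit-learn; the random matrix only stands in for real ACE-Mol embeddings, and the 3/3 split mirrors the example above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 32))       # placeholder for ACE-Mol embeddings
y = np.array([0, 1, 0, 0, 1, 1])   # targets from the example above

# Fit on the first three molecules, predict on the rest.
clf = LogisticRegression(max_iter=1000).fit(X[:3], y[:3])
preds = clf.predict(X[3:])
```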
Cite
@article{prastalo2026learning,
  title={Beyond Learning on Molecules by Weakly Supervising on Molecules},
  author={Gordan Prastalo and Kevin Maik Jablonka},
  journal={arXiv preprint arXiv:2602.04696},
  year={2026}
}