---
license: cc-by-nc-nd-4.0
extra_gated_prompt: "By submitting any personal information (e.g., name, contact details), you agree to the collection and processing of this data
  for the purpose of evaluating access requests for this model. Repository authors will store this data securely and will not share it with third parties
  without your explicit consent. You retain all rights to your personal information and may request its deletion at any time.\n\n
  By accessing the repository you agree not to use this model in experiments which may result in harm to human or animal subjects.
  "
extra_gated_fields:
  Date of Agreement: date_picker
  I accept the terms of the license and I agree not to use this model for commercial purposes or profit generation: checkbox
tags:
- molecular-generation
- diffusion-models
- cheminformatics
- 3D-conformer
- rdkit
- non-commercial
language: en
library_name: mlconfgen
datasets:
- ChEMBL
metrics:
- shape-tanimoto
- validity
- uniqueness
- novelty
- Fréchet Distance
model-index:
- name: ML Conformer Generator
  results:
  - task:
      type: molecular-generation
      name: 3D Conformer Generation
    dataset:
      name: ChEMBL (filtered)
      type: molecules
    metrics:
    - name: Valid molecules
      type: validity
      value: 48-93%
    - name: Chemical novelty
      type: novelty
      value: 99.84%
    - name: Shape Tanimoto Similarity (avg)
      type: shape-tanimoto
      value: 53.32%
    - name: Shape Tanimoto Similarity (max)
      type: shape-tanimoto
      value: 99.69%
    - name: Average Synthetic Accessibility score
      type: sa_score
      value: 3.18
    - name: Unique molecules
      type: uniqueness
      value: 99.94%
    - name: Fréchet Fingerprint Distance
      type: Fréchet Distance
      value: 4.13
---

# ML Conformer Generator

[DOI: 10.1039/D5DD00318K](https://doi.org/10.1039/D5DD00318K)

<img src="./mlconfgen_logo.png" width="200" style="display: block; margin: 0 10%;">

**ML Conformer Generator** is a shape-constrained molecule generation model that combines
an Equivariant Diffusion Model (EDM) and a Graph Convolutional Network (GCN). It generates 3D conformations
that are chemically valid and geometrically aligned with a reference shape.

---

## 📦 Model Summary

- **Architecture**: Equivariant Diffusion Model (EDM) + Graph Convolutional Network (GCN)
- **Training Data**: 1.6 million ChEMBL compounds, filtered for molecules with 15–39 heavy atoms
- **Post-Processing**: Deterministic standardization pipeline using RDKit with constrained MMFF94 geometry optimization
- **Primary Metric**: Shape Tanimoto Similarity
- **Developed by**: Denis Sapegin

---

## 🚀 Intended Use

- Non-commercial research in 3D molecular generation
- Academic/educational use
- Generation of molecules similar in shape to a reference conformer
- Generation of molecules matching an arbitrary reference shape

---

## 🚫 Out of Scope / Limitations

- **Commercial Use**: Not licensed for commercial use without explicit permission.
- **Training Bias**: Trained on ChEMBL data, so results may be biased toward drug-like molecules and chemistries.
- **Elements Supported**: Only the following elements are supported for generation: `H`, `C`, `N`, `O`, `F`, `P`, `S`, `Cl`, `Br`.
- **Molecular Size Limitations**:
  - Trained on molecules containing **15–39 heavy atoms**.
  - By architectural design, the model can **only generate molecules with up to 42 heavy atoms**.
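
The element and size limits above can be checked before choosing a target composition. The helper below is a minimal sketch in plain Python; `check_composition` and its list-of-symbols input are hypothetical and not part of the `mlconfgen` API.

```python
# Limits as stated in this model card
SUPPORTED_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br"}
MAX_HEAVY_ATOMS = 42                 # architectural upper bound
TRAINED_HEAVY_ATOM_RANGE = (15, 39)  # range covered by the training data


def check_composition(symbols):
    """Check a list of element symbols (hypothetical input format) against the stated limits."""
    unsupported = sorted(set(symbols) - SUPPORTED_ELEMENTS)
    n_heavy = sum(1 for s in symbols if s != "H")  # heavy atoms = everything except hydrogen
    low, high = TRAINED_HEAVY_ATOM_RANGE
    return {
        "unsupported_elements": unsupported,
        "n_heavy": n_heavy,
        "within_generation_limit": n_heavy <= MAX_HEAVY_ATOMS,
        "within_training_range": low <= n_heavy <= high,
    }


report = check_composition(["C"] * 20 + ["H"] * 30 + ["Si"])
```

Here `report["unsupported_elements"]` lists `Si`, which is outside the supported set, while the heavy-atom count of 21 falls inside both size limits.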

---

## 🧪 Evaluation Metrics (100,000 requested samples, 100 denoising steps)

- ✅ **Valid molecules (post-standardization, % of requested)**: 48%
- 🧬 **Chemical novelty**: 99.84%
- 📐 **Avg Shape Tanimoto**: 53.32%
- 🎯 **Max Shape Tanimoto**: 99.69%
- 🔁 **Unique molecules**: 99.94%
- ⚡ **Generation speed**: 4.18 valid molecules/sec (NVIDIA H100)
- 💾 **Memory (per thread)**: up to 4.0 GB
- 🧬 **Fréchet Fingerprint Distance (to ChEMBL)**: 4.13
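
For readers unfamiliar with these ratios, the sketch below shows one common way to compute validity, uniqueness, and novelty from canonical SMILES strings. Validity follows the "% of requested" definition given above; the denominators used for uniqueness (valid molecules) and novelty (unique molecules) are an assumed convention, and `generation_metrics` is a hypothetical helper, not part of `mlconfgen`.

```python
def generation_metrics(n_requested, valid_smiles, training_smiles):
    """Return validity, uniqueness and novelty as fractions in [0, 1].

    n_requested     -- number of samples requested from the model
    valid_smiles    -- canonical SMILES of samples that passed standardization
    training_smiles -- canonical SMILES of the training set
    """
    unique = set(valid_smiles)
    novel = unique - set(training_smiles)
    validity = len(valid_smiles) / n_requested if n_requested else 0.0
    uniqueness = len(unique) / len(valid_smiles) if valid_smiles else 0.0  # assumed denominator
    novelty = len(novel) / len(unique) if unique else 0.0                  # assumed denominator
    return {"validity": validity, "uniqueness": uniqueness, "novelty": novelty}


metrics = generation_metrics(
    n_requested=10,
    valid_smiles=["CCO", "CCO", "CCN", "c1ccccc1"],
    training_smiles={"CCN"},
)
```

In this toy example 4 of 10 requested samples are valid, 3 of the 4 valid samples are distinct, and 2 of those 3 are absent from the training set.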

---

## 🧠 How It Works

### Core Components:
- **EDM** generates atom coordinates and types under shape constraints
- **GCN** predicts adjacency matrices (bonding)
- **RDKit** pipeline enforces valence, performs sanitization, and optimizes geometry

### Shape Alignment:
Evaluated using **Gaussian molecular volume overlap** and **Shape Tanimoto Similarity**.

Hydrogens are excluded from similarity computation.
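
As an illustration of the metric, Shape Tanimoto can be written as `V_AB / (V_AA + V_BB - V_AB)`, where `V_XY` is the Gaussian volume overlap between molecules X and Y. The sketch below is a simplified first-order version in plain Python, not the package's implementation. It assumes the two heavy-atom coordinate sets are already aligned and uses one hypothetical Gaussian width `ALPHA` for every atom.

```python
import math

ALPHA = 0.8  # hypothetical Gaussian exponent (1/Å^2), identical for all atoms


def gaussian_overlap(coords_a, coords_b, alpha=ALPHA):
    """First-order Gaussian volume overlap between two sets of atom centres."""
    # Overlap integral of two unit-amplitude Gaussians exp(-alpha * r^2)
    # whose centres are a distance d apart:
    #   (pi / (2 * alpha))^(3/2) * exp(-alpha * d^2 / 2)
    prefactor = (math.pi / (2.0 * alpha)) ** 1.5
    total = 0.0
    for a in coords_a:
        for b in coords_b:
            d2 = sum((x - y) ** 2 for x, y in zip(a, b))
            total += prefactor * math.exp(-0.5 * alpha * d2)
    return total


def shape_tanimoto(coords_a, coords_b, alpha=ALPHA):
    """Shape Tanimoto similarity of two pre-aligned heavy-atom coordinate sets."""
    v_ab = gaussian_overlap(coords_a, coords_b, alpha)
    v_aa = gaussian_overlap(coords_a, coords_a, alpha)
    v_bb = gaussian_overlap(coords_b, coords_b, alpha)
    return v_ab / (v_aa + v_bb - v_ab)


# Heavy-atom coordinates only (hydrogens excluded), in Å
mol_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
mol_b = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.2, 0.0)]
similarity = shape_tanimoto(mol_a, mol_b)
```

Identical, perfectly overlaid shapes give a similarity of 1, and the value falls toward 0 as the overlap shrinks.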

---

## 💾 Access & Licensing

The **Python package and inference code** are available on GitHub under the Apache 2.0 License:
> https://github.com/Membrizard/ml_conformer_generator

The trained model **weights** are available on Hugging Face under the CC BY-NC-ND 4.0 license:

> https://huggingface.co/Membrizard/ml_conformer_generator

Use of the trained weights for any profit-generating activity is restricted.

For commercial licensing and inference-as-a-service, contact:
[Denis Sapegin](https://github.com/Membrizard)

---

## Citation

If you use **MLConfGen** in your research, please cite:

Denis Sapegin, Fedor Bakharev, Dmitry Krupenya, Azamat Gafurov, Konstantin Pildish, and Joseph C. Bear.
*Moment of inertia as a simple shape descriptor for diffusion-based shape-constrained molecular generation.*
Digital Discovery, 2025.
DOI: [10.1039/D5DD00318K](https://doi.org/10.1039/D5DD00318K)

---

## Installation

1. Install the package:

`pip install mlconfgen`

2. Download the weights from Hugging Face:
> https://huggingface.co/Membrizard/ml_conformer_generator

**PyTorch**

`edm_moi_chembl_15_39.pt`

`adj_mat_seer_chembl_15_39.pt`

**ONNX**

`edm_moi_chembl_15_39.onnx`

`adj_mat_seer_chembl_15_39.onnx`

---

## 🐍 Python API

**PyTorch**

```python
from rdkit import Chem
from mlconfgen import MLConformerGenerator, evaluate_samples

# Load the EDM and adjacency-matrix model weights downloaded during installation
model = MLConformerGenerator(
    edm_weights="./edm_moi_chembl_15_39.pt",
    adj_mat_seer_weights="./adj_mat_seer_chembl_15_39.pt",
    diffusion_steps=100,
)

# Reference conformer that defines the target shape
reference = Chem.MolFromMolFile('ceyyag.mol')

# Request 20 samples constrained to the reference shape
samples = model.generate_conformers(reference_conformer=reference, n_samples=20, variance=2)

# Evaluate the generated samples against the reference
aligned_reference, std_samples = evaluate_samples(reference, samples)
```

---

**ONNX**

```python
from mlconfgen import MLConformerGeneratorONNX
from rdkit import Chem

# Load the ONNX exports of the two networks
model = MLConformerGeneratorONNX(
    egnn_onnx="./egnn_chembl_15_39.onnx",
    adj_mat_seer_onnx="./adj_mat_seer_chembl_15_39.onnx",
    diffusion_steps=100,
)

# Reference conformer that defines the target shape
reference = Chem.MolFromMolFile('ceyyag.mol')
samples = model.generate_conformers(reference_conformer=reference, n_samples=20, variance=2)
```