Update README.md
README.md
CHANGED
|
@@ -1,3 +1,156 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: cc-by-nc-sa-2.0
|
| 3 |
-
---
|
| 1 |
+
---
|
| 2 |
+
license: cc-by-nc-sa-2.0
|
| 3 |
+
---
|
| 4 |
+
<div align='center'>
|
| 5 |
+
<h1> MolCRAFT Series for Drug Design </h1>
|
| 6 |
+
|
| 7 |
+
[GitHub repository](https://github.com/AlgoMole/MolCRAFT/tree/master)
|
| 8 |
+
[Project page](https://MolCRAFT-GenSI.github.io/)
|
| 9 |
+
[Google Drive folder](https://drive.google.com/drive/folders/16KiwfMGUIk4a6mNU20GnUd0ah-mjNlhC?usp=share_link)
|
| 10 |
+
|
| 11 |
+
</div>
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
Welcome to the official repository for the MolCRAFT series of projects! This series focuses on developing and improving deep learning models for **structure-based drug design (SBDD)** and **structure-based molecule optimization (SBMO)**. Our goal is to generate molecules with high binding affinity and plausible 3D conformations.
|
| 15 |
+
|
| 16 |
+
This repository contains the source code for the following projects:
|
| 17 |
+
|
| 18 |
+
* [**MolCRAFT**: Structure-Based Drug Design in Continuous Parameter Space](https://arxiv.org/abs/2404.12141) (ICML'24)
|
| 19 |
+
* [**MolJO**: Empower Structure-Based Molecule Optimization with Gradient Guidance](https://arxiv.org/abs/2411.13280) (ICML'25)
|
| 20 |
+
* [**MolPilot**: Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule](https://arxiv.org/abs/2505.07286) (ICML'25)
|
| 21 |
+
|
| 22 |
+
## 📜 Overview
|
| 23 |
+
|
| 24 |
+
The MolCRAFT series addresses critical challenges in generative models for SBDD, including modeling molecular geometries, handling hybrid continuous-discrete spaces, and optimizing molecules against protein targets. Each project introduces novel methodologies and achieves **state-of-the-art** performance on relevant benchmarks.
|
| 25 |
+
|
| 26 |
+
## 🧭 Navigation
|
| 27 |
+
|
| 28 |
+
| Folder | TL;DR | Description |
|
| 29 |
+
| --------------------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
| 30 |
+
| [MolCRAFT](./MolCRAFT/)     | Unified Space for Molecule Generation | MolCRAFT is the first SBDD generative model based on a Bayesian Flow Network (BFN), operating in a unified continuous parameter space for different modalities, with a variance reduction sampling strategy that generates high-quality samples with more than 10x speedup. |
|
| 31 |
+
| [MolJO](./MolJO/)           | Gradient-Guided Molecule Optimization | MolJO is a gradient-based Structure-Based Molecule Optimization (SBMO) framework derived within BFN. It employs joint guidance across continuous coordinates and discrete atom types, alongside a backward correction strategy for effective optimization. |
|
| 32 |
+
| [MolPilot](./MolPilot/)     | Optimal Scheduling | MolPilot enhances SBDD by introducing a VLB-Optimal Scheduling (VOS) strategy for the twisted multimodal probability paths, significantly improving molecular geometries and interaction modeling and achieving a 95.9% PB-Valid rate. |
|
| 33 |
+
|
| 34 |
+
|
| 35 |
+
---
|
| 36 |
+
|
| 37 |
+
## 🚀 Projects
|
| 38 |
+
|
| 39 |
+
### MolCRAFT (Let's Craft the Molecules)
|
| 40 |
+
|
| 41 |
+
<p align="center"><img src="asset/molcraft.gif" width="60%"></p>
|
| 42 |
+
|
| 43 |
+
* **Description**: MolCRAFT is the first SBDD model that employs BFN and operates in a **continuous parameter space**. It introduces a novel noise-reduced sampling strategy to generate molecules with superior binding affinity and more stable 3D structures. MolCRAFT has demonstrated its ability to accurately model interatomic interactions, achieving reference-level Vina Scores.
|
| 44 |
+
* **Key Contributions**:
|
| 45 |
+
* Operates in a continuous parameter space for SBDD within the BFN framework.
|
| 46 |
+
* Novel variance reduction sampling strategy that improves both sample quality and efficiency.
|
| 47 |
+
* Achieves state-of-the-art binding affinity and structural stability.
|
| 48 |
+
|
| 49 |
+
---
|
| 50 |
+
|
| 51 |
+
### MolJO (Molecule Joint Optimization)
|
| 52 |
+
|
| 53 |
+

|
| 54 |
+
|
| 55 |
+
* **Description**: MolJO is a gradient-based SBMO framework that leverages a continuous and differentiable space derived through Bayesian inference. It facilitates **joint guidance signals across different modalities** (continuous coordinates and discrete atom types) while preserving SE(3)-equivariance. MolJO introduces a novel backward correction strategy for an effective trade-off between exploration and exploitation.
|
| 56 |
+
* **Key Contributions**:
|
| 57 |
+
* Gradient-based SBMO framework with joint guidance across different modalities.
|
| 58 |
+
* Backward correction strategy for optimized exploration-exploitation.
|
| 59 |
+
* State-of-the-art performance in practical optimization tasks, including multi-objective and constrained optimization for R-group redesign, scaffold hopping, etc.
|
| 60 |
+
|
| 61 |
+
---
|
| 62 |
+
|
| 63 |
+
### MolPilot (How to Pilot the Aircraft)
|
| 64 |
+
|
| 65 |
+

|
| 66 |
+
|
| 67 |
+
* **Description**: MolPilot addresses challenges in geometric structure modeling by focusing on the **twisted probability path of multi-modalities** (continuous 3D positions and discrete 2D topologies). It proposes a VLB-Optimal Scheduling (VOS) strategy, optimizing the Variational Lower Bound as a path integral for SBDD. MolPilot significantly enhances molecular geometries and interaction modeling.
|
| 68 |
+
* **Key Contributions**:
|
| 69 |
+
* Addresses multi-modality challenges in SBDD.
|
| 70 |
+
* Introduces VLB-Optimal Scheduling (VOS) strategy, generally applicable to a wide range of frameworks including diffusions.
|
| 71 |
+
* Achieves 95.9% PoseBusters passing rate on CrossDock with significantly improved molecular geometries.
|
| 72 |
+
|
| 73 |
+
---
|
| 74 |
+
|
| 75 |
+
# MolJO
|
| 76 |
+
Official implementation of the ICML 2025 paper ["Empower Structure-Based Molecule Optimization with Gradient Guidance"](https://arxiv.org/abs/2411.13280).
|
| 77 |
+
|
| 78 |
+

|
| 79 |
+
|
| 80 |
+
## Environment
|
| 81 |
+
We highly recommend installing via Docker if you have a Linux server with an NVIDIA GPU.
|
| 82 |
+
|
| 83 |
+
Otherwise, see the [environment README](docker/README.md) for further details on the Docker or conda setup.
|
| 84 |
+
|
| 85 |
+
### Prerequisite
|
| 86 |
+
Docker with `nvidia-container-runtime` enabled is required on your Linux system.
|
| 87 |
+
|
| 88 |
+
> [!TIP]
|
| 89 |
+
> - This repo provides an easy-to-use script that installs Docker and nvidia-container-runtime. From `./docker`, run `sudo ./setup_docker_for_host.sh` to set up your host machine.
|
| 90 |
+
> - For details, please refer to the [NVIDIA Container Toolkit install guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
|
| 91 |
+
|
| 92 |
+
|
| 93 |
+
### Install via Docker
|
| 94 |
+
We highly recommend setting up the environment via Docker, since all it takes is a single `make` command.
|
| 95 |
+
```bash
|
| 96 |
+
cd ./docker
|
| 97 |
+
make
|
| 98 |
+
```
|
| 99 |
+
|
| 100 |
+
-----
|
| 101 |
+
## Data
|
| 102 |
+
We use the same CrossDock dataset as previous approaches, with affinity information (Vina Score). The data used for training and evaluating the model is obtained from [KGDiff](https://github.com/CMACH508/KGDiff/tree/main?tab=readme-ov-file) and should be placed in the `data` folder.
|
| 103 |
+
|
| 104 |
+
To train the property predictor from scratch, extract the files from the `data.zip` in [Zenodo](https://zenodo.org/records/8419944):
|
| 105 |
+
* `crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb`
|
| 106 |
+
* `crossdocked_pocket10_pose_split.pt`
|
| 107 |
+
|
| 108 |
+
To evaluate the model on the test set, download `test_set.zip` and unzip it into the `data` folder. It includes the original PDB files that are used for Vina docking.
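
Before training or evaluating, it can help to confirm that the expected files are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_data_files` is hypothetical, and it assumes that `test_set.zip` unpacks to a `test_set` directory inside `data`.

```python
from pathlib import Path

# File names taken from the Data section above; "test_set" is an assumed
# name for the directory produced by unzipping test_set.zip.
EXPECTED = (
    "crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb",
    "crossdocked_pocket10_pose_split.pt",
    "test_set",
)


def missing_data_files(data_dir="data", expected=EXPECTED):
    """Return the expected entries that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All expected data files found.")
```

An empty list means every listed entry was found; adjust `EXPECTED` if your archive unpacks under a different name.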
|
| 109 |
+
|
| 110 |
+
---
|
| 111 |
+
## Training
|
| 112 |
+
```bash
|
| 113 |
+
python train_classifier.py --exp_name ${EXP_NAME} --revision ${REVISION} --prop_name ${PROPERTY} # affinity qed sa
|
| 114 |
+
```
|
| 115 |
+
|
| 116 |
+
The other arguments should keep the following default values:
|
| 117 |
+
```bash
|
| 118 |
+
python train_bfn.py --config_file configs/train_prop.yaml --sigma1_coord 0.03 --beta1 1.5 --lr 5e-4 --time_emb_dim 1 --epochs 15 --max_grad_norm Q --destination_prediction True --use_discrete_t True
|
| 119 |
+
```
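
To train one predictor per property, the first command can be scripted. The sketch below only builds and prints the command lines (a dry run). It assumes that `affinity`, `qed`, and `sa` are the accepted `--prop_name` values, as the comment in the command suggests; the helper name and the experiment name and revision shown are placeholders, not names from the repository.

```python
import shlex

# Assumed to be the accepted --prop_name values (from the comment above).
PROPERTIES = ("affinity", "qed", "sa")


def classifier_commands(exp_name, revision, properties=PROPERTIES):
    """Build one train_classifier.py invocation per property."""
    return [
        [
            "python", "train_classifier.py",
            "--exp_name", exp_name,
            "--revision", revision,
            "--prop_name", prop,
        ]
        for prop in properties
    ]


if __name__ == "__main__":
    # "moljo_prop" and "v1" are placeholder values for illustration.
    for cmd in classifier_commands("moljo_prop", "v1"):
        # Print only; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```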
|
| 120 |
+
|
| 121 |
+
## Sampling
|
| 122 |
+
We provide the pretrained checkpoints for property predictors (Vina Score, SA) in the [pretrained](https://drive.google.com/drive/folders/12t90e-gHBbYn3tFOFIENZc0mZYFhZuX2?usp=share_link) Google Drive folder. The backbone checkpoint can be found [here](https://drive.google.com/file/d/1TcUQM7Lw1klH2wOVBu20cTsvBTcC1WKu/view?usp=share_link). After downloading them, please put the checkpoints under the `pretrained` folder.
|
| 123 |
+
|
| 124 |
+
### Sampling for pockets in the test set
|
| 125 |
+
```bash
|
| 126 |
+
python sample_guided.py --num_samples ${NUM_MOLS_PER_POCKET} --objective ${OBJ} # vina_sa
|
| 127 |
+
```
|
| 128 |
+
|
| 129 |
+
The other arguments should keep the following default values:
|
| 130 |
+
```bash
|
| 131 |
+
python sample_guided.py --config_file configs/test_opt.yaml --pos_grad_weight 50 --type_grad_weight 50 --guide_mode param_naive --sample_steps 200 --sample_num_atoms prior
|
| 132 |
+
```
|
| 133 |
+
|
| 134 |
+
### Sampling from a PDB file
|
| 135 |
+
To sample from a whole-protein PDB file, we need the corresponding reference ligand to clip the protein pocket (a 10 Å region around the reference position).
|
| 136 |
+
|
| 137 |
+
```bash
|
| 138 |
+
python sample_for_pocket_guided.py --protein_path ${PDB_PATH} --ligand_path ${SDF_PATH}
|
| 139 |
+
```
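
The pocket clipping described above can be pictured as a distance filter. This is a simplified sketch rather than the repository's implementation: `clip_pocket` is a hypothetical helper, atoms are plain `(x, y, z)` tuples in angstroms, and it filters atom by atom, whereas the actual script may select whole residues.

```python
import math


def clip_pocket(protein_atoms, ligand_atoms, radius=10.0):
    """Keep protein atoms within `radius` angstroms of any ligand atom."""
    return [
        atom
        for atom in protein_atoms
        if any(math.dist(atom, lig) <= radius for lig in ligand_atoms)
    ]


if __name__ == "__main__":
    # Toy coordinates in angstroms, for illustration only.
    protein = [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0), (25.0, 0.0, 0.0)]
    ligand = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
    print(clip_pocket(protein, ligand))
```

The reference ligand matters only for its position here: it marks where the binding site is, so atoms far from it are discarded before sampling.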
|
| 140 |
+
|
| 141 |
+
## Evaluation
|
| 142 |
+
|
| 143 |
+
### Evaluating meta files
|
| 144 |
+
We provide our samples as `moljo_vina_sa_vina_docked_pose_checked.pt` in the [sample](https://drive.google.com/drive/folders/1A3Mthm9ksbfUnMCe5T2noGsiEV1RfChH?usp=sharing) Google Drive folder.
|
| 145 |
+
|
| 146 |
+
|
| 147 |
+
## Citation
|
| 148 |
+
|
| 149 |
+
```
|
| 150 |
+
@article{qiu2025empower,
|
| 151 |
+
title={Empower Structure-Based Molecule Optimization with Gradient Guidance},
|
| 152 |
+
author={Qiu, Keyue and Song, Yuxuan and Yu, Jie and Ma, Hongbo and Cao, Ziyao and Zhang, Zhilong and Wu, Yushuai and Zheng, Mingyue and Zhou, Hao and Ma, Wei-Ying},
|
| 153 |
+
journal={ICML 2025},
|
| 154 |
+
year={2025}
|
| 155 |
+
}
|
| 156 |
+
```
|