# MolPilot

Official implementation of the ICML 2025 paper ["Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule"](https://arxiv.org/abs/2505.07286).

We propose VLB-Optimal Scheduling (VOS) and demonstrate its generality on popular diffusion-based models (TargetDiff, with the code in the `targetdiff` folder) and BFN-based models (our MolPilot).

VOS can be easily integrated into other frameworks, with only minor changes to the training code:

```python
# Example: TargetDiff molopt_score_model.py (excerpt; arguments and surrounding code elided)

class ScorePosNet3D(nn.Module):
    def get_diffusion_loss(...):
        ##### Original Training Loss #####
        # Time step used for the continuous modality (ligand coordinates)
        time_step, pt = self.sample_time(num_graphs, protein_pos.device, self.sample_time_method)
        # Xt = a.sqrt() * X0 + (1-a).sqrt() * eps
        ligand_pos_perturbed = a_pos.sqrt() * ligand_pos + (1.0 - a_pos).sqrt() * pos_noise  # pos_noise * std

        ##### VOS Generalized Loss #####
        # Separate time step for the discrete modality (atom types)
        time_step_v, pt = self.sample_time(num_graphs, protein_pos.device, self.sample_time_method)
        # Vt = a * V0 + (1-a) / K
        log_ligand_v0 = index_to_log_onehot(ligand_v, self.num_classes)
        ligand_v_perturbed, log_ligand_vt = self.q_v_sample(log_ligand_v0, time_step_v, batch_ligand)
        kl_v = self.compute_v_Lt(log_v_model_prob=log_v_model_prob, log_v0=log_ligand_v0,
                                 log_v_true_prob=log_v_true_prob, t=time_step_v, batch=batch_ligand)
```
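
To make the two perturbation formulas in the comments above concrete, here is a minimal, dependency-free sketch in which coordinates and atom types draw their time steps independently, as the excerpt suggests. The helper names (`alpha_bar`, `perturb_pos`, `perturb_type`) and the linear signal schedule are illustrative assumptions, not code from TargetDiff or MolPilot.

```python
import math
import random

def alpha_bar(t, num_steps):
    """Hypothetical signal level: 1 at t=0, decreasing linearly toward 0."""
    return 1.0 - t / num_steps

def perturb_pos(x0, a, rng):
    """Continuous modality: Xt = sqrt(a) * X0 + sqrt(1 - a) * eps, eps ~ N(0, 1)."""
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

def perturb_type(v0_onehot, a):
    """Discrete modality: Vt = a * V0 + (1 - a) / K, a distribution over K classes."""
    k = len(v0_onehot)
    return [a * v + (1.0 - a) / k for v in v0_onehot]

rng = random.Random(0)
num_steps = 1000

# Decoupled sampling: each modality gets its own time step.
t_pos = rng.randrange(num_steps)
t_v = rng.randrange(num_steps)

pos_t = perturb_pos([0.5, -1.2, 3.0], alpha_bar(t_pos, num_steps), rng)
type_t = perturb_type([0.0, 1.0, 0.0, 0.0], alpha_bar(t_v, num_steps))
print(t_pos, t_v, pos_t, type_t)
```

Setting `t_v = t_pos` in this sketch recovers the coupled case, where both modalities share a single time step.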

The optimal test-time noise schedule can be obtained by first storing the gridded loss surface values and then running the dynamic programming script in `test/test_geodesic_budget.py`.

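
The repository script is not reproduced here. As a rough illustration of the idea only, the sketch below assumes the stored loss surface is a 2D grid `cost[i][j]` indexed by the discretized time of each modality, and uses dynamic programming to find the monotone path between opposite corners with the lowest accumulated cost. The grid layout, the allowed moves, and the name `cheapest_monotone_path` are assumptions; the actual script may differ, for example by enforcing a step budget, which this sketch omits.

```python
def cheapest_monotone_path(cost):
    """Return (total_cost, path) for the cheapest path from (0, 0) to (n-1, m-1).

    Allowed moves advance the first index, the second index, or both by one,
    so both time variables are non-decreasing along the path.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    best = [[inf] * m for _ in range(n)]
    prev = [[None] * m for _ in range(n)]
    best[0][0] = cost[0][0]
    for i in range(n):
        for j in range(m):
            for di, dj in ((1, 0), (0, 1), (1, 1)):
                pi, pj = i - di, j - dj
                if pi >= 0 and pj >= 0 and best[pi][pj] + cost[i][j] < best[i][j]:
                    best[i][j] = best[pi][pj] + cost[i][j]
                    prev[i][j] = (pi, pj)
    # Walk the back-pointers from the end cell to recover the path.
    path, cell = [], (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = prev[cell[0]][cell[1]]
    return best[n - 1][m - 1], path[::-1]

# Toy 3x3 loss surface with made-up numbers.
grid = [
    [1.0, 5.0, 9.0],
    [1.0, 1.0, 9.0],
    [9.0, 1.0, 1.0],
]
total, path = cheapest_monotone_path(grid)
print(total, path)
```

Each cell on the returned path corresponds to one pair of per-modality time indices, so the path can be read as a joint schedule over the two modalities.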
## Environment

It is highly recommended to install via Docker if a Linux server with an NVIDIA GPU is available.

Otherwise, see the [README for env](docker/README.md) for further details on the Docker or conda setup.

### Prerequisite
Docker with `nvidia-container-runtime` enabled on your Linux system is required.

> [!TIP]
> - This repo provides an easy-to-use script to install Docker and nvidia-container-runtime: in `./docker`, run `sudo ./setup_docker_for_host.sh` to set up your host machine.
> - For details, please refer to the [install guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).


### Install via Docker
We highly recommend setting up the environment via Docker, since all it takes is a single `make` command.
```bash
cd ./docker
make
```

-----
## Data

We use the same data as [TargetDiff](https://github.com/guanjq/targetdiff/tree/main?tab=readme-ov-file#data). Data used for training and evaluating the model should be placed in the `data` folder by default, and is available in the [data](https://drive.google.com/drive/folders/1j21cc7-97TedKh_El5E34yI8o5ckI7eK?usp=share_link) Google Drive folder.

To train the model from scratch, download the lmdb file and the split file into the `data` folder:
* `crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb`
* `crossdocked_pocket10_pose_split.pt`

To evaluate the model on the test set, download _and_ unzip `test_set.zip` into the `data` folder. It includes the original PDB files that will be used in Vina docking.

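
Before launching a run, it can help to confirm that the files listed above are in place. The following is a small optional sketch, not part of the repository; the helper name `missing_entries` is hypothetical, and the `test_set` directory name is an assumption about what unzipping `test_set.zip` produces.

```python
from pathlib import Path

# File names copied from the list above.
TRAIN_FILES = [
    "crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb",
    "crossdocked_pocket10_pose_split.pt",
]
# Assumed name of the directory created by unzipping test_set.zip.
EVAL_ENTRIES = ["test_set"]

def missing_entries(data_dir, names):
    """Return the names that do not exist under data_dir."""
    root = Path(data_dir)
    return [name for name in names if not (root / name).exists()]

if __name__ == "__main__":
    for label, names in (("training", TRAIN_FILES), ("evaluation", EVAL_ENTRIES)):
        missing = missing_entries("data", names)
        status = "ok" if not missing else f"missing: {missing}"
        print(f"{label}: {status}")
```
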
```yaml
data:
  name: pl # [pl, pl_tr] where tr means offline-transformed
```

---
## Training

```bash
python train_bfn_twisted.py --exp_name ${EXP_NAME} --revision ${REVISION} --config_file configs/crossdock_train_test.yaml --time_decoupled
```

The default values of the remaining arguments should match the following:
```bash
python train_bfn_twisted.py --sigma1_coord 0.05 --beta1 1.5 --beta1_bond 1.5 --lr 5e-4 --time_emb_dim 0 --self_condition --epochs 30 --batch_size 16 --max_grad_norm Q --scheduler plateau --destination_prediction True --use_discrete_t True --num_samples 10 --sampling_strategy end_back_pmf --sample_num_atoms ref --ligand_atom_mode add_aromatic
```

### Debugging
```bash
python train_bfn_twisted.py --no_wandb --debug --epochs 1
```

## Sampling

We provide the pretrained MolPilot checkpoint [here](https://drive.google.com/file/d/1c-lD3yfRx6JlbTWq-jAdirrK6sK2lGLq/view?usp=share_link).


### Sampling for pockets in the test set
To sample for CrossDock, set `CONFIG` to `configs/crossdock_train_test.yaml`. For PoseBusters, set it to `configs/posebusters_test.yaml`.

```bash
# Sample with time scheduler
python train_bfn_twisted.py --config_file ${CONFIG} --ckpt_path ${CKPT_PATH} --time_scheduler_path ${TIME_SCHEDULER} --test_only --exp_name ${EXP_NAME} --revision ${REVISION} --num_samples ${NUM_MOLS_PER_POCKET} --sample_steps 100 --eval_batch_size ${BATCH_SIZE}
```

### Sampling from a PDB file
To sample from a whole-protein PDB file, the corresponding reference ligand is needed to clip the protein pocket (a 10 Å region around the reference position).

```bash
python sample_for_pocket.py --protein_path ${PDB_PATH} --ligand_path ${SDF_PATH} --time_scheduler_path ${TIME_SCHEDULER} --num_samples ${NUM_MOLS_PER_POCKET}
```
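
To illustrate what clipping a pocket around a reference ligand means, the sketch below keeps only the protein atoms within 10 Å of any ligand atom. It is a simplified, atom-level illustration on plain coordinate tuples; the function name `clip_pocket` is hypothetical, and the repository's own preprocessing may instead select whole residues and parse the PDB and SDF files directly.

```python
import math

def clip_pocket(protein_atoms, ligand_atoms, radius=10.0):
    """Keep protein atoms within `radius` angstroms of any ligand atom.

    Both inputs are lists of (x, y, z) tuples.
    """
    return [
        p for p in protein_atoms
        if any(math.dist(p, l) <= radius for l in ligand_atoms)
    ]

# Toy coordinates (made up): two protein atoms near the ligand, one far away.
protein = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0), (25.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
pocket = clip_pocket(protein, ligand)
print(pocket)
```
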
## Evaluation

### Evaluating meta files
We provide our CrossDock samples as `molpilot_ref_vina_docked.pt` in the [sample](https://drive.google.com/drive/folders/1A3Mthm9ksbfUnMCe5T2noGsiEV1RfChH?usp=sharing) Google Drive folder.

<!-- TODO:, together with all the baseline results on PoseBusters in the [sample_posebusters]() folder. -->


## Citation

```
@article{qiu2025piloting,
  title={Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule},
  author={Qiu, Keyue and Song, Yuxuan and Fan, Zhehuan and Liu, Peidong and Zhang, Zhe and Zheng, Mingyue and Zhou, Hao and Ma, Wei-Ying},
  journal={ICML 2025},
  year={2025}
}
```