hhhhhhh789 committed on
Commit
8d616ba
·
verified ·
1 Parent(s): c18323b

Upload README.md

  1. README.md +121 -3
README.md CHANGED
@@ -1,3 +1,121 @@
1
- ---
2
- license: cc-by-nc-sa-2.0
3
- ---
1
+ # MolPilot
2
+ Official implementation of the ICML 2025 paper ["Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule"](https://arxiv.org/abs/2505.07286).
3
+
4
+ ![](../asset/molpilot_vos.png)
5
+
6
+ We propose VLB-Optimal Scheduling (VOS) and demonstrate its generality on popular diffusion-based models (TargetDiff, with code in the `targetdiff` folder) and BFN-based models (our MolPilot).
7
+
8
+ VOS can be integrated into other frameworks with only minor changes to training:
9
+
10
+ ```python
11
+ # Example: TargetDiff molopt_score_model.py
+
+ class ScorePosNet3D(nn.Module):
+     def get_diffusion_loss(...):
+         ##### Original Training Loss #####
+         # Time step used to perturb the atom coordinates.
+         time_step, pt = self.sample_time(num_graphs, protein_pos.device, self.sample_time_method)
+         # Xt = a.sqrt() * X0 + (1-a).sqrt() * eps
+         ligand_pos_perturbed = a_pos.sqrt() * ligand_pos + (1.0 - a_pos).sqrt() * pos_noise  # pos_noise * std
+
+         ##### VOS Generalized Loss #####
+         # Separate time step for the atom types, sampled by a second call.
+         time_step_v, pt = self.sample_time(num_graphs, protein_pos.device, self.sample_time_method)
+         # Vt = a * V0 + (1-a) / K
+         log_ligand_v0 = index_to_log_onehot(ligand_v, self.num_classes)
+         ligand_v_perturbed, log_ligand_vt = self.q_v_sample(log_ligand_v0, time_step_v, batch_ligand)
+         kl_v = self.compute_v_Lt(log_v_model_prob=log_v_model_prob, log_v0=log_ligand_v0,
+                                  log_v_true_prob=log_v_true_prob, t=time_step_v, batch=batch_ligand)
27
+
28
+ ```
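To make the two forward processes named in the comments above concrete, here is a minimal, framework-free sketch. It assumes a toy linear `alpha_bar` schedule and hypothetical helper names (`perturb_position`, `perturb_type_probs`) that are not part of TargetDiff or MolPilot. The only point it illustrates is that coordinates and atom types are each noised at their own independently drawn time step.

```python
import math
import random


def alpha_bar(t, num_steps):
    """Toy signal level (assumed schedule): 1 at t=0, close to 0 at t=num_steps."""
    return 1.0 - t / (num_steps + 1)


def perturb_position(x0, a, rng):
    """Gaussian forward process: Xt = sqrt(a) * X0 + sqrt(1 - a) * eps."""
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


def perturb_type_probs(v0_onehot, a):
    """Categorical forward process: q(Vt | V0) = a * V0 + (1 - a) / K."""
    k = len(v0_onehot)
    return [a * v + (1.0 - a) / k for v in v0_onehot]


rng = random.Random(0)
num_steps = 1000

# Decoupled sampling: one time step per modality instead of a single shared one.
t_pos = rng.randint(0, num_steps)
t_v = rng.randint(0, num_steps)

x_t = perturb_position([1.0, -2.0, 0.5], alpha_bar(t_pos, num_steps), rng)
v_t_probs = perturb_type_probs([0.0, 1.0, 0.0, 0.0], alpha_bar(t_v, num_steps))
```

Because `t_pos` and `t_v` are drawn separately, a training batch built this way covers the whole two-dimensional grid of (coordinate noise, type noise) pairs rather than only its diagonal.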
29
+
30
+ The optimal test-time noise schedule can be obtained by first storing the gridded loss surface values, and then running the dynamic programming script in `test/test_geodesic_budget.py`.
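The dynamic-programming step can be pictured as a shortest-path search over the stored grid. The sketch below is an assumption about the general shape of such a search, not a copy of `test/test_geodesic_budget.py`: `loss_grid[i][j]` is a hypothetical table of loss values at coordinate time index `i` and atom-type time index `j`, and each move advances one or both indices by one.

```python
def optimal_schedule(loss_grid):
    """Return (total_cost, path) for the cheapest monotone path across the grid.

    The path starts at (0, 0), ends at the opposite corner, and pays the
    loss of every cell it visits. This is an illustrative cost model only.
    """
    n, m = len(loss_grid), len(loss_grid[0])
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    prev = [[None] * m for _ in range(n)]
    cost[0][0] = loss_grid[0][0]

    for i in range(n):
        for j in range(m):
            # Allowed moves into (i, j): advance i, advance j, or advance both.
            for di, dj in ((1, 0), (0, 1), (1, 1)):
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                candidate = cost[pi][pj] + loss_grid[i][j]
                if candidate < cost[i][j]:
                    cost[i][j] = candidate
                    prev[i][j] = (pi, pj)

    # Walk the back-pointers from the end corner to recover the schedule.
    path = []
    node = (n - 1, m - 1)
    while node is not None:
        path.append(node)
        node = prev[node[0]][node[1]]
    path.reverse()
    return cost[n - 1][m - 1], path


toy_grid = [
    [1.0, 5.0, 9.0],
    [1.0, 5.0, 9.0],
    [1.0, 1.0, 1.0],
]
total, path = optimal_schedule(toy_grid)
```

Each entry of `path` pairs a coordinate time index with an atom-type time index, so the two modalities can advance at different rates along the returned schedule.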
31
+
32
+ ![](../asset/molpilot_top1_bond_len_angle.png)
33
+
34
+ ## Environment
35
+ We highly recommend installing via Docker if a Linux server with an NVIDIA GPU is available.
36
+
37
+ Otherwise, see the [environment README](docker/README.md) for further details on Docker or conda setup.
38
+
39
+ ### Prerequisite
40
+ Docker with `nvidia-container-runtime` enabled is required on your Linux system.
41
+
42
+ > [!TIP]
43
+ > - This repo provides a script that installs Docker and nvidia-container-runtime: in `./docker`, run `sudo ./setup_docker_for_host.sh` to set up your host machine.
44
+ > - For details, please refer to the [install guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
45
+
46
+
47
+ ### Install via Docker
48
+ We highly recommend setting up the environment via Docker, since all it takes is a single `make` command.
49
+ ```bash
50
+ cd ./docker
51
+ make
52
+ ```
53
+
54
+ -----
55
+ ## Data
56
+ We use the same data as [TargetDiff](https://github.com/guanjq/targetdiff/tree/main?tab=readme-ov-file#data). Data for training and evaluating the model should be placed in the `data` folder by default, and can be downloaded from the [data](https://drive.google.com/drive/folders/1j21cc7-97TedKh_El5E34yI8o5ckI7eK?usp=share_link) Google Drive folder.
57
+
58
+ To train the model from scratch, download the LMDB file and the split file into the `data` folder:
59
+ * `crossdocked_v1.1_rmsd1.0_pocket10_processed_final.lmdb`
60
+ * `crossdocked_pocket10_pose_split.pt`
61
+
62
+ To evaluate the model on the test set, download _and_ unzip `test_set.zip` into the `data` folder. It includes the original PDB files used for Vina docking.
63
+
64
+ ```yaml
65
+ data:
66
+   name: pl  # one of [pl, pl_tr], where tr means offline-transformed
67
+ ```
68
+
69
+ ---
70
+ ## Training
71
+ ```bash
72
+ python train_bfn_twisted.py --exp_name ${EXP_NAME} --revision ${REVISION} --config_file configs/crossdock_train_test.yaml --time_decoupled
73
+ ```
74
+
75
+ where the default values should match the following:
76
+ ```bash
77
+ python train_bfn_twisted.py --sigma1_coord 0.05 --beta1 1.5 --beta1_bond 1.5 --lr 5e-4 --time_emb_dim 0 --self_condition --epochs 30 --batch_size 16 --max_grad_norm Q --scheduler plateau --destination_prediction True --use_discrete_t True --num_samples 10 --sampling_strategy end_back_pmf --sample_num_atoms ref --ligand_atom_mode add_aromatic
78
+ ```
79
+
80
+ ### Debugging
81
+ ```bash
82
+ python train_bfn_twisted.py --no_wandb --debug --epochs 1
83
+ ```
84
+
85
+ ## Sampling
86
+ We provide the pretrained MolPilot checkpoint [here](https://drive.google.com/file/d/1c-lD3yfRx6JlbTWq-jAdirrK6sK2lGLq/view?usp=share_link).
87
+
88
+
89
+ ### Sampling for pockets in the test set
90
+ To sample for CrossDock, set `CONFIG` to `configs/crossdock_train_test.yaml`. For PoseBusters, set it to `configs/posebusters_test.yaml`.
91
+
92
+ ```bash
93
+ # Sample with time scheduler
94
+ python train_bfn_twisted.py --config_file ${CONFIG} --ckpt_path ${CKPT_PATH} --time_scheduler_path ${TIME_SCHEDULER} --test_only --exp_name ${EXP_NAME} --revision ${REVISION} --num_samples ${NUM_MOLS_PER_POCKET} --sample_steps 100 --eval_batch_size ${BATCH_SIZE}
95
+ ```
96
+
97
+ ### Sampling from a PDB file
98
+ To sample from a whole-protein PDB file, the corresponding reference ligand is needed to clip the protein pocket (a 10 Å region around the reference position).
99
+
100
+ ```bash
101
+ python sample_for_pocket.py --protein_path ${PDB_PATH} --ligand_path ${SDF_PATH} --time_scheduler_path ${TIME_SCHEDULER} --num_samples ${NUM_MOLS_PER_POCKET}
102
+ ```
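The pocket clipping described above can be illustrated with a small distance filter. This is a simplified, atom-level sketch under stated assumptions: `clip_pocket` and the coordinate tuples are hypothetical, and the actual `sample_for_pocket.py` may select whole residues rather than individual atoms.

```python
import math


def clip_pocket(protein_atoms, ligand_atoms, radius=10.0):
    """Keep protein atoms lying within `radius` angstroms of any ligand atom."""
    return [
        atom
        for atom in protein_atoms
        if any(math.dist(atom, lig) <= radius for lig in ligand_atoms)
    ]


# Toy coordinates in angstroms (made up for illustration).
protein = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0), (25.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
pocket = clip_pocket(protein, ligand)
```

Here the third protein atom is more than 10 Å from every ligand atom, so it is left out of `pocket`.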
103
+
104
+ ## Evaluation
105
+
106
+ ### Evaluating meta files
107
+ We provide our CrossDock samples as `molpilot_ref_vina_docked.pt` in the [sample](https://drive.google.com/drive/folders/1A3Mthm9ksbfUnMCe5T2noGsiEV1RfChH?usp=sharing) Google Drive folder.
108
+
109
+ <!-- TODO:, together with all the baseline results on PoseBusters in the [sample_posebusters]() folder. -->
110
+
111
+
112
+ ## Citation
113
+
114
+ ```
115
+ @article{qiu2025piloting,
116
+ title={Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule},
117
+ author={Qiu, Keyue and Song, Yuxuan and Fan, Zhehuan and Liu, Peidong and Zhang, Zhe and Zheng, Mingyue and Zhou, Hao and Ma, Wei-Ying},
118
+ journal={ICML 2025},
119
+ year={2025}
120
+ }
121
+ ```