Update model card README.md

#1
by ejin700 - opened
Files changed (1)
  1. README.md +162 -3
README.md CHANGED
@@ -1,3 +1,162 @@
- ---
- license: cc-by-nc-4.0
- ---
+ <p align="center">
+ <img src="assets/oxtal-logo.png" alt="OXtal: Generative Molecular Crystal Structure Prediction" width="900"/><br>
+ <a href="https://arxiv.org/abs/2512.06987"><img src="https://img.shields.io/badge/arXiv-94133F?style=for-the-badge&logo=arxiv" alt="arXiv"/></a>
+ <a href="https://oxtal.github.io/"><img src="https://img.shields.io/badge/πŸ“%20Blog-007A87?style=for-the-badge&logoColor=white" alt="Blog"/></a>
+ <a href="https://huggingface.co/OXtal-CSP"><img src="https://img.shields.io/badge/HuggingFace-DE9B35.svg?style=for-the-badge&logo=HuggingFace" alt="HF"/></a>
+ </p>
+
+ **OX**tal (**O**rganic **X** "Crys-" tal) is an all-atom diffusion model for molecular crystal structure prediction (CSP). Unlike traditional quantum-chemical approaches, which rely on expensive energy oracles, OXtal produces fast, accurate zero-shot predictions at a fraction of the cost. Specifically, OXtal recovers experimental crystal structures for rigid molecules, flexible molecules, and co-crystals with conformer RMSD<sub>1</sub> < 0.5 Å and attains a packing similarity rate of over 80%, demonstrating its ability to model both thermodynamic and kinetic regularities of molecular crystallization.
+
+ ## ⚙️ Installation and Setup
+ OXtal was developed with Python 3.11.0, CUDA 12.6, and PyTorch 2.5.0; you may need to adjust these versions to match your own compute resources. To set up the environment, run the following commands from the top-level `OXtal` directory. This should create the `oxtal-env` environment in `.venv/`:
+
+ ```
+ # Install uv
+ curl -LsSf https://astral.sh/uv/install.sh | sh
+
+ # Install oxtal-env
+ uv sync
+ ```
+
+ After installing the environment through `uv`, you can activate it using:
+ ```
+ source .venv/bin/activate
+ ```
+
+ ### [Optional] CUTLASS Installation
+ OXtal also supports DeepSpeed4Science EvoformerAttention for memory-efficient attention, which significantly reduces GPU memory usage and enables inference on longer sequences. This requires NVIDIA CUTLASS to be available on disk and a GPU with Ampere or newer architecture (e.g. A100, L40S, H100, H200, B100, B200). To enable this functionality, add `deepspeed>=0.18.3` to the list of dependencies in `pyproject.toml` and update your environment using the steps above. CUTLASS can be installed as follows:
+
+ ```
+ # First, clone the CUTLASS repo
+ git clone -b v3.5.1 https://github.com/NVIDIA/cutlass.git
+
+ # Then set the environment variable CUTLASS_PATH to point to the clone
+ cd cutlass
+ export CUTLASS_PATH="$(pwd)"
+ ```
+
+ You can also add `CUTLASS_PATH` to your shell profile so that it persists across sessions. The attention kernels are compiled the first time they are invoked. To enable EvoformerAttention during inference, remove `use_deepspeed_evo_attention=false` from `run_inference.sh`.
+
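Before launching inference with EvoformerAttention, it can help to confirm that `CUTLASS_PATH` is set and points at a CUTLASS checkout. The following is a minimal Python sketch and not part of OXtal: the helper name `check_cutlass_path` is hypothetical, and the test for an `include/cutlass` subdirectory is an assumption based on the layout of the NVIDIA CUTLASS repository.

```python
import os
from pathlib import Path


def check_cutlass_path(env=None):
    """Return (ok, message) describing whether CUTLASS_PATH looks usable.

    Hypothetical helper for illustration only; the real check happens when
    the attention kernels are compiled on first use.
    """
    env = os.environ if env is None else env
    raw = env.get("CUTLASS_PATH")
    if not raw:
        return False, "CUTLASS_PATH is not set"
    root = Path(raw).expanduser()
    if not root.is_dir():
        return False, f"CUTLASS_PATH does not point to a directory: {root}"
    # Assumption: a CUTLASS checkout keeps its headers under include/cutlass.
    if not (root / "include" / "cutlass").is_dir():
        return False, f"no include/cutlass directory under {root}"
    return True, f"CUTLASS found at {root}"


if __name__ == "__main__":
    ok, message = check_cutlass_path()
    print(("OK: " if ok else "WARNING: ") + message)
```

Running the script in the same shell session that will launch `run_inference.sh` shows whether the exported variable is visible to child processes.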
+ ## 🚀 Inference
+
+ This project uses [Hydra](https://hydra.cc/) to manage model configuration files, which allows easy command-line overrides and structured configs. You can find all the configuration files in the `configs` folder. We also use Hugging Face to manage our model checkpoint and data files.
+
+ To generate samples with OXtal, run the following command:
+ ```
+ # Run OXtal inference:
+ bash run_inference.sh
+ ```
+
+ To run inference on a different set of molecules, update the `input_json_path` parameter in `run_inference.sh`. All of our evaluation datasets are provided in the `examples` folder. To run inference on all five evaluation datasets from the paper together, use the `examples/all_inference.json` file.
+
+ | Parameter | Description |
+ |---|---|
+ | `input_json_path` | Path to the input JSON file detailing which crystals to generate. |
+ | `sample_diffusion.N_sample` | Number of samples to generate per seed for each crystal. For example, `sample_diffusion.N_sample=10` produces 10 samples per seed. |
+ | `seeds` | List of random seeds, e.g. `[0,1,2]`. Each seed produces `sample_diffusion.N_sample` outputs for each crystal in the input JSON, so the total number of generated samples per crystal equals `len(seeds) * sample_diffusion.N_sample`. |
+ | `dump_dir` | Output directory for generated structures. |
+ | `use_deepspeed_evo_attention` | Flag to enable/disable EvoformerAttention. |
+
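The sample count implied by the `seeds` and `sample_diffusion.N_sample` rows can be checked with a few lines of Python. This is only a sketch of the arithmetic described in the table, under the assumption that every crystal in the input JSON is sampled with every seed; the function name `total_samples` is hypothetical and not part of OXtal.

```python
def total_samples(seeds, n_sample, n_crystals=1):
    """Number of structures generated for a run.

    Each seed yields `n_sample` outputs per crystal, so the per-crystal total
    is len(seeds) * n_sample; multiplying by the number of crystals in the
    input JSON gives the total for the whole job.
    """
    if n_sample < 0 or n_crystals < 0:
        raise ValueError("n_sample and n_crystals must be non-negative")
    return len(seeds) * n_sample * n_crystals


# Three seeds with sample_diffusion.N_sample=10 give 30 samples per crystal.
print(total_samples([0, 1, 2], 10))                # 30
# The same settings applied to an input file that lists 5 crystals.
print(total_samples([0, 1, 2], 10, n_crystals=5))  # 150
```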
+ You can also run OXtal on your own molecules by adding a new .json file to the `examples` folder. The code supports both SMILES strings and input .sdf files (specified by adding the `FILE_` prefix to the file path). For co-crystals, each component must be specified individually, along with the desired ratio. Example .json entries are provided below for reference. The `#` comments are explanatory annotations; standard JSON does not allow comments, so leave them out of your own files.
+
+ ```
+ [
+   {
+     "sequences": [
+       {
+         "ligand": {
+           "ligand": "CC1=CC(C#N)=C(S1)Nc2c([N+]([O-])=O)cccc2", # SMILES string
+           "count": 30, # How many copies of the molecule to generate
+           "id_key": "ligand"
+         }
+       }
+     ],
+     "modelSeeds": [],
+     "assembly_id": "1",
+     "name": "QAXMEH"
+   },
+   {
+     "sequences": [
+       {
+         "ligand": {
+           "ligand": "N#CC(C#N)=C1C=CC(C=C1)=C(C#N)C#N",
+           "count": 30,
+           "id_key": "ligand"
+         }
+       },
+       {
+         "ligand": {
+           "ligand": "c1cc2cccc3c4cccc5cccc(c(c1)c23)c45",
+           "count": 30,
+           "id_key": "ligand"
+         }
+       }
+     ],
+     "modelSeeds": [],
+     "assembly_id": "1",
+     "name": "PERTCQ01" # Co-crystal with 1:1 ratio
+   },
+   {
+     "sequences": [
+       {
+         "ligand": {
+           "ligand": "FILE_./examples/BIPY.sdf", # Input .sdf instead of SMILES string
+           "count": 30,
+           "id_key": "ligand"
+         }
+       }
+     ],
+     "modelSeeds": [],
+     "assembly_id": "1",
+     "name": "UWEQUL"
+   },
+   ...
+ ]
+ ```
+
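Input files in this format can also be generated programmatically. The sketch below builds entries with the same keys as the examples above and serializes them as plain JSON without the `#` annotations. The helper names (`ligand_entry`, `crystal_entry`, `write_input_json`) are hypothetical and not part of OXtal; the values for `modelSeeds`, `assembly_id`, and `id_key` are simply copied from the examples, since their meaning is not documented here, and using equal counts for a 1:1 co-crystal follows the PERTCQ01 example.

```python
import json
from pathlib import Path


def ligand_entry(ligand, count=30):
    """One component: a SMILES string, or an .sdf path with the FILE_ prefix."""
    return {"ligand": {"ligand": ligand, "count": count, "id_key": "ligand"}}


def crystal_entry(name, ligands, count=30):
    """One crystal; pass several ligands to describe a co-crystal."""
    return {
        "sequences": [ligand_entry(lig, count) for lig in ligands],
        "modelSeeds": [],    # copied from the README examples
        "assembly_id": "1",  # copied from the README examples
        "name": name,
    }


def write_input_json(entries, path):
    """Write the list of crystal entries as plain JSON (no comments)."""
    path = Path(path)
    path.write_text(json.dumps(entries, indent=2) + "\n", encoding="utf-8")
    return path


entries = [
    crystal_entry("QAXMEH", ["CC1=CC(C#N)=C(S1)Nc2c([N+]([O-])=O)cccc2"]),
    # Co-crystal: both components get the same count, as in PERTCQ01.
    crystal_entry("PERTCQ01", [
        "N#CC(C#N)=C1C=CC(C=C1)=C(C#N)C#N",
        "c1cc2cccc3c4cccc5cccc(c(c1)c23)c45",
    ]),
    # An .sdf file is referenced by prefixing its path with FILE_.
    crystal_entry("UWEQUL", ["FILE_./examples/BIPY.sdf"]),
]

if __name__ == "__main__":
    # Print the JSON; call write_input_json(entries, <path>) to save it.
    print(json.dumps(entries, indent=2))
```

After saving the output with `write_input_json`, point `input_json_path` in `run_inference.sh` at the generated file.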
+ ## 📊 Evaluation
+ We use the COMPACK software from CCDC to compare OXtal-generated crystal packings with the experimentally observed structures in our test datasets. You can install `csd-python-api` into your environment using the following command:
+
+ ```
+ uv pip install --extra-index-url https://pip.ccdc.cam.ac.uk/ csd-python-api
+ ```
+
+ After installing, ensure that you have properly installed the CCDC dataset and activated your license. For reference, command-line setup instructions for a Linux machine can be found [here](https://support.ccdc.cam.ac.uk/support/solutions/articles/103000306299-custom-installation-of-the-csd-portfolio-software-and-data).
+
+ After running inference and setting up CCDC, run:
+
+ ```
+ bash run_eval.sh
+ ```
+ The evaluation summary report will be written to `evaluation/metric_summary.txt`. You can modify paths and file names in `run_eval.sh` as necessary.
+
+ ## 🙏 Acknowledgements
+ This code builds on and relies heavily on the [Protenix](https://github.com/bytedance/Protenix) code base, and we thank the authors of that work for their efforts.
+
+ We'd also like to thank [CCDC](https://www.ccdc.cam.ac.uk/) for their support in helping us maintain our commitment to open science.
+
+ ## 📖 Cite
+ If you make use of this code or its accompanying [paper](https://arxiv.org/abs/2512.06987), please cite this work as follows:
+ ```
+ @inproceedings{jin2025oxtal,
+   title={OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction},
+   author={Jin, Emily and Nica, Andrei Cristian and Galkin, Mikhail and Rector-Brooks, Jarrid and Lee, Kin Long Kelvin and Miret, Santiago and Arnold, Frances H and Bronstein, Michael and Bose, Avishek Joey and Tong, Alexander and Liu, Cheng-Hao},
+   booktitle={ICLR},
+   year={2026}
+ }
+ ```
+
+ ## 📄 License
+
+ [![MIT License][mit-shield]][mit] [![CC BY-NC 4.0][cc-by-nc-shield]][cc-by-nc]
+
+ The **source code** for OXtal is released under an MIT License (see [LICENSE-SOURCE-CODE][mit]). However, since OXtal was trained on data from [CCDC's Cambridge Structural Database](https://www.ccdc.cam.ac.uk/), the **model weights** are released under a [Creative Commons Attribution-NonCommercial 4.0 International][cc-by-nc] (CC BY-NC 4.0) License (see [LICENSE-MODEL-WEIGHTS][weights]). For commercial use of the model weights, please ensure that you have a proper [CCDC License](https://www.ccdc.cam.ac.uk/support-and-resources/licensing-information/).
+
+ [mit]: LICENSE-SOURCE-CODE
+ [mit-shield]: https://img.shields.io/badge/License-MIT-yellow.svg
+ [weights]: LICENSE-MODEL-WEIGHTS
+ [cc-by-nc]: https://creativecommons.org/licenses/by-nc/4.0/
+ [cc-by-nc-shield]: https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg
+