---
license: other
---
# AIDO.StructurePrediction

[![License](https://img.shields.io/badge/license-GenBio_AI_Community_License-orange)](https://github.com/genbio-ai/ModelGenerator/blob/main/LICENSE)
[![License2](https://img.shields.io/badge/license-Apache_2.0-orange)](https://github.com/bytedance/Protenix/blob/main/LICENSE)



<div style="display: flex; gap: 0px; overflow-x: auto; align-items: center;">
  <div style="display: flex; flex-direction: column; align-items: center;">
    <span style="font-weight: bold;">Antibody</span>
    <img src="https://cdn-uploads.huggingface.co/production/uploads/67f69b92e9fa2b2fdb84053d/HRkmsQGorXEpxQtauXBNe.gif" style="max-width: 400px; max-height: 400px; height: auto; width: 100%;">
  </div>  
  <div style="display: flex; flex-direction: column; align-items: center;">
    <span style="font-weight: bold;">Nanobody</span>
    <img src="https://cdn-uploads.huggingface.co/production/uploads/67f69b92e9fa2b2fdb84053d/TiuDROAYDIEp4ac-4OzXB.gif" style="max-width: 400px; max-height: 400px; height: auto; width: 100%;">
  </div>
  <div style="display: flex; flex-direction: column; align-items: center;">
    <span style="font-weight: bold;">RNA</span>
    <img src="https://cdn-uploads.huggingface.co/production/uploads/67f69b92e9fa2b2fdb84053d/8gD1JOiDywkhVYLlsmngM.gif" style="max-width: 400px; max-height: 400px; height: auto; width: 100%;">
  </div>
</div>

<div style="display: flex; gap: 0px; overflow-x: auto; align-items: center;">
  <div style="display: flex; flex-direction: column; align-items: center;">
    <span style="font-weight: bold;">Antibody-Antigen</span>
    <img src="https://cdn-uploads.huggingface.co/production/uploads/67f69b92e9fa2b2fdb84053d/1n0LpyWf004Jo8KXoaYEG.gif" style="max-width: 400px; max-height: 400px; height: auto; width: 100%;">
  </div>
  <div style="display: flex; flex-direction: column; align-items: center;">
    <span style="font-weight: bold;">Nanobody-Antigen</span>
    <img src="https://cdn-uploads.huggingface.co/production/uploads/67f69b92e9fa2b2fdb84053d/rql1lOjeI-OOQketKYIvL.gif" style="max-width: 400px; max-height: 400px; height: auto; width: 100%;">
  </div>
  <div style="display: flex; flex-direction: column; align-items: center;">
    <span style="font-weight: bold;">Protein-Ligand</span>
    <img src="https://cdn-uploads.huggingface.co/production/uploads/67f69b92e9fa2b2fdb84053d/gxs-265GnvudNddPqrb_S.gif" style="max-width: 400px; max-height: 400px; height: auto; width: 100%;">
  </div>
</div>





<div style="display: flex; gap: 0px; overflow-x: auto; align-items: center;">
  <div style="display: flex; flex-direction: column; align-items: center;">
    <span style="font-weight: bold;">Ground Truth (orange) vs Our Prediction (green)</span>
    <img src="assets/figure(gt-yellow vs our-green).png" height="600" />
  </div>  
  <div style="display: flex; flex-direction: column; align-items: center;">
    <span style="font-weight: bold;">Ground Truth (orange) vs AlphaFold3 Prediction (blue)</span>
    <img src="assets/figure(gt-yellow vs af3-blue).png" height="600" />
  </div>
</div>



## Model Description

AIDO.StructurePrediction is an AlphaFold3-like full-atom structure prediction model,
designed to predict the structure and interactions of biological molecules,
including proteins, DNA, RNA, ligands, and antibodies. This model harnesses both structural and sequence modalities
to provide high-fidelity predictions for various biological tasks.
It achieves state-of-the-art performance on immunology-related structure prediction tasks,
covering antibody, nanobody, antibody-antigen, and nanobody-antigen structures.

## Model Details

### Key Features

- **Multi-Modal Learning**: Combines 3D structural and sequence data (nucleotides and amino acids) to enhance model accuracy and applicability.
- **High-Quality Data**: Trained on carefully curated structure data.
- **Data Augmentation**: Implements novel data augmentation and distillation techniques to diversify training datasets, improving robustness and generalization.
- **Integration of Multiple Sequence Alignments (MSA)**: Utilizes alignment data from diverse biological databases to improve predictive capabilities.
- **Training Strategies**: Incorporates advanced training methodologies to refine model performance and efficiency.

### Model Architecture  
- **Type**: Pairformer+Diffusion model architecture. 
- **Key Components**:  
  - **Pairformer**: Designed to learn complex relationships from both single sequences and multiple sequence alignments. 
  - **Diffusion Module**: Generates multiple conformations of the structure.
- **Hyperparameters**: Key values are listed below. The full set can be found in [inference_v0.1.yaml](https://github.com/genbio-ai/ModelGenerator/blob/main/experiments/AIDO.StructurePrediction/configs/inference_v0.1.yaml).

| Model Arch Component    | Value |
|-------------------------|:-----:|
| Pairformer Blocks       |  48   |
| MSA Module Blocks       |   4   |
| Diffusion Module Blocks |  24   |
| Diffusion Heads         |  16   |


## Usage

Please see [experiments/AIDO.StructurePrediction](https://github.com/genbio-ai/ModelGenerator/tree/main/experiments/AIDO.StructurePrediction) in AIDO.ModelGenerator for more details.

## Model Performance

### Model Evaluation Metrics  

**RMSD**: Root Mean Square Deviation between prediction and ground truth.  

- **Protein/Antibody**: We calculate the RMSD over Cα atoms.  
- **DNA/RNA**: We calculate the RMSD over C1' atoms.  
- **Ligand**: We use the coordinates of all atoms.  

For protein-ligand, RNA-ligand, and DNA-ligand complexes, this means each protein or nucleic acid residue 
contributes a single atom (Cα or C1') while the ligand contributes all of its atoms. The pooled RMSD 
therefore depends on the number of atoms in the ligand, which can bias the metric. We plan to address this problem in the future.  
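
The atom-count effect described above can be illustrated with a small, self-contained sketch. It assumes the predicted and ground-truth coordinates are already superimposed (the alignment procedure is not described here), and the function names `rmsd` and `pooled_rmsd` as well as the toy error values are hypothetical, chosen only for illustration.

```python
import math


def rmsd(pred, truth):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the two structures are already superimposed.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, t) ** 2 for p, t in zip(pred, truth))
    return math.sqrt(squared / len(pred))


def pooled_rmsd(n_protein, n_ligand, protein_err=1.0, ligand_err=3.0):
    """Toy complex: every protein atom is off by protein_err (in angstroms),
    every ligand atom by ligand_err, and all atoms are pooled into one RMSD."""
    truth = [(0.0, 0.0, 0.0)] * (n_protein + n_ligand)
    pred = [(protein_err, 0.0, 0.0)] * n_protein + [(ligand_err, 0.0, 0.0)] * n_ligand
    return rmsd(pred, truth)


# Same per-atom errors, different ligand sizes.
small_ligand = pooled_rmsd(n_protein=100, n_ligand=10)
large_ligand = pooled_rmsd(n_protein=100, n_ligand=50)
print(f"10-atom ligand: {small_ligand:.3f}")
print(f"50-atom ligand: {large_ligand:.3f}")
```

With identical per-atom errors, the larger ligand pulls the pooled value closer to the ligand error, so two complexes of equal prediction quality can receive different scores.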

**DockQ**:  
We adapted the script from [the DockQ repository](https://github.com/bjornwallner/DockQ) to support missing residues.  
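
For orientation, the sketch below shows how the standard DockQ score combines its three ingredients: the fraction of native contacts (Fnat), the interface RMSD (iRMSD), and the ligand RMSD (LRMSD). It follows the published DockQ definition with scaling constants of 1.5 Å and 8.5 Å. It is not the modified script mentioned above, it does not show the missing-residue handling, and the function name `dockq_score` is hypothetical.

```python
def dockq_score(fnat, irmsd, lrmsd, d_irmsd=1.5, d_lrmsd=8.5):
    """Combine Fnat, iRMSD, and LRMSD into a single score in [0, 1].

    Each RMSD is mapped to (0, 1] by 1 / (1 + (rmsd / d)^2), so smaller
    deviations give values closer to 1. The three terms are then averaged.
    """
    if not 0.0 <= fnat <= 1.0:
        raise ValueError("fnat must be a fraction between 0 and 1")
    if irmsd < 0 or lrmsd < 0:
        raise ValueError("RMSD values must be non-negative")

    def scaled(rmsd_value, d):
        return 1.0 / (1.0 + (rmsd_value / d) ** 2)

    return (fnat + scaled(irmsd, d_irmsd) + scaled(lrmsd, d_lrmsd)) / 3.0


perfect = dockq_score(fnat=1.0, irmsd=0.0, lrmsd=0.0)
halfway = dockq_score(fnat=0.5, irmsd=1.5, lrmsd=8.5)
print(perfect, halfway)
```

Because each RMSD term equals 0.5 exactly when the RMSD matches its scaling constant, the second call averages three values of 0.5.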

**Note**: For all the metrics above, when the ground-truth structure has missing residues or atoms, 
we still give the model the complete input. 
Because the ground truth lacks coordinates for these components, 
such entries are difficult to evaluate. 
However, we know exactly which residues or atoms are missing, so we can match predicted and ground-truth atoms 
directly, without any approximate alignment. 
We have found that approximate alignments in metric calculations can produce 
inaccurate metric values and hinder head-to-head comparisons between methods.  
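
A minimal sketch of this idea, under stated assumptions: the prediction covers every residue, the ground truth marks unresolved residues with `None`, and both lists share the same residue indexing, so atoms are paired by index and no sequence or structural alignment is needed to decide which atoms correspond. The function name `rmsd_on_resolved` and the data layout are hypothetical, and the structures are assumed to be superimposed already.

```python
import math


def rmsd_on_resolved(pred, truth):
    """RMSD over residues that have ground-truth coordinates.

    pred:  list of (x, y, z), one entry per residue of the full input.
    truth: list of the same length; None marks a residue that is
           missing from the experimental structure.
    """
    if len(pred) != len(truth):
        raise ValueError("prediction and ground truth must share one residue index")
    # Pair atoms by index and skip residues known to be missing.
    pairs = [(p, t) for p, t in zip(pred, truth) if t is not None]
    if not pairs:
        raise ValueError("no resolved residues to compare")
    squared = sum(math.dist(p, t) ** 2 for p, t in pairs)
    return math.sqrt(squared / len(pairs))


predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), None, (3.0, 0.0, 0.0)]

value = rmsd_on_resolved(predicted, ground_truth)
print(f"RMSD over resolved residues: {value:.3f}")
```

The third residue is predicted but excluded from the metric, and the remaining three are compared position by position.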

### Performance

The antibody/nanobody-antigen evaluation data was curated from PDB entries released after September 30, 2021. 
We also assessed the quality of the selected structures to ensure the interfaces are valid. 
For instance, the binding sites are typically located in the complementarity-determining regions (CDRs) for 
antibody-antigen or nanobody-antigen complexes. 
Additionally, we examined the distance map between the heavy chain and light chain to confirm that 
the selected chain pair constitutes a valid antibody.


<div style="display: flex;">  
    <div style="margin-right: 10px;">  
        <img src="assets/hln.png" alt="hln" width="540"/>  
    </div>  
    <div>  
        <img src="assets/ana.png" alt="ana" width="540"/>  
    </div>  
</div> 


## License and Disclaimer  

Unless otherwise stated, this project is licensed under the GenBio AI Community License Agreement. This project includes third-party components ([MMseqs2](https://github.com/soedinglab/MMseqs2), [Protenix](https://github.com/bytedance/Protenix)). Use of this project does not override or waive the original license terms of these third-party components: you remain bound by their respective licenses, and you can download the components from their original sites.

## Citation

Please cite AIDO.StructurePrediction using the following BibTeX entry:
```
@inproceedings{aido_structureprediction,
	title = {AIDO StructurePrediction},
	url = {https://huggingface.co/genbio-ai/AIDO.StructurePrediction},
	author = {Kun Leo and Jiayou Zhang and Georgy Andreev and Hugo Ly and Le Song and Eric P Xing},
	year = {2025},
}
```