---
license: other
---
# AIDO.StructurePrediction
[ModelGenerator License](https://github.com/genbio-ai/ModelGenerator/blob/main/LICENSE)
[Protenix License](https://github.com/bytedance/Protenix/blob/main/LICENSE)
<div style="display: flex; gap: 0px; overflow-x: auto; align-items: center;">
<div style="display: flex; flex-direction: column; align-items: center;">
<span style="font-weight: bold;">Antibody</span>
<img src="https://cdn-uploads.huggingface.co/production/uploads/67f69b92e9fa2b2fdb84053d/HRkmsQGorXEpxQtauXBNe.gif" style="max-width: 400px; max-height: 400px; height: auto; width: 100%;">
</div>
<div style="display: flex; flex-direction: column; align-items: center;">
<span style="font-weight: bold;">Nanobody</span>
<img src="https://cdn-uploads.huggingface.co/production/uploads/67f69b92e9fa2b2fdb84053d/TiuDROAYDIEp4ac-4OzXB.gif" style="max-width: 400px; max-height: 400px; height: auto; width: 100%;">
</div>
<div style="display: flex; flex-direction: column; align-items: center;">
<span style="font-weight: bold;">RNA</span>
<img src="https://cdn-uploads.huggingface.co/production/uploads/67f69b92e9fa2b2fdb84053d/8gD1JOiDywkhVYLlsmngM.gif" style="max-width: 400px; max-height: 400px; height: auto; width: 100%;">
</div>
</div>
<div style="display: flex; gap: 0px; overflow-x: auto; align-items: center;">
<div style="display: flex; flex-direction: column; align-items: center;">
<span style="font-weight: bold;">Antibody-Antigen</span>
<img src="https://cdn-uploads.huggingface.co/production/uploads/67f69b92e9fa2b2fdb84053d/1n0LpyWf004Jo8KXoaYEG.gif" style="max-width: 400px; max-height: 400px; height: auto; width: 100%;">
</div>
<div style="display: flex; flex-direction: column; align-items: center;">
<span style="font-weight: bold;">Nanobody-Antigen</span>
<img src="https://cdn-uploads.huggingface.co/production/uploads/67f69b92e9fa2b2fdb84053d/rql1lOjeI-OOQketKYIvL.gif" style="max-width: 400px; max-height: 400px; height: auto; width: 100%;">
</div>
<div style="display: flex; flex-direction: column; align-items: center;">
<span style="font-weight: bold;">Protein-Ligand</span>
<img src="https://cdn-uploads.huggingface.co/production/uploads/67f69b92e9fa2b2fdb84053d/gxs-265GnvudNddPqrb_S.gif" style="max-width: 400px; max-height: 400px; height: auto; width: 100%;">
</div>
</div>
<div style="display: flex; gap: 0px; overflow-x: auto; align-items: center;">
<div style="display: flex; flex-direction: column; align-items: center;">
<span style="font-weight: bold;">Ground Truth (orange) vs Our Prediction (green)</span>
<img src="assets/figure(gt-yellow vs our-green).png" height="600" />
</div>
<div style="display: flex; flex-direction: column; align-items: center;">
<span style="font-weight: bold;">Ground Truth (orange) vs AlphaFold3 Prediction (blue)</span>
<img src="assets/figure(gt-yellow vs af3-blue).png" height="600" />
</div>
</div>
## Model Description
AIDO.StructurePrediction is an AlphaFold3-like full-atom structure prediction model
designed to predict the structures and interactions of biological molecules,
including proteins, DNA, RNA, ligands, and antibodies. It uses both structural and sequence modalities
to provide high-fidelity predictions for a range of biological tasks.
The model achieves state-of-the-art performance on immunology-related structure prediction tasks,
including antibody, nanobody, antibody-antigen, and nanobody-antigen structure prediction.
## Model Details
### Key Features
- **Multi-Modal Learning**: Combines 3D structural and sequence data (nucleotides and amino acids) to enhance model accuracy and applicability.
- **High-Quality Data**: Trained on carefully curated structure data.
- **Data Augmentation**: Implements novel data augmentation and distillation techniques to diversify training datasets, improving robustness and generalization.
- **Integration of Multiple Sequence Alignments (MSA)**: Utilizes alignment data from diverse biological databases to improve predictive capabilities.
- **Training Strategies**: Incorporates advanced training methodologies to refine model performance and efficiency.
### Model Architecture
- **Type**: Pairformer + Diffusion model architecture.
- **Key Components**:
  - **Pairformer**: Designed to learn complex relationships from both single sequences and multiple sequence alignments.
  - **Diffusion Module**: Generates multiple conformations of the structure.
- **Hyperparameters**:
  - Some key parameters:

    | Model Arch Component    | Value |
    |-------------------------|:-----:|
    | Pairformer Blocks       | 48    |
    | MSA Module Blocks       | 4     |
    | Diffusion Module Blocks | 24    |
    | Diffusion Heads         | 16    |

  - The full set of hyperparameters can be found in [inference_v0.1.yaml](https://github.com/genbio-ai/ModelGenerator/blob/main/experiments/AIDO.StructurePrediction/configs/inference_v0.1.yaml).
## Usage
Please see [experiments/AIDO.StructurePrediction](https://github.com/genbio-ai/ModelGenerator/tree/main/experiments/AIDO.StructurePrediction) in AIDO.ModelGenerator for more details.
## Model Performance
### Model Evaluation Metrics
**RMSD**: Root mean square deviation between the prediction and the ground truth.
- **Protein/Antibody**: RMSD is calculated over Cα atoms.
- **DNA/RNA**: RMSD is calculated over C1′ atoms.
- **Ligand**: RMSD is calculated over all atoms.

For protein-ligand, RNA-ligand, and DNA-ligand complexes, proteins, RNAs, and DNAs contribute only one atom per residue (Cα or C1′)
while ligands contribute every atom, so the metric may be affected
by the number of atoms in the ligand. We plan to address this problem in the future.
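
The atom selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code behind the reported results: the atom dictionary layout, the molecule-type labels, and the function names are hypothetical, `C1'` is assumed to be the PDB-style name of the sugar C1 atom, and the two structures are assumed to be already superimposed and listed in the same order.

```python
import math

# Representative atom per polymer residue; ligands keep every atom.
# The labels and atom names are illustrative assumptions.
REPRESENTATIVE_ATOM = {"protein": "CA", "dna": "C1'", "rna": "C1'"}


def select_atoms(atoms):
    """Filter atoms given as dicts with 'mol_type', 'atom_name', 'xyz' keys."""
    kept = []
    for atom in atoms:
        if atom["mol_type"] == "ligand":
            kept.append(atom)  # ligands: all atoms
        elif atom["atom_name"] == REPRESENTATIVE_ATOM.get(atom["mol_type"]):
            kept.append(atom)  # polymers: one atom per residue
    return kept


def rmsd(pred_xyz, true_xyz):
    """RMSD over paired coordinates, assumed to be already superimposed."""
    if not pred_xyz or len(pred_xyz) != len(true_xyz):
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(math.dist(p, t) ** 2 for p, t in zip(pred_xyz, true_xyz))
    return math.sqrt(squared / len(pred_xyz))


def ligand_fraction(n_polymer_residues, n_ligand_atoms):
    """Share of the RMSD terms that come from ligand atoms under this selection."""
    return n_ligand_atoms / (n_polymer_residues + n_ligand_atoms)
```

For a 100-residue protein, `ligand_fraction(100, 25)` is 0.2 while `ligand_fraction(100, 100)` is 0.5, which shows how the size of the ligand shifts the weight it carries in the combined metric.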
**DockQ**:
We use a script modified from [the public DockQ repository](https://github.com/bjornwallner/DockQ) to support missing residues.
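
For reference, the published DockQ score for a protein-protein interface averages three terms: the fraction of native contacts (Fnat), a scaled interface RMSD, and a scaled ligand RMSD. The sketch below shows only that standard combination. It is not the modified script, it does not show how missing residues are handled, and the function and argument names are illustrative.

```python
def scaled_rms(rms, d):
    """Map an RMSD in angstroms to (0, 1]; 1.0 means a perfect match."""
    return 1.0 / (1.0 + (rms / d) ** 2)


def dockq_score(fnat, irmsd, lrmsd):
    """Standard DockQ combination: mean of Fnat and the two scaled RMSDs.

    The scaling constants 1.5 (interface RMSD) and 8.5 (ligand RMSD)
    are the values from the original DockQ definition.
    """
    if not 0.0 <= fnat <= 1.0:
        raise ValueError("fnat must lie in [0, 1]")
    return (fnat + scaled_rms(irmsd, 1.5) + scaled_rms(lrmsd, 8.5)) / 3.0
```

A perfect prediction (`fnat=1.0`, both RMSDs equal to 0) scores 1.0, and the score falls toward 0 as contacts are lost and the RMSDs grow.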
**Note**: For all the metrics above, when the ground truth has missing residues or atoms, we still input
the complete information into our model.
Because the ground truth structure does not include coordinates for these components,
evaluating this type of data can be challenging.
However, we know exactly which residues or atoms are missing, so we do not need any approximate alignment
when calculating these metrics.
We have found that approximate alignments in metric calculations can sometimes produce
inaccurate metric values and hinder head-to-head comparisons between methods.
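
The idea in the note can be sketched as follows: because the missing positions are known, the predicted coordinates can be paired with the ground truth by index instead of by sequence alignment. This is a minimal sketch under stated assumptions, not the project's evaluation code. The function name and the 0-based index convention are hypothetical.

```python
def pair_observed(pred_xyz, true_xyz, missing_indices):
    """Pair predicted and ground-truth coordinates without any alignment.

    pred_xyz        coordinates for the full input, one entry per residue
    true_xyz        coordinates for the observed residues only, in order
    missing_indices 0-based positions in the full input that are absent
                    from the ground truth (assumed convention)
    """
    missing = set(missing_indices)
    observed_pred = [xyz for i, xyz in enumerate(pred_xyz) if i not in missing]
    if len(observed_pred) != len(true_xyz):
        raise ValueError("missing_indices do not account for the length difference")
    return list(zip(observed_pred, true_xyz))
```

The length check makes a wrong list of missing positions fail loudly instead of silently shifting the pairing, which is the kind of error an approximate alignment can hide.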
### Performance
The antibody-antigen and nanobody-antigen data used for the evaluation were curated from PDB entries released after September 30, 2021.
We also assessed the quality of the selected structures to ensure that the interfaces are valid.
For instance, the binding sites are typically located in the complementarity-determining regions (CDRs) for
antibody-antigen or nanobody-antigen complexes.
Additionally, we examined the distance map between the heavy chain and the light chain to confirm that
the selected chain pair constitutes a valid antibody.
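
A distance-map check of this kind might look like the sketch below. It is an illustration only: the 8 Å cutoff, the minimum contact count, and the function names are hypothetical and are not the criteria used to curate the evaluation set. The inputs are assumed to be per-residue coordinates (for example Cα atoms) for each chain.

```python
import math


def distance_map(chain_a, chain_b):
    """Pairwise distances between two lists of (x, y, z) coordinates."""
    return [[math.dist(a, b) for b in chain_b] for a in chain_a]


def count_contacts(dmap, cutoff=8.0):
    """Number of residue pairs closer than the cutoff (angstroms, assumed value)."""
    return sum(d < cutoff for row in dmap for d in row)


def chains_are_paired(heavy_xyz, light_xyz, cutoff=8.0, min_contacts=10):
    """Heuristic: treat the chains as paired if they share enough contacts.

    Both thresholds are placeholders chosen for illustration.
    """
    return count_contacts(distance_map(heavy_xyz, light_xyz), cutoff) >= min_contacts
```

A heavy and a light chain taken from different antibodies in the same crystal would typically share few or no contacts under such a test, so the pair would be rejected.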
<div style="display: flex;">
<div style="margin-right: 10px;">
<img src="assets/hln.png" alt="hln" width="540"/>
</div>
<div>
<img src="assets/ana.png" alt="ana" width="540"/>
</div>
</div>
## License and Disclaimer
Unless otherwise stated, this project is licensed under the GenBio AI Community License Agreement. This project includes third-party components ([MMseqs2](https://github.com/soedinglab/MMseqs2), [Protenix](https://github.com/bytedance/Protenix)). Use of this project does not override or waive the original license terms of these third-party components: you are still bound by their respective licenses, and you can download the components from their original sites.
## Citation
Please cite AIDO.StructurePrediction using the following BibTeX entry:
```
@inproceedings{aido_structureprediction,
  title = {AIDO StructurePrediction},
  url = {https://huggingface.co/genbio-ai/AIDO.StructurePrediction},
  author = {Kun Leo and Jiayou Zhang and Georgy Andreev and Hugo Ly and Le Song and Eric P Xing},
  year = {2025},
}
```