Alex Sychov committed on
Update README.md
README.md
CHANGED
|
@@ -9,7 +9,7 @@ license: mit
|
|
| 9 |
---
|
| 10 |
|
| 11 |
# 💊 Drug Binding Affinity Prediction with GNNs + CNN + Cross-Attention & LLM Interpretation
|
| 12 |
-
|
| 13 |
|
| 14 |
This project implements a deep learning model that predicts the **binding affinity ($pK_d$)** between drug candidates (ligands) and target proteins. Its distinguishing feature is an **Explainable AI (XAI)** module, powered by **Cross-Attention weights** and **LLM interpretation**, that addresses the "black box" problem in drug discovery: researchers can visualize the key regions of the ligand and see which atoms play a vital role in the binding process.
|
| 15 |
|
|
@@ -17,7 +17,7 @@ This project is the implementation of the Deep Learning model to predict the **B
|
|
| 17 |
## Architecture: The "Hybrid" Approach
|
| 18 |
The model uses a dual-encoder architecture with a Cross-Attention mechanism, mimicking the physical binding process:
|
| 19 |
|
| 20 |
-
<img width="
|
| 21 |
|
| 22 |
1. **Ligand Encoder (Graph):**
|
| 23 |
* **GAT (Graph Attention Network):** Treats atoms as nodes and bonds as edges. Uses 4 attention heads to capture complex chemical substructures.
|
|
@@ -31,7 +31,7 @@ We compared multiple architectures on the **PDBbind Refined** dataset. The Hybri
|
|
| 31 |
|--------------|---------------|---------------|---------------|
|
| 32 |
| GCN + Transformer for proteins | 1.5190 | 1.1957 | 0.6285 |
|
| 33 |
| GAT + Transformer for proteins | 1.5117 | 1.2074 | 0.6310 |
|
| 34 |
-
| GAT +
|
| 35 |
|
| 36 |
|
| 37 |
## Explainability (XAI)
|
|
@@ -41,5 +41,49 @@ The key moment is that the model does not give only a number, but an answer why
|
|
| 41 |
3. Checks the drug-likeness of the ligand against Lipinski's Rule of 5.
|
| 42 |
4. Uses **Google Gemini API** to generate a chemical explanation of *why* these atoms are critical (e.g., hydrogen bonds, hydrophobic interactions).
|
| 43 |
|
| 44 |
---
|
| 45 |
*Created by Alex Sychov*
|
|
|
|
| 9 |
---
|
| 10 |
|
| 11 |
# 💊 Drug Binding Affinity Prediction with GNNs + CNN + Cross-Attention & LLM Interpretation
|
| 12 |
+
|
| 13 |
|
| 14 |
This project implements a deep learning model that predicts the **binding affinity ($pK_d$)** between drug candidates (ligands) and target proteins. Its distinguishing feature is an **Explainable AI (XAI)** module, powered by **Cross-Attention weights** and **LLM interpretation**, that addresses the "black box" problem in drug discovery: researchers can visualize the key regions of the ligand and see which atoms play a vital role in the binding process.
|
| 15 |
|
|
|
|
| 17 |
## Architecture: The "Hybrid" Approach
|
| 18 |
The model uses a dual-encoder architecture with a Cross-Attention mechanism, mimicking the physical binding process:
|
| 19 |
|
| 20 |
+
<img width="100%" alt="binding_affinity drawio" src="https://github.com/user-attachments/assets/1e510205-c9c2-468d-8372-2a8a0b45aae7" />
|
| 21 |
|
| 22 |
1. **Ligand Encoder (Graph):**
|
| 23 |
* **GAT (Graph Attention Network):** Treats atoms as nodes and bonds as edges. Uses 4 attention heads to capture complex chemical substructures.
|
|
|
|
| 31 |
|--------------|---------------|---------------|---------------|
|
| 32 |
| GCN + Transformer for proteins | 1.5190 | 1.1957 | 0.6285 |
|
| 33 |
| GAT + Transformer for proteins | 1.5117 | 1.2074 | 0.6310 |
|
| 34 |
+
| **GAT + Deep CNN + Cross-Attention** | **1.3867** | **1.0947** | **0.7013** |
|
| 35 |
|
| 36 |
|
| 37 |
## Explainability (XAI)
|
|
|
|
| 41 |
3. Checks the drug-likeness of the ligand against Lipinski's Rule of 5.
|
| 42 |
4. Uses **Google Gemini API** to generate a chemical explanation of *why* these atoms are critical (e.g., hydrogen bonds, hydrophobic interactions).
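
The last step can be pictured as assembling the model outputs into a text prompt for the LLM. The sketch below is only an illustration: the function name `build_explanation_prompt` and the prompt wording are hypothetical, not taken from this repository, and no API call is made.

```python
def build_explanation_prompt(pkd, top_atoms, lipinski_violations):
    """Assemble a plain-text prompt from the model outputs (hypothetical helper)."""
    # One line per highlighted atom: (atom index, element symbol, attention score)
    atom_lines = "\n".join(
        f"- atom #{idx} ({symbol}), attention {score:.3f}"
        for idx, symbol, score in top_atoms
    )
    violations = ", ".join(lipinski_violations) or "none"
    return (
        f"Predicted binding affinity (pKd): {pkd:.2f}\n"
        f"Highest-attention ligand atoms:\n{atom_lines}\n"
        f"Lipinski violations: {violations}\n"
        "Explain chemically why these atoms may be critical for binding "
        "(e.g., hydrogen bonds, hydrophobic interactions)."
    )


# Illustrative inputs only
prompt = build_explanation_prompt(
    pkd=7.22,
    top_atoms=[(16, "O", 0.729), (0, "C", 0.676)],
    lipinski_violations=["Mass > 500 Da", "H-Acceptors > 10"],
)
print(prompt)
```

The resulting string would then be sent to whichever LLM client the project uses.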
|
| 43 |
|
| 44 |
+
## 🧪 Case Study: HIV-1 Protease Inhibitor (PDB: 6e9a)
|
| 45 |
+
To validate the model on high-complexity ligands, we tested it on a potent HIV-1 protease inhibitor (Darunavir analog, PDB: 6e9a).
|
| 46 |
+
* **Ligand:** Sulfonamide-based inhibitor ($C_{28}H_{37}N_3O_8S$, MW 575.7 Da). **SMILES:** `COc1ccc(S(=O)(=O)N(CC(C)C)C[C@@H](O)[C@H](Cc2ccccc2)NC(=O)O[C@@H]2C[C@@H]3NC(=O)O[C@@H]3C2)cc1`
|
| 47 |
+
* **Target:** HIV-1 Protease Chain A.
|
| 48 |
+
|
| 49 |
+
* **Molecule:** Sulfonamide-based inhibitor ($C_{28}H_{37}N_3O_8S$, MW 575.7 Da).
|
| 50 |
+
* **Predicted Affinity ($pK_d$):** `7.22` (Classified as **Strong Binder**)
|
| 51 |
+
* **Experimental Affinity:** High potency confirmed by PDB data.
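
To put the predicted value in more familiar units, recall that $pK_d = -\log_{10} K_d$ with $K_d$ in mol/L. The helper names below are illustrative and not part of the project code; this is a minimal sketch of the conversion only.

```python
import math


def pkd_to_kd_molar(pkd: float) -> float:
    """Convert pKd to a dissociation constant in mol/L: Kd = 10**(-pKd)."""
    return 10.0 ** (-pkd)


def kd_to_pkd(kd_molar: float) -> float:
    """Inverse conversion: pKd = -log10(Kd), with Kd in mol/L."""
    return -math.log10(kd_molar)


kd = pkd_to_kd_molar(7.22)   # dissociation constant in mol/L
kd_nm = kd * 1e9             # same value expressed in nanomolar
print(f"Kd is about {kd_nm:.1f} nM")
```

A $pK_d$ of 7.22 therefore corresponds to a $K_d$ of roughly 60 nM.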
|
| 52 |
+
|
| 53 |
+
The Cross-Attention mechanism identified the key pharmacophore features without prior knowledge:
|
| 54 |
+
1. **Polar Anchors (Oxygen #16, #34):** The model assigned high attention scores to the oxygen atoms. Chemically, these act as hydrogen bond acceptors, critical for anchoring the drug to the protein's backbone (Asp29/Asp30 residues).
|
| 55 |
+
2. **Hydrophobic Core (Carbon #0):** The model highlighted the aromatic carbon in the terminal ring, which is essential for hydrophobic packing in the S2' pocket of the protease.
|
| 56 |
+
<img width="800" alt="Molecule Visualization" src="https://github.com/user-attachments/assets/245be900-41ff-44e9-b31a-69be6d42be8e" />
|
| 57 |
+
|
| 58 |
+
### 3. Top Critical Atoms
|
| 59 |
+
The atoms with the highest attention weights, which contributed most to the prediction, are listed below:
|
| 60 |
+
|
| 61 |
+
| Rank | Atom Index | Type | Attention Score | Interpretation |
|
| 62 |
+
| :--- | :--- | :--- | :--- | :--- |
|
| 63 |
+
| 1 | #57 | H | 1.000 | Hydrogen Bond Donor |
|
| 64 |
+
| 2 | #16 | O | 0.729 | **Sulfonamide Oxygen (Key Anchor)** |
|
| 65 |
+
| 3 | #0 | C | 0.676 | **Aromatic Ring (Hydrophobic)** |
|
| 66 |
+
| 4 | #34 | O | 0.581 | Ether Oxygen (H-bond Acceptor) |
|
| 67 |
+
| 5 | #22, #23 | C | ~0.600 | Hydrophobic Scaffold |
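
A ranking like the one above can be produced by normalizing per-atom attention and sorting. The sketch below assumes the scores are scaled by their maximum (which the top score of 1.000 suggests, but the source does not confirm); the function name and the toy input values are hypothetical and are not the model's actual output.

```python
def top_atoms(raw_scores, symbols, k=5):
    """Max-normalize per-atom attention and return the k highest-scoring atoms."""
    peak = max(raw_scores)
    if peak <= 0:
        raise ValueError("attention scores must contain a positive value")
    normalized = [score / peak for score in raw_scores]
    # Atom indices sorted from highest to lowest normalized score
    ranked = sorted(range(len(normalized)), key=lambda i: normalized[i], reverse=True)
    return [(i, symbols[i], round(normalized[i], 3)) for i in ranked[:k]]


# Toy attention values for a 5-atom fragment (illustrative only)
raw = [0.12, 0.03, 0.20, 0.05, 0.08]
symbols = ["C", "C", "O", "N", "H"]
ranking = top_atoms(raw, symbols, k=3)
for rank, (idx, symbol, score) in enumerate(ranking, start=1):
    print(rank, f"#{idx}", symbol, score)
```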
|
| 68 |
+
|
| 69 |
+
### 4. Drug-Likeness & Gemini Report
|
| 70 |
+
The system automatically generates a report to assist chemists:
|
| 71 |
+
|
| 72 |
+
#### 💊 Lipinski's Rule of 5 Analysis
|
| 73 |
+
* **Status:** Poor (2 violations) 🔴
|
| 74 |
+
* **Mass:** 575.68 Da (Violation: > 500)
|
| 75 |
+
* **H-Acceptors:** 11 (Violation: > 10)
|
| 76 |
+
* *Note: HIV protease inhibitors are often large molecules that break these rules but remain effective.*
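
The check itself is simple once the descriptors are known. The sketch below applies the standard Rule of 5 thresholds (mass > 500 Da, logP > 5, more than 5 H-bond donors, more than 10 H-bond acceptors) to precomputed descriptors. The `Descriptors` class and function name are illustrative, and the `logp` and `h_donors` values are placeholders, since the report above does not state them.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float   # Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen bond donors
    h_acceptors: int    # hydrogen bond acceptors


def lipinski_violations(d: Descriptors) -> list:
    """Return the names of the Rule of 5 criteria that the molecule violates."""
    checks = [
        ("Mass > 500 Da", d.mol_weight > 500),
        ("logP > 5", d.logp > 5),
        ("H-Donors > 5", d.h_donors > 5),
        ("H-Acceptors > 10", d.h_acceptors > 10),
    ]
    return [name for name, violated in checks if violated]


# Mass and H-acceptors come from the report above;
# logp and h_donors are placeholder values chosen to pass the check.
ligand = Descriptors(mol_weight=575.68, logp=3.0, h_donors=3, h_acceptors=11)
violations = lipinski_violations(ligand)
print(len(violations), "violations:", violations)
```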
|
| 77 |
+
|
| 78 |
+
#### 🤖 Google Gemini Analysis
|
| 79 |
+
>Affinity Analysis: The predicted binding affinity (pKd = 7.22) suggests moderate to strong binding for this ligand to the target protein. A pKd > 7 generally indicates a promising starting point for drug discovery, implying significant interaction.
|
| 80 |
+
>
|
| 81 |
+
>Structural Basis: The highlighted atoms, particularly Oxygen (idx 16) and Nitrogen (implicit in the sulfonamide and carbamate groups), likely participate in hydrogen bonding as donors or acceptors. Aromatic carbons (e.g., idx 0 for the methoxy-substituted phenyl ring) are key for pi-pi stacking interactions within the protein's binding pocket. The specific arrangement of these functional groups and their positions relative to the protein are critical for recognition.
|
| 82 |
+
>
|
| 83 |
+
>Drug-Likeness: With 2 Lipinski violations, this molecule exhibits poor drug-likeness, particularly concerning oral bioavailability. It may face challenges with absorption and membrane permeability.
|
| 84 |
+
>
|
| 85 |
+
>Conclusion: While the binding affinity is encouraging, the poor drug-likeness warrants caution. Further structural optimization to improve Lipinski compliance would be essential before proceeding with this molecule as a drug candidate.
|
| 86 |
+
|
| 87 |
+
|
| 88 |
---
|
| 89 |
*Created by Alex Sychov*
|