Alex Sychov committed on
Commit ea9db3b · unverified · 1 Parent(s): cd615e1

Update README.md

Files changed (1)
  1. README.md +47 -3
README.md CHANGED
@@ -9,7 +9,7 @@ license: mit
9
  ---
10
 
11
  # 💊 Drug Binding Affinity Prediction with GNNs + CNN + Cross-Attention & LLM Interpretation
12
- <img width="1284" height="582" alt="image" src="https://github.com/user-attachments/assets/cb814000-5ae2-4967-8b9f-f25869dd5d53" />
13
 
14
  This project implements a deep learning model that predicts the **Binding Affinity ($pK_d$)** between drug candidates (ligands) and target proteins. Its distinguishing feature is an **Explainable AI (XAI)** module, powered by **Cross-Attention weights** and **LLM interpretation**, that addresses the "black box" problem in drug discovery: researchers can visualize the active site of the ligand and see which atoms play a vital role in the binding process.
15
 
@@ -17,7 +17,7 @@ This project is the implementation of the Deep Learning model to predict the **B
17
  ## Architecture: The "Hybrid" Approach
18
  The model uses a dual-encoder architecture with a Cross-Attention mechanism, mimicking the physical binding process:
19
 
20
- <img width="3756" height="1797" alt="binding_affinity drawio" src="https://github.com/user-attachments/assets/1e510205-c9c2-468d-8372-2a8a0b45aae7" />
21
 
22
  1. **Ligand Encoder (Graph):**
23
  * **GAT (Graph Attention Network):** Treats atoms as nodes and bonds as edges. Uses 4 attention heads to capture complex chemical substructures.
@@ -31,7 +31,7 @@ We compared multiple architectures on the **PDBbind Refined** dataset. The Hybri
31
  |--------------|---------------|---------------|---------------|
32
  | GCN + Transformer for proteins | 1.5190 | 1.1957 | 0.6285 |
33
  | GAT + Transformer for proteins | 1.5117 | 1.2074 | 0.6310 |
34
- | GAT + 1 CNN for proteins + Cross-Attention | 1.3867 | 1.0947 | 0.7013 |
35
 
36
 
37
  ## Explainability (XAI)
@@ -41,5 +41,49 @@ The key moment is that the model does not give only a number, but an answer why
41
  3. Checks the drug-likeness of the ligand against Lipinski's Rule of 5.
42
  4. Uses **Google Gemini API** to generate a chemical explanation of *why* these atoms are critical (e.g., hydrogen bonds, hydrophobic interactions).
43
 
44
  ---
45
  *Created by Alex Sychov*
 
9
  ---
10
 
11
  # 💊 Drug Binding Affinity Prediction with GNNs + CNN + Cross-Attention & LLM Interpretation
12
+
13
 
14
  This project implements a deep learning model that predicts the **Binding Affinity ($pK_d$)** between drug candidates (ligands) and target proteins. Its distinguishing feature is an **Explainable AI (XAI)** module, powered by **Cross-Attention weights** and **LLM interpretation**, that addresses the "black box" problem in drug discovery: researchers can visualize the active site of the ligand and see which atoms play a vital role in the binding process.
15
 
 
17
  ## Architecture: The "Hybrid" Approach
18
  The model uses a dual-encoder architecture with a Cross-Attention mechanism, mimicking the physical binding process:
19
 
20
+ <img width="100%" alt="binding_affinity drawio" src="https://github.com/user-attachments/assets/1e510205-c9c2-468d-8372-2a8a0b45aae7" />
21
 
22
  1. **Ligand Encoder (Graph):**
23
  * **GAT (Graph Attention Network):** Treats atoms as nodes and bonds as edges. Uses 4 attention heads to capture complex chemical substructures.
 
31
  |--------------|---------------|---------------|---------------|
32
  | GCN + Transformer for proteins | 1.5190 | 1.1957 | 0.6285 |
33
  | GAT + Transformer for proteins | 1.5117 | 1.2074 | 0.6310 |
34
+ | **GAT + Deep CNN + Cross-Attention** | **1.3867** | **1.0947** | **0.7013** |
35
 
36
 
37
  ## Explainability (XAI)
 
41
  3. Checks the drug-likeness of the ligand against Lipinski's Rule of 5.
42
  4. Uses **Google Gemini API** to generate a chemical explanation of *why* these atoms are critical (e.g., hydrogen bonds, hydrophobic interactions).
43
 
44
+ ## 🧪 Case Study: HIV-1 Protease Inhibitor (PDB: 6e9a)
45
+ To validate the model on high-complexity ligands, we tested it on a potent HIV-1 protease inhibitor (Darunavir analog, PDB: 6e9a).
46
+ * **Ligand:** Sulfonamide-based inhibitor ($C_{28}H_{37}N_3O_8S$, MW 575.7 Da). **SMILES:** `COc1ccc(S(=O)(=O)N(CC(C)C)C[C@@H](O)[C@H](Cc2ccccc2)NC(=O)O[C@@H]2C[C@@H]3NC(=O)O[C@@H]3C2)cc1`
47
+ * **Target:** HIV-1 Protease Chain A.
48
+
49
+ * **Molecule:** Sulfonamide-based inhibitor ($C_{28}H_{37}N_3O_8S$, MW 575.7 Da).
50
+ * **Predicted Affinity ($pK_d$):** `7.22` (Classified as **Strong Binder**)
51
+ * **Real Affinity:** High potency confirmed by PDB data.
52
+
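
To put the predicted $pK_d$ in more familiar units, it can be converted to a dissociation constant through the standard definition $pK_d = -\log_{10}(K_d)$, with $K_d$ in mol/L. The snippet below is a minimal sketch of that arithmetic only. The function names are illustrative and are not taken from this repository.

```python
import math


def pkd_to_kd_molar(pkd: float) -> float:
    """Convert pKd to a dissociation constant in mol/L (pKd = -log10(Kd))."""
    return 10.0 ** (-pkd)


def kd_molar_to_pkd(kd_molar: float) -> float:
    """Inverse conversion; Kd must be strictly positive."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd_molar)


# Predicted value reported in the case study above.
predicted_pkd = 7.22
kd_nanomolar = pkd_to_kd_molar(predicted_pkd) * 1e9
print(f"pKd {predicted_pkd} corresponds to Kd of about {kd_nanomolar:.0f} nM")
```

Under this definition, a $pK_d$ of 7.22 corresponds to a $K_d$ of roughly 60 nM.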
53
+ The Cross-Attention mechanism identified the key pharmacophore features without prior knowledge:
54
+ 1. **Polar Anchors (Oxygen #16, #34):** The model assigned high attention scores to the oxygen atoms. Chemically, these act as hydrogen bond acceptors, critical for anchoring the drug to the protein's backbone (Asp29/Asp30 residues).
55
+ 2. **Hydrophobic Core (Carbon #0):** The model highlighted the aromatic carbon in the terminal ring, which is essential for hydrophobic packing in the S2' pocket of the protease.
56
+ <img width="800" alt="Molecule Visualization" src="https://github.com/user-attachments/assets/245be900-41ff-44e9-b31a-69be6d42be8e" />
57
+
58
+ ### 3. Top Critical Atoms
59
+ Below are the atoms whose attention weights contributed most to the prediction:
60
+
61
+ | Rank | Atom Index | Type | Attention Score | Interpretation |
62
+ | :--- | :--- | :--- | :--- | :--- |
63
+ | 1 | #57 | H | 1.000 | Hydrogen Bond Donor |
64
+ | 2 | #16 | O | 0.729 | **Sulfonamide Oxygen (Key Anchor)** |
65
+ | 3 | #0 | C | 0.676 | **Aromatic Ring (Hydrophobic)** |
66
+ | 4 | #34 | O | 0.581 | Ether Oxygen (H-bond Acceptor) |
67
+ | 5 | #22, #23 | C | ~0.600 | Hydrophobic Scaffold |
68
+
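
One way a ranking like the table above could be derived from cross-attention weights is sketched below. This is an illustration under stated assumptions, not the repository's actual code: it assumes protein positions act as queries and ligand atoms as keys, that per-atom importance is the mean attention an atom receives, and that scores are divided by the maximum so the top atom scores 1.000. The embeddings are toy vectors and every function name is hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def atom_importance(protein_queries, atom_keys):
    """Mean attention each ligand atom receives, rescaled so the maximum is 1.0.

    Assumption: each protein position attends over all ligand atoms using
    scaled dot-product attention.
    """
    scale = math.sqrt(len(atom_keys[0]))
    totals = [0.0] * len(atom_keys)
    for query in protein_queries:
        logits = [sum(q * k for q, k in zip(query, key)) / scale for key in atom_keys]
        for index, weight in enumerate(softmax(logits)):
            totals[index] += weight
    means = [t / len(protein_queries) for t in totals]
    top = max(means)
    return [m / top for m in means]


def top_k_atoms(scores, k):
    """Return (atom_index, score) pairs sorted by descending score."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy 2-dimensional embeddings (hypothetical, not model outputs).
toy_atom_keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
toy_protein_queries = [[2.0, 0.0], [1.0, 0.0]]

scores = atom_importance(toy_protein_queries, toy_atom_keys)
for atom_index, score in top_k_atoms(scores, 3):
    print(f"atom #{atom_index}: {score:.3f}")
```

Dividing by the maximum keeps the ordering intact while making scores comparable across molecules of different sizes, which is one plausible reason for a top score of exactly 1.000.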
69
+ ### 4. Drug-Likeness & Gemini Report
70
+ The system automatically generates a report to assist chemists:
71
+
72
+ #### 💊 Lipinski's Rule of 5 Analysis
73
+ * **Status:** Poor (2 violations) 🔴
74
+ * **Mass:** 575.68 Da (Violation: > 500)
75
+ * **H-Acceptors:** 11 (Violation: > 10)
76
+ * *Note: HIV protease inhibitors are often large molecules that break these rules but remain effective.*
77
+
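
The Rule of 5 check itself reduces to four threshold comparisons. The sketch below uses the standard Lipinski cutoffs and the mass and H-acceptor values reported above. The H-donor count and log P value are placeholders chosen only to be consistent with the two reported violations; they are not values from this project. The class and function names are also hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float   # Da
    h_donors: int
    h_acceptors: int
    log_p: float


def lipinski_violations(d: Descriptors) -> list:
    """Return a label for each Rule of 5 criterion the molecule violates."""
    checks = [
        ("Mass > 500 Da", d.mol_weight > 500),
        ("H-Donors > 5", d.h_donors > 5),
        ("H-Acceptors > 10", d.h_acceptors > 10),
        ("LogP > 5", d.log_p > 5),
    ]
    return [label for label, violated in checks if violated]


# Mass and H-acceptors come from the report above.
# h_donors and log_p are placeholder assumptions, not reported values.
case_study = Descriptors(mol_weight=575.68, h_donors=3, h_acceptors=11, log_p=3.0)

violations = lipinski_violations(case_study)
print(f"{len(violations)} violation(s): {violations}")
```

In practice the descriptors would be computed from the SMILES string by a cheminformatics toolkit rather than entered by hand.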
78
+ #### 🤖 Google Gemini Analysis
79
+ >Affinity Analysis: The predicted binding affinity (pKd = 7.22) suggests moderate to strong binding for this ligand to the target protein. A pKd > 7 generally indicates a promising starting point for drug discovery, implying significant interaction.
80
+ >
81
+ >Structural Basis: The highlighted atoms, particularly Oxygen (idx 16) and Nitrogen (implicit in the sulfonamide and carbamate groups), likely participate in hydrogen bonding as donors or acceptors. Aromatic carbons (e.g., idx 0 for the methoxy-substituted phenyl ring) are key for pi-pi stacking interactions within the protein's binding pocket. The specific arrangement of these functional groups and their positions relative to the protein are critical for recognition.
82
+ >
83
+ >Drug-Likeness: With 2 Lipinski violations, this molecule exhibits poor drug-likeness, particularly concerning oral bioavailability. It may face challenges with absorption and membrane permeability.
84
+ >
85
+ >Conclusion: While the binding affinity is encouraging, the poor drug-likeness warrants caution. Further structural optimization to improve Lipinski compliance would be essential before proceeding with this molecule as a drug candidate.
86
+
87
+
88
  ---
89
  *Created by Alex Sychov*