title: Binding Affinity Prediction
emoji: 🧬
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
license: mit

💊 Drug Binding Affinity Prediction with GNNs + CNN + Cross-Attention & LLM Interpretation


This project implements a deep learning model that predicts the binding affinity ($pK_d$) between drug candidates (ligands) and target proteins. It addresses the "black box" problem in drug discovery with an Explainable AI (XAI) module built on Cross-Attention weights and LLM interpretation, which lets researchers visualize the ligand's active site and see which atoms play a vital role in binding.

Architecture: The "Hybrid" Approach

The model uses a dual-encoder architecture with a Cross-Attention mechanism, mimicking the physical binding process:

[Figure: binding_affinity architecture diagram]
  1. Ligand Encoder (Graph):
    • GAT (Graph Attention Network): Treats atoms as nodes and bonds as edges. Uses 4 attention heads to capture complex chemical substructures.
  2. Protein Encoder (Sequence):
    • CNN: 1D convolutions capture local protein structure. Given the small size of the PDBbind refined set, the CNN outperforms a more complex architecture (Transformer).
  3. Cross-Attention Layer:
    • The core feature of the project. It lets the atoms of the ligand "look at" the protein sequence and attend to specific regions of it, mimicking how the ligand binds. The resulting weights relate individual ligand atoms to individual amino acids of the protein, showing which atom interacts most strongly with which residue.
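The cross-attention step can be illustrated with a minimal sketch. This is not the project's layer: it assumes a single head with no learned projections, plain scaled dot-product scores, and made-up toy embeddings, with ligand atoms acting as queries and protein residues as keys and values. The names `cross_attention`, `atoms`, and `residues` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(atom_queries, residue_keys, residue_values):
    """Single-head scaled dot-product cross-attention (illustrative only).

    atom_queries:   one vector per ligand atom (queries)
    residue_keys:   one vector per protein residue (keys)
    residue_values: one vector per protein residue (values)
    Returns (context, weights); weights[i][j] is how strongly
    atom i attends to residue j, and each row sums to 1.
    """
    d = len(residue_keys[0])
    dim_v = len(residue_values[0])
    context, weights = [], []
    for q in atom_queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in residue_keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of residue value vectors for this atom.
        context.append([sum(wj * v[c] for wj, v in zip(w, residue_values))
                        for c in range(dim_v)])
    return context, weights

# Toy embeddings (made up): 2 ligand atoms, 3 protein residues, dimension 2.
atoms = [[1.0, 0.0], [0.0, 1.0]]
residues = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
context, weights = cross_attention(atoms, residues, residues)

# Index of the most-attended residue for each atom.
best_residue = [max(range(len(w)), key=w.__getitem__) for w in weights]
print(best_residue)  # [0, 1]
```

The `weights` matrix is the part an explainability module would read: one row per ligand atom, one column per residue.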

Results

We compared several architectures on the PDBbind refined set. The hybrid GAT + CNN model with Cross-Attention achieved the best results of the architectures tested, outperforming both Transformer-based variants on every metric (lower RMSE and MAE, higher Pearson correlation).

| Model Architecture | RMSE | MAE | Pearson Correlation |
|---|---|---|---|
| GCN + Transformer for proteins | 1.5190 | 1.1957 | 0.6285 |
| GAT + Transformer for proteins | 1.5117 | 1.2074 | 0.6310 |
| GAT + Deep CNN + Cross-Attention | 1.3867 | 1.0947 | 0.7013 |
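For readers unfamiliar with the three metrics, here is a small sketch of how they are conventionally computed from true and predicted $pK_d$ values. The toy numbers are invented for illustration and are not taken from the project's evaluation.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Invented pKd values, for illustration only.
y_true = [5.0, 6.0, 7.0, 8.0]
y_pred = [5.5, 5.5, 7.5, 7.5]

print(f"RMSE={rmse(y_true, y_pred):.4f}")        # RMSE=0.5000
print(f"MAE={mae(y_true, y_pred):.4f}")          # MAE=0.5000
print(f"Pearson={pearson(y_true, y_pred):.4f}")  # Pearson=0.8944
```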

Explainability (XAI)

The key point is that the model returns not only a number but also an answer to why it predicted that number. The XAI module:

  1. Extracts attention weights from the Cross-Attention layer.
  2. Identifies the top 15 atoms responsible for binding. For each atom it reports the atom index in the ligand SMILES string, the element, and the atom's importance to binding (0 = minimum, 1 = maximum).
  3. Checks the drug-likeness of the ligand against Lipinski's Rule of 5.
  4. Uses the Google Gemini API to generate a chemical explanation of why these atoms are critical (e.g., hydrogen bonds, hydrophobic interactions).
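Step 2 can be sketched as a min-max rescaling followed by a top-k selection. The sketch assumes one raw score per atom is already available; how that score is aggregated from the attention matrix is not specified here, so `raw_scores`, `symbols`, and `top_atoms` are hypothetical names with made-up values.

```python
def top_atoms(symbols, raw_scores, k=15):
    """Rescale per-atom scores to [0, 1] and return the k highest.

    symbols:    element symbol per atom, in SMILES atom order
    raw_scores: one raw importance score per atom (assumed input)
    Returns a list of (atom_index, symbol, scaled_score) tuples.
    """
    lo, hi = min(raw_scores), max(raw_scores)
    span = hi - lo
    # If all scores are equal there is nothing to rank, so use 0.0 everywhere.
    scaled = [(s - lo) / span if span > 0 else 0.0 for s in raw_scores]
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    return [(i, symbols[i], round(scaled[i], 3)) for i in ranked[:k]]

# Made-up example: 5 atoms, keep the top 3.
symbols = ["C", "O", "N", "C", "S"]
raw_scores = [0.10, 0.40, 0.25, 0.05, 0.30]
result = top_atoms(symbols, raw_scores, k=3)
print(result)  # [(1, 'O', 1.0), (4, 'S', 0.714), (2, 'N', 0.571)]
```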

🧪 Case Study: HIV-1 Protease Inhibitor (PDB: 6e9a)

To validate the model on high-complexity ligands, we tested it on a potent HIV-1 protease inhibitor (Darunavir analog, PDB: 6e9a).

  • Ligand: Sulfonamide-based inhibitor ($C_{28}H_{37}N_3O_8S$, MW 575.7 Da). SMILES: COc1ccc(S(=O)(=O)N(CC(C)C)C[C@@H](O)[C@H](Cc2ccccc2)NC(=O)O[C@@H]2C[C@@H]3NC(=O)O[C@@H]3C2)cc1

  • Target: HIV-1 Protease, Chain A.

  • Predicted Affinity ($pK_d$): 7.22 (classified as a strong binder)

  • Experimental Affinity: High potency confirmed by PDB data.
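To put the predicted value in more familiar units: since $pK_d = -\log_{10} K_d$ with $K_d$ in mol/L, the prediction can be converted to a dissociation constant. The helper name below is hypothetical; the conversion itself is just the definition of $pK_d$.

```python
def pkd_to_kd_nanomolar(pkd):
    """Convert pKd to Kd in nanomolar: Kd = 10**(-pKd) mol/L."""
    return 10 ** (-pkd) * 1e9

kd_nm = pkd_to_kd_nanomolar(7.22)
print(f"{kd_nm:.1f} nM")  # 60.3 nM
```

A $pK_d$ of 7.22 therefore corresponds to a $K_d$ of roughly 60 nM.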

The Cross-Attention mechanism identified the key pharmacophore features without prior knowledge:

  1. Polar Anchors (Oxygen #16, #34): The model assigned high attention scores to the oxygen atoms. Chemically, these act as hydrogen bond acceptors, critical for anchoring the drug to the protein's backbone (Asp29/Asp30 residues).
  2. Hydrophobic Core (Carbon #0): The model highlighted the aromatic carbon in the terminal ring, which is essential for hydrophobic packing in the S2' pocket of the protease.

[Figure: Molecule Visualization]

3. Top Critical Atoms

Below are the atoms with the highest attention weights, which contributed most to the prediction:

| Rank | Atom Index | Type | Attention Score | Interpretation |
|---|---|---|---|---|
| 1 | #57 | H | 1.000 | Hydrogen Bond Donor |
| 2 | #16 | O | 0.729 | Sulfonamide Oxygen (Key Anchor) |
| 3 | #0 | C | 0.676 | Aromatic Ring (Hydrophobic) |
| 4 | #34 | O | 0.581 | Ether Oxygen (H-bond Acceptor) |
| 5 | #22, #23 | C | ~0.600 | Hydrophobic Scaffold |

4. Drug-Likeness & Gemini Report

The system automatically generates a report to assist chemists:

💊 Lipinski's Rule of 5 Analysis

  • Status: Poor (2 violations) 🔴
  • Mass: 575.68 Da (Violation: > 500)
  • H-Acceptors: 11 (Violation: > 10)
  • Note: HIV protease inhibitors are often large molecules that break these rules but remain effective.
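The violation count above can be reproduced with a simple threshold check. This is a sketch, not the project's code: it takes precomputed descriptors as arguments, and the `logp` and `h_donors` values in the example are placeholders that are not reported in this README (chosen below their thresholds so that only the two reported violations fire).

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Return the list of Lipinski Rule of 5 criteria that are violated."""
    checks = {
        "Mass > 500 Da": mol_weight > 500,
        "LogP > 5": logp > 5,
        "H-Donors > 5": h_donors > 5,
        "H-Acceptors > 10": h_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]

violations = lipinski_violations(
    mol_weight=575.68,  # from the report above
    logp=3.0,           # placeholder, not reported in this README
    h_donors=3,         # placeholder, not reported in this README
    h_acceptors=11,     # from the report above
)
print(len(violations), violations)  # 2 ['Mass > 500 Da', 'H-Acceptors > 10']
```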

🤖 Google Gemini Analysis

Affinity Analysis: The predicted binding affinity (pKd = 7.22) suggests moderate to strong binding for this ligand to the target protein. A pKd > 7 generally indicates a promising starting point for drug discovery, implying significant interaction.

Structural Basis: The highlighted atoms, particularly Oxygen (idx 16) and Nitrogen (implicit in the sulfonamide and carbamate groups), likely participate in hydrogen bonding as donors or acceptors. Aromatic carbons (e.g., idx 0 for the methoxy-substituted phenyl ring) are key for pi-pi stacking interactions within the protein's binding pocket. The specific arrangement of these functional groups and their positions relative to the protein are critical for recognition.

Drug-Likeness: With 2 Lipinski violations, this molecule exhibits poor drug-likeness, particularly concerning oral bioavailability. It may face challenges with absorption and membrane permeability.

Conclusion: While the binding affinity is encouraging, the poor drug-likeness warrants caution. Further structural optimization to improve Lipinski compliance would be essential before proceeding with this molecule as a drug candidate.


Created by Alex Sychov