---
license: mit
tags:
- biology
- chemistry
- molecular-property-prediction
- gnn
- drug-discovery
---

# molecular_bioactivity_predictor_gnn

## Overview

This model uses a Graph Isomorphism Network (GIN) to predict the bioactivity and binding affinity ($K_i$) of small molecules against specific protein targets. Each molecule is represented as a graph in which atoms are nodes and bonds are edges, so the model learns from the molecule's structure (which atoms are bonded, and by what kind of bond) rather than from 3D coordinates. This connectivity is what drives pharmacological activity in the model's view; stereochemistry enters only through the chirality and bond-stereo features listed under Model Architecture.
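
As a minimal sketch of this representation, the snippet below stores a molecule as a list of atoms (nodes) and bonds (undirected edges), and expands each bond into two directed edges. This matches the 2 x E `edge_index` layout used by libraries such as PyTorch Geometric. The `MolGraph` class and its input format are hypothetical. In practice, a cheminformatics toolkit would parse atoms and bonds from a SMILES string.

```python
from dataclasses import dataclass


@dataclass
class MolGraph:
    """Minimal molecular graph: atoms are nodes, bonds are undirected edges (hypothetical format)."""
    atoms: list[str]                   # element symbol per node
    bonds: list[tuple[int, int, str]]  # (atom_i, atom_j, bond_type)

    def edge_index(self) -> list[list[int]]:
        """Return a 2 x (2 * num_bonds) COO edge list containing both directions of every bond."""
        src: list[int] = []
        dst: list[int] = []
        for i, j, _bond_type in self.bonds:
            src += [i, j]
            dst += [j, i]
        return [src, dst]

    def neighbors(self, i: int) -> list[int]:
        """Atoms bonded to atom i, i.e. its neighborhood N(i)."""
        src, dst = self.edge_index()
        return [d for s, d in zip(src, dst) if s == i]


# Heavy-atom graph of ethanol (CCO); hydrogens are left implicit.
ethanol = MolGraph(atoms=["C", "C", "O"], bonds=[(0, 1, "SINGLE"), (1, 2, "SINGLE")])
print(ethanol.edge_index())  # [[0, 1, 1, 2], [1, 0, 2, 1]]
print(ethanol.neighbors(1))  # [0, 2]
```

Storing both directions of each bond lets message passing treat every edge uniformly: each atom simply sums over the edges that point into it.
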

## Model Architecture

The model implements a **Message Passing Neural Network (MPNN)** using the GIN convolution operator.

- **Node Features**: Atomic number, chirality, hybridization, and formal charge.
- **Edge Features**: Bond type (single, double, triple, aromatic) and stereochemistry.
- **Readout Layer**: Global mean pooling followed by a 3-layer MLP.
- **Aggregation**: The update rule for node $i$ at layer $k$ is defined as:
$$h_i^{(k)} = \text{MLP}^{(k)} \left( (1 + \epsilon^{(k)}) \cdot h_i^{(k-1)} + \sum_{j \in \mathcal{N}(i)} h_j^{(k-1)} \right)$$

where $h_i^{(k)}$ is the embedding of atom $i$ after layer $k$, $\mathcal{N}(i)$ is the set of atoms bonded to atom $i$, and $\epsilon^{(k)}$ is a scalar that is either learned or fixed (commonly to 0).
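
The update rule and mean-pooling readout above can be sketched in plain Python as follows. This is a minimal, dependency-free illustration, not the model's actual implementation. The helper names (`linear`, `make_mlp`, `gin_layer`, `global_mean_pool`), the toy one-hot features, and the identity MLP weights are all hypothetical. Note that the equation as written is the standard GIN update and does not itself consume the bond (edge) features listed above. One common way to include them is a GINE-style message that adds an edge embedding to $h_j$ before summing (PyTorch Geometric provides this as `GINEConv`); this sketch leaves that out.

```python
from typing import Callable

Vector = list[float]


def linear(W: list[Vector], b: Vector, x: Vector) -> Vector:
    """Dense layer y = W x + b (each row of W produces one output unit)."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def make_mlp(W1: list[Vector], b1: Vector,
             W2: list[Vector], b2: Vector) -> Callable[[Vector], Vector]:
    """Two-layer MLP with a ReLU between the layers (sizes are illustrative)."""
    def mlp(x: Vector) -> Vector:
        hidden = [max(0.0, v) for v in linear(W1, b1, x)]
        return linear(W2, b2, hidden)
    return mlp


def gin_layer(h: list[Vector], neighbors: list[list[int]],
              mlp: Callable[[Vector], Vector], eps: float = 0.0) -> list[Vector]:
    """One GIN update: h_i <- MLP((1 + eps) * h_i + sum of h_j over j in N(i))."""
    out: list[Vector] = []
    for i, h_i in enumerate(h):
        agg = [(1.0 + eps) * v for v in h_i]  # weighted self term
        for j in neighbors[i]:                # sum over bonded neighbors
            agg = [a + v for a, v in zip(agg, h[j])]
        out.append(mlp(agg))
    return out


def global_mean_pool(h: list[Vector]) -> Vector:
    """Average node embeddings into a single graph-level vector."""
    return [sum(col) / len(h) for col in zip(*h)]


# Toy example: heavy atoms of ethanol (C-C-O) with a hypothetical
# 2-dim one-hot node feature [is_carbon, is_oxygen].
h0 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = [[1], [0, 2], [1]]
identity = [[1.0, 0.0], [0.0, 1.0]]
mlp = make_mlp(identity, [0.0, 0.0], identity, [0.0, 0.0])  # identity weights for readability

h1 = gin_layer(h0, neighbors, mlp, eps=0.0)
print(h1)                                # [[2.0, 0.0], [2.0, 1.0], [1.0, 1.0]]
graph_vec = global_mean_pool(h1)
print([round(v, 4) for v in graph_vec])  # [1.6667, 0.6667]
# In the full model, graph_vec would feed the 3-layer MLP prediction head.
```

The sum aggregator (rather than a mean or max over neighbors) is what gives GIN its ability to distinguish neighborhoods that differ only in how many times a feature occurs. The $(1 + \epsilon)$ factor keeps a node's own embedding distinguishable from the sum of its neighbors.
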

## Intended Use

- **Virtual Screening**: Ranking large compound libraries to identify potential lead candidates for synthesis.
- **ADMET Prediction**: Estimating the solubility and lipophilicity of new chemical entities.
- **Target Profiling**: Predicting potential off-target interactions to help reduce the risk of clinical side effects.

## Limitations

- **Stereoisomers**: The model may struggle to distinguish enantiomers, which have identical connectivity but can differ in biological activity.
- **Large Molecules**: The model has been validated primarily on small molecules (MW < 800 Da) and may not generalize to biologics or large macrocycles.
- **Dataset Bias**: Prediction accuracy depends heavily on the chemical diversity of the training set (e.g., ChEMBL or PDBbind).
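
One way to act on the molecular-weight limitation is to flag inputs above 800 Da before trusting a prediction. The sketch below computes average molecular weight from an element-count mapping. The helper names and input format are hypothetical, and the mass table covers only a few elements. MW is also only a coarse applicability-domain check: it says nothing about dataset bias or stereochemistry. In practice, a toolkit such as RDKit (`rdkit.Chem.Descriptors.MolWt`) would compute this from a parsed structure.

```python
# Average atomic masses in g/mol (numerically equal to Da); only a few elements included.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
               "F": 18.998, "S": 32.06, "Cl": 35.45}

MW_LIMIT_DA = 800.0  # validation bound stated in the Limitations section


def molecular_weight(formula: dict[str, int]) -> float:
    """Average molecular weight from an element -> atom-count mapping (hypothetical input format)."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def in_validated_range(formula: dict[str, int], limit: float = MW_LIMIT_DA) -> bool:
    """True if the molecule is below the MW bound the model was validated on."""
    return molecular_weight(formula) < limit


ethanol = {"C": 2, "H": 6, "O": 1}
cyclosporin_a = {"C": 62, "H": 111, "N": 11, "O": 12}  # cyclic peptide, ~1203 Da

print(round(molecular_weight(ethanol), 3))  # 46.069
print(in_validated_range(ethanol))          # True
print(in_validated_range(cyclosporin_a))    # False
```
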