Shoriful025 committed on
Commit 9fdf3a8 · verified · 1 Parent(s): f812f92

Create README.md

Files changed (1)
  1. README.md +34 -0
README.md ADDED
@@ -0,0 +1,34 @@
+ ---
+ license: mit
+ tags:
+ - biology
+ - chemistry
+ - molecular-property-prediction
+ - gnn
+ - drug-discovery
+ ---
+
+ # molecular_bioactivity_predictor_gnn
+
+ ## Overview
+ This model uses a Graph Isomorphism Network (GIN) to predict the bioactivity and binding affinity ($K_i$) of small molecules against specific protein targets. Each molecule is represented as a graph in which atoms are nodes and bonds are edges, so the model learns from atomic connectivity and local chemical environments, the structural relationships that influence pharmacological activity.
+
+ ## Model Architecture
+ The model is a **Message Passing Neural Network (MPNN)** built on the GIN convolution operator.
+ - **Node Features**: Atomic number, chirality, hybridization, and formal charge.
+ - **Edge Features**: Bond type (single, double, triple, aromatic) and stereochemistry.
+ - **Readout Layer**: Global mean pooling followed by a 3-layer MLP.
+ - **Aggregation**: The update rule for node $i$ at layer $k$ is:
+ $$h_i^{(k)} = \text{MLP}^{(k)} \left( (1 + \epsilon^{(k)}) \cdot h_i^{(k-1)} + \sum_{j \in \mathcal{N}(i)} h_j^{(k-1)} \right)$$
+   where $h_i^{(k)}$ is the embedding of node $i$ after layer $k$, $\mathcal{N}(i)$ is the set of neighbors of node $i$, and $\epsilon^{(k)}$ is a scalar (learnable or fixed) that weights the node's own embedding relative to its neighbors.
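
The update rule and the mean-pooling readout described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the model's actual implementation: the helper names `gin_update` and `mean_pool`, the toy 3-atom graph, its feature values, and the identity function standing in for $\text{MLP}^{(k)}$ are all hypothetical and chosen only to make the arithmetic easy to follow. Edge features are not used here because the update rule above does not include them.

```python
from typing import Callable, Dict, List

Vector = List[float]


def gin_update(
    h: Dict[int, Vector],
    neighbors: Dict[int, List[int]],
    mlp: Callable[[Vector], Vector],
    eps: float = 0.0,
) -> Dict[int, Vector]:
    """Apply one GIN layer: h_i <- MLP((1 + eps) * h_i + sum of neighbor h_j)."""
    updated = {}
    for i, h_i in h.items():
        # Start from the node's own embedding, scaled by (1 + eps).
        combined = [(1.0 + eps) * x for x in h_i]
        # Add the unweighted sum of neighbor embeddings from the previous layer.
        for j in neighbors[i]:
            combined = [c + x for c, x in zip(combined, h[j])]
        updated[i] = mlp(combined)
    return updated


def mean_pool(h: Dict[int, Vector]) -> Vector:
    """Global mean pooling: average all node embeddings into one graph vector."""
    n = len(h)
    dim = len(next(iter(h.values())))
    return [sum(vec[d] for vec in h.values()) / n for d in range(dim)]


# Toy path graph 0 - 1 - 2 with hypothetical 2-dimensional node features.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adjacency = {0: [1], 1: [0, 2], 2: [1]}

# Identity function as a stand-in for the learned MLP (assumption for illustration).
layer_out = gin_update(features, adjacency, mlp=lambda v: v, eps=0.0)
graph_embedding = mean_pool(layer_out)
```

In the architecture described here, the pooled graph vector would then be passed to the 3-layer MLP readout to produce the affinity prediction.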
+
+ ## Intended Use
+ - **Virtual Screening**: Ranking large compound libraries to identify potential lead candidates for synthesis.
+ - **ADMET Prediction**: Estimating the solubility and lipophilicity of new chemical entities.
+ - **Target Profiling**: Predicting potential off-target interactions to minimize clinical side effects.
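
For the virtual screening use case, ranking typically comes down to sorting compounds by predicted affinity. The sketch below assumes the predictions are available as $K_i$ values in nanomolar; the compound IDs, the numeric values, and the helper names `ki_nm_to_pki` and `rank_by_affinity` are hypothetical. It converts each value to $pK_i = -\log_{10}(K_i)$ with $K_i$ in molar, so that a higher score means stronger predicted binding.

```python
import math
from typing import Dict, List, Tuple


def ki_nm_to_pki(ki_nm: float) -> float:
    """Convert K_i in nanomolar to pK_i = -log10(K_i in molar)."""
    return -math.log10(ki_nm * 1e-9)


def rank_by_affinity(
    pred_ki_nm: Dict[str, float], top_n: int
) -> List[Tuple[str, float]]:
    """Return the top_n compounds sorted by descending pK_i (strongest first)."""
    scored = [(cid, ki_nm_to_pki(ki)) for cid, ki in pred_ki_nm.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up predictions for illustration only (K_i in nM).
predictions = {"cmpd_A": 250.0, "cmpd_B": 10.0, "cmpd_C": 5000.0}
top_hits = rank_by_affinity(predictions, top_n=2)
```

Here `cmpd_B` ranks first because its predicted $K_i$ of 10 nM is the lowest of the three, which corresponds to the highest $pK_i$.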
+
+ ## Limitations
+ - **Stereoisomers**: The model may struggle to differentiate stereoisomers, such as enantiomers, that have identical connectivity but different biological activity.
+ - **Large Molecules**: The model has been validated primarily on small molecules (MW < 800 Da) and may not generalize to biologics or large macrocycles.
+ - **Dataset Bias**: Prediction accuracy depends heavily on the chemical diversity of the training set (e.g., ChEMBL or PDBbind).
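
One way to act on the molecular-weight limitation is to flag inputs that fall outside the validated range before trusting a prediction. The following is a rough sketch, assuming the molecular formula is already known as an element-count mapping; the helper names, the small atomic-weight table (C, H, N, O only), and the two example molecules are illustrative choices, not part of the model.

```python
from typing import Dict

# Standard atomic weights in g/mol, rounded to three decimals (C, H, N, O only).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Upper bound of the validated range, taken from the limitation stated above.
MAX_VALIDATED_MW = 800.0  # Da


def molecular_weight(formula: Dict[str, int]) -> float:
    """Sum atomic weights over an element-count mapping such as {"C": 8, "H": 10}."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula.items())


def in_validated_range(formula: Dict[str, int]) -> bool:
    """True if the molecule is below the MW threshold the model was validated on."""
    return molecular_weight(formula) < MAX_VALIDATED_MW


caffeine = {"C": 8, "H": 10, "N": 4, "O": 2}           # C8H10N4O2, small molecule
cyclosporin_a = {"C": 62, "H": 111, "N": 11, "O": 12}  # C62H111N11O12, macrocycle

caffeine_ok = in_validated_range(caffeine)
cyclosporin_ok = in_validated_range(cyclosporin_a)
```

With these inputs, caffeine (about 194 Da) passes the check, while cyclosporin A (about 1203 Da) is flagged as outside the validated range.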