Shoriful025 / molecular_bioactivity_predictor_gnn

	---
	license: mit
	tags:
	- biology
	- chemistry
	- molecular-property-prediction
	- gnn
	- drug-discovery
	---

	# molecular_bioactivity_predictor_gnn

	## Overview
	This model uses a Graph Isomorphism Network (GIN) to predict the bioactivity and binding affinity ($K_i$) of small molecules against specific protein targets. Molecules are represented as graphs in which atoms are nodes and bonds are edges, so the model learns from molecular connectivity and from the atom and bond features attached to it, both of which bear on pharmacological efficacy.
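As a rough illustration of the atoms-as-nodes, bonds-as-edges representation, the sketch below encodes ethanol by hand as a small graph. The data layout and the helper `build_adjacency` are hypothetical and are not taken from this repository; a real pipeline would more likely derive the graph from a SMILES string with a cheminformatics toolkit.

```python
# Ethanol (CH3-CH2-OH), heavy atoms only; hydrogens are left implicit.
atoms = ["C", "C", "O"]  # node index -> element symbol
bonds = [(0, 1, "single"), (1, 2, "single")]  # (atom i, atom j, bond type)


def build_adjacency(num_atoms, bonds):
    """Return an undirected adjacency list: node -> sorted neighbor indices."""
    adjacency = {i: [] for i in range(num_atoms)}
    for i, j, _bond_type in bonds:
        # Bonds are undirected, so record the edge in both directions.
        adjacency[i].append(j)
        adjacency[j].append(i)
    return {node: sorted(nbrs) for node, nbrs in adjacency.items()}


adjacency = build_adjacency(len(atoms), bonds)
print(adjacency)
```

The central carbon (node 1) is the only atom with two neighbors, which is the kind of connectivity information a graph network propagates between atoms.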



	## Model Architecture
	The model implements a Message Passing Neural Network (MPNN) using the GIN convolution operator.
	- Node Features: Includes atomic number, chirality, hybridization, and formal charge.
	- Edge Features: Includes bond type (single, double, triple, aromatic) and stereochemistry.
	- Readout Layer: Global Mean Pooling followed by a 3-layer MLP.
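	- Aggregation: The update rule for node $i$ at layer $k$ is defined as:
	$$h_i^{(k)} = \text{MLP}^{(k)} \left( (1 + \epsilon^{(k)}) \cdot h_i^{(k-1)} + \sum_{j \in \mathcal{N}(i)} h_j^{(k-1)} \right)$$
	where $h_i^{(k)}$ is the embedding of node $i$ after layer $k$, $\mathcal{N}(i)$ is the set of neighbors of node $i$, and $\epsilon^{(k)}$ is a scalar (learnable or fixed) that weights the node's own embedding relative to the sum over its neighbors.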
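The update rule and the mean-pooling readout can be traced by hand on a toy graph. The following is a minimal pure-Python sketch, not the model's actual implementation: the stand-in MLP is a single identity linear layer followed by ReLU, the weights, features, and function names are all made up for illustration, and edge features are ignored.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def linear(weights, bias, v):
    # weights is a list of rows; computes W v + b.
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def gin_layer(h, adjacency, mlp, eps=0.0):
    """One GIN update: h_i <- MLP((1 + eps) * h_i + sum over neighbors j of h_j)."""
    out = {}
    for i, h_i in h.items():
        agg = [(1.0 + eps) * x for x in h_i]
        for j in adjacency[i]:
            agg = [a + x for a, x in zip(agg, h[j])]
        out[i] = mlp(agg)
    return out


def global_mean_pool(h):
    """Average node embeddings into a single graph-level vector."""
    n = len(h)
    dim = len(next(iter(h.values())))
    return [sum(vec[d] for vec in h.values()) / n for d in range(dim)]


# Toy 3-node path graph with 2-dimensional node features (hypothetical values).
adjacency = {0: [1], 1: [0, 2], 2: [1]}
h0 = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}

# Stand-in for the learned MLP: identity weights, zero bias, then ReLU.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]


def mlp(v):
    return relu(linear(W, b, v))


h1 = gin_layer(h0, adjacency, mlp, eps=0.0)
graph_embedding = global_mean_pool(h1)
print(h1, graph_embedding)
```

With `eps=0.0`, each node's new embedding is its own vector plus the sum of its neighbors' vectors; for example, node 1 becomes `[2.0, 1.0]`. In the described architecture the pooled vector would then pass through the 3-layer MLP readout.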

	## Intended Use
	- Virtual Screening: Ranking massive libraries of compounds to identify potential lead candidates for synthesis.
	- ADMET Prediction: Estimating the solubility and lipophilicity of new chemical entities.
	- Target Profiling: Predicting potential off-target interactions to minimize clinical side effects.
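For the virtual screening use case, ranking amounts to sorting compounds by predicted affinity. The sketch below assumes the predictions are $K_i$ values in nM, where lower means tighter binding; the card does not state the output units, and the compound IDs, values, and the helper `rank_by_predicted_ki` are hypothetical.

```python
def rank_by_predicted_ki(predictions, top_k):
    """Return the top_k (compound_id, ki_nM) pairs with the lowest predicted K_i."""
    return sorted(predictions.items(), key=lambda item: item[1])[:top_k]


# Hypothetical model outputs: compound ID -> predicted K_i in nM (assumed units).
predicted_ki_nm = {
    "cmpd_A": 850.0,
    "cmpd_B": 12.5,
    "cmpd_C": 4300.0,
    "cmpd_D": 95.0,
}

shortlist = rank_by_predicted_ki(predicted_ki_nm, top_k=2)
print(shortlist)
```

If the model instead reports a log-scaled affinity such as $pK_i$, the sort order would need to be reversed, since higher $pK_i$ corresponds to stronger binding.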

	## Limitations
	- Stereoisomers: The model may struggle to distinguish stereoisomers such as enantiomers, which have identical connectivity but can differ in biological activity.
	- Large Molecules: The model is primarily validated on small molecules (MW < 800 Da) and may not generalize to biologics or large macrocycles.
	- Dataset Bias: Prediction accuracy depends strongly on the chemical diversity of the training set (e.g., ChEMBL or PDBbind).
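One simple guard against the large-molecule limitation is to check molecular weight before trusting a prediction. The sketch below is an assumption about how such a check might look, not part of this repository: the function names are hypothetical, molecules are passed as element-count dictionaries rather than parsed structures, and the mass table covers only a few elements.

```python
# Standard atomic masses (Da) for a few common elements.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}
MW_LIMIT_DA = 800.0  # validation range stated in the limitations above


def molecular_weight(formula_counts):
    """Sum atomic masses for a formula given as {element: count}."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula_counts.items())


def within_validated_range(formula_counts, limit=MW_LIMIT_DA):
    """True if the molecule is lighter than the validated MW limit."""
    return molecular_weight(formula_counts) < limit


aspirin = {"C": 9, "H": 8, "O": 4}  # C9H8O4
cyclosporin_a = {"C": 62, "H": 111, "N": 11, "O": 12}  # macrocycle, C62H111N11O12

for name, formula in [("aspirin", aspirin), ("cyclosporin A", cyclosporin_a)]:
    print(name, round(molecular_weight(formula), 2), within_validated_range(formula))
```

Aspirin falls well inside the validated range, while cyclosporin A, a macrocycle of roughly 1200 Da, would be flagged as outside it.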