Commit 249b16f (verified), committed by nabilyasini
1 Parent(s): 84766d8

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +18 -262
README.md CHANGED
@@ -1,266 +1,22 @@
- # BBB Permeability Prediction System
-
- A breakthrough Graph Neural Network (GNN) system for predicting Blood-Brain Barrier (BBB) permeability of chemical compounds using a hybrid GAT+GraphSAGE architecture.
-
- ## Overview
-
- This system uses state-of-the-art deep learning to predict whether molecules can cross the blood-brain barrier - a critical property for CNS drug development. The hybrid architecture combines Graph Attention Networks (GAT) for learning important molecular features and GraphSAGE for neighborhood aggregation.
-
- ## Architecture
-
- ### Hybrid GAT+SAGE Model
- - **Layer 1**: GAT with 8 attention heads (feature extraction)
- - **Layer 2**: GraphSAGE (neighborhood aggregation)
- - **Layer 3**: GAT with 8 attention heads (refinement)
- - **Pooling**: Combined mean + max global pooling
- - **MLP**: 4-layer prediction head with dropout
- - **Total Parameters**: 649,345
-
- ### Key Features
- - Attention mechanisms for interpretability
- - Batch normalization for stable training
- - Early stopping to prevent overfitting
- - Learning rate scheduling
- - Comprehensive evaluation metrics (MAE, RMSE, R²)
-
- ## Installation
-
- ```bash
- # Install dependencies
- pip install -r requirements.txt
- ```
-
- ### Requirements
- - PyTorch 2.9+
- - PyTorch Geometric 2.7+
- - RDKit (for molecular processing)
- - scikit-learn
- - pandas, numpy
- - matplotlib, seaborn
-
- ## Dataset
-
- The system includes a curated dataset of 42 compounds with known BBB permeability:
- - **BBB+**: 20 compounds (high permeability) - e.g., Cocaine, Caffeine, Propranolol
- - **BBB-**: 14 compounds (low/no permeability) - e.g., Glucose, Glutamic acid
- - **BBB±**: 8 compounds (moderate permeability)
-
- Permeability scores range from 0.0 (no BBB penetration) to 1.0 (high BBB penetration).
-
- ### BBB Compliance Rules
- For optimal BBB permeability:
- - Molecular Weight: 150-450 Da
- - LogP: 1-5
- - TPSA (Topological Polar Surface Area): <90 Ų
- - H-bond Donors: ≤3
- - H-bond Acceptors: ≤7
-
- ## Usage
-
- ### Web Interface (Recommended)
-
- Launch the beautiful web interface for easy predictions:
-
- ```bash
- # Option 1: Double-click the launcher
- launch_web.bat
-
- # Option 2: Command line
- streamlit run app.py
- ```
-
- The app will open at `http://localhost:8501` with:
- - 🎨 Beautiful interactive UI
- - 📊 Real-time visualizations
- - 🔬 20+ pre-loaded molecules
- - 💾 Export results (CSV/JSON)
- - 📈 Comprehensive analysis
-
- See [WEB_INTERFACE.md](WEB_INTERFACE.md) for detailed documentation.
-
- ### Training the Model
-
- ```bash
- python train_gnn.py
- ```
-
- This will:
- 1. Load and preprocess the BBB dataset
- 2. Train the hybrid GNN model
- 3. Save the best model to `models/best_model.pth`
- 4. Generate training visualizations
-
- Training parameters:
- - Epochs: 200 (with early stopping)
- - Learning rate: 0.001
- - Batch size: 4
- - Optimizer: Adam
- - Early stopping patience: 20 epochs
-
- ### Making Predictions
-
- ```python
- from predict_bbb import BBBGNNPredictor
-
- # Initialize predictor
- predictor = BBBGNNPredictor(model_path='models/best_model.pth')
-
- # Predict for a single molecule
- result = predictor.predict('CN1C=NC2=C1C(=O)N(C(=O)N2C)C') # Caffeine
-
- print(f"BBB Score: {result['bbb_score']:.3f}")
- print(f"Category: {result['category']}") # BBB+, BBB±, or BBB-
- print(f"LogP: {result['molecular_descriptors']['logp']:.2f}")
- ```
-
- ### Batch Predictions
-
- ```python
- smiles_list = ['CCO', 'c1ccccc1', 'CC(=O)O']
- results = predictor.predict_batch(smiles_list)
-
- for result in results:
-     print(f"{result['smiles']}: {result['bbb_score']:.3f} ({result['category']})")
- ```
-
- ### Command-line Testing
-
- ```bash
- # Test with pre-defined compounds
- python predict_bbb.py
-
- # Test specific molecules
- python test_cocaine.py
- ```
-
- ## Project Structure
-
- ```
- BBB_System/
- ├── bbb_gnn_model.py # Hybrid GAT+SAGE architecture
- ├── mol_to_graph.py # SMILES to graph conversion
- ├── bbb_dataset.py # Dataset loader with 42 compounds
- ├── train_gnn.py # Training pipeline
- ├── predict_bbb.py # Prediction interface
- ├── simple_bbb.py # Baseline Random Forest model
- ├── test_cocaine.py # Test script for various compounds
- ├── requirements.txt # Dependencies
- ├── models/ # Trained model checkpoints
- │   ├── best_model.pth
- │   ├── training_history.png
- │   └── predictions.png
- └── README.md
- ```
-
- ## Model Features
-
- ### Molecular Graph Representation
- Each molecule is represented as a graph where:
- - **Nodes**: Atoms with 9 features (atomic number, degree, charge, hybridization, aromaticity, etc.)
- - **Edges**: Chemical bonds (bidirectional)
-
- ### Node Features (9 total)
- 1. Atomic number (normalized)
- 2. Degree (number of bonds)
- 3. Formal charge
- 4. Hybridization type
- 5. Aromaticity (binary)
- 6. In ring (binary)
- 7. Implicit valence
- 8. Explicit valence
- 9. Atomic mass (normalized)
-
- ## Performance
-
- The model is evaluated on:
- - **MAE (Mean Absolute Error)**: Average prediction error
- - **RMSE (Root Mean Squared Error)**: Penalizes large errors
- - **R² Score**: Variance explained by the model
-
- Training includes:
- - 80/20 train/validation split
- - Early stopping with 20-epoch patience
- - Learning rate reduction on plateau
- - Gradient clipping for stability
-
- ## Molecular Descriptors
-
- The system calculates traditional drug-likeness descriptors:
- - Molecular Weight
- - LogP (lipophilicity)
- - TPSA (Topological Polar Surface Area)
- - H-bond donors/acceptors
- - Rotatable bonds
- - Aromatic rings
- - Lipinski's Rule of 5 violations
-
- ## Example Results
-
- ```
- Cocaine:
- BBB Score: 0.892
- Category: BBB+ (HIGH BBB permeability)
- Molecular Weight: 275.3 Da
- LogP: 2.04
- TPSA: 38.8 Ų
- BBB Rule Compliant: True
-
- Glucose:
- BBB Score: 0.105
- Category: BBB- (LOW BBB permeability)
- Molecular Weight: 180.2 Da
- LogP: -3.24
- TPSA: 110.4 Ų
- BBB Rule Compliant: False
- Warning: High TPSA (>90 Ų)
- ```
-
- ## Baseline Comparison
-
- The system includes a baseline Random Forest model ([simple_bbb.py](simple_bbb.py)) using molecular descriptors. The GNN model learns directly from molecular structure and typically outperforms descriptor-based methods.
-
- ## Interpretability
-
- The GAT layers provide attention weights showing which molecular substructures are important for BBB permeability predictions:
-
- ```python
- # Extract attention weights (for analysis)
- attention = model.get_attention_weights(x, edge_index)
- ```
-
- ## Contributing
-
- Key areas for improvement:
- 1. Expand dataset with more diverse compounds
- 2. Implement external dataset loaders (e.g., BBBP from MoleculeNet)
- 3. Add molecular fingerprint fusion
- 4. Experiment with different GNN architectures (GCN, GIN, etc.)
- 5. Ensemble methods
-
- ## References
-
- - Graph Attention Networks (GAT): Veličković et al., ICLR 2018
- - GraphSAGE: Hamilton et al., NeurIPS 2017
- - PyTorch Geometric: Fey & Lenssen, 2019
- - RDKit: Open-source cheminformatics toolkit
-
- ## License
-
- This is a research/educational project for blood-brain barrier permeability prediction.
-
- ## Citation

- If you use this system in your research:

- ```bibtex
- @software{bbb_gnn_predictor,
- title = {BBB Permeability Prediction System},
- author = {N Yasini-Ardekani},
- year = {2025},
- description = {Hybrid GAT+SAGE GNN for Blood-Brain Barrier Permeability Prediction}
- }
- ```

- ---

- **Built with PyTorch Geometric** | **Powered by Deep Learning** | **For CNS Drug Discovery**
+ ---
+ title: StereoAwareGNN BBB Predictor
+ emoji: 🧠
+ colorFrom: green
+ colorTo: blue
+ sdk: docker
+ app_file: app.py
+ pinned: false
+ ---

+ # StereoGNN-BBB: Blood-Brain Barrier Permeability Predictor

+ State-of-the-Art GNN model achieving AUC 0.9612 on external validation (B3DB dataset).

+ ## Author
+ Nabil Yasini-Ardekani

+ ## Features
+ - Stereo-aware molecular graph neural network
+ - Real-time BBB permeability prediction
+ - Molecular visualization
+ - Export results as JSON/CSV
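
Note on the reported metric: the added README quotes an AUC of 0.9612 on B3DB but does not show how it is computed. As a rough illustration only, ROC AUC for a binary BBB+/BBB- classifier can be computed with the rank-sum (Mann-Whitney U) formulation sketched below. The function name `roc_auc` and the toy labels and scores are assumptions made for this example; they are not the author's evaluation code and not B3DB data.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation.

    labels: iterable of 0/1 ground-truth classes (1 = BBB+, an assumed encoding).
    scores: iterable of predicted scores, higher meaning "more likely positive".
    Tied scores receive the average of the ranks they span.
    """
    pairs = sorted(zip(scores, labels))  # ascending by score
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the group of items sharing the same score.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # U statistic for the positives, normalized by the number of pos/neg pairs.
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example (made-up values): two negatives, two positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The result is the fraction of positive/negative pairs that the scores rank correctly, counting ties as half. For a real evaluation, a maintained implementation such as scikit-learn's `roc_auc_score` is the more usual choice.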