Upload 4 files
- README.md +165 -13
- app.py +115 -0
- packages.txt +3 -0
- requirements.txt +10 -0
README.md CHANGED
@@ -1,13 +1,165 @@
# DiffDock API Layer for Window 8 Drug Development

## Overview
This directory contains a DiffDock molecular docking engine configured to run on Hugging Face's **free CPU Basic tier** (2 vCPUs). It predicts protein-ligand docking poses and returns a confidence score for drug development analysis.

## Architecture
- **Platform**: Hugging Face Spaces (Gradio SDK)
- **Hardware**: CPU Basic (Free Tier - 2 vCPUs)
- **Framework**: DiffDock neural network for molecular docking
- **API**: RESTful endpoint for Cloudflare Worker integration

## Files

### 1. `packages.txt`
System-level dependencies installed before Python setup:
- `unzip` - Archive extraction
- `wget` - File downloads
- `libgl1-mesa-glx` - OpenGL support for molecular visualization

### 2. `requirements.txt`
Python dependencies pinned for CPU execution:
- **PyTorch 2.2.1** (CPU-only build)
- **torch-geometric 2.5.2** - Graph neural networks
- **biopython 1.83** - Biological computation
- **rdkit 2023.9.5** - Chemical informatics
- **gradio 4.19.2** - Web interface and API
- **pandas 2.2.1** - Data manipulation
- **pyyaml 6.0.1** - Configuration parsing
- **scipy 1.12.0** - Scientific computing
- **networkx 3.2.1** - Graph algorithms

### 3. `app.py`
Main application with three key components:

#### CPU Optimization
```python
torch.set_num_threads(2)
os.environ["OMP_NUM_THREADS"] = "2"
os.environ["MKL_NUM_THREADS"] = "2"
```
Limits thread usage to the two vCPUs allocated on the free tier.

#### Automated Setup
- Clones the DiffDock repository
- Downloads pre-trained weights from Zenodo
- Configures the inference pipeline

#### API Endpoint
- **Function**: `run_diffdock_inference(protein_pdb_content, ligand_smiles_string)`
- **Input**:
  - Protein structure (PDB format)
  - Ligand molecule (SMILES string)
- **Output**: JSON with confidence score
- **API Name**: `execute_diffdock_prediction`

## Deployment Steps

### 1. Create Hugging Face Space
1. Go to https://huggingface.co/spaces
2. Click **"Create a New Space"**
3. Name: `gss-diffdock-engine` (or your preferred name)
4. SDK: **Gradio**
5. Hardware: **CPU Basic** (Free)
6. Visibility: Public or Private

### 2. Upload Files
Upload these three files to your Space repository:
- `packages.txt`
- `requirements.txt`
- `app.py`

### 3. Wait for Build
Hugging Face will:
1. Install system packages (1-2 minutes)
2. Install Python dependencies (3-5 minutes)
3. Clone DiffDock and download weights when `app.py` first starts (5-10 minutes)
4. Start the application

Total time until the Space is running: roughly **10-15 minutes**

### 4. Verify Deployment
Once status shows **"Running"**:
- The Space URL will be active
- The API endpoint will be available at: `https://YOUR-USERNAME-gss-diffdock-engine.hf.space/api/execute_diffdock_prediction`

## API Usage

### Request Format
```bash
curl -X POST "https://YOUR-USERNAME-gss-diffdock-engine.hf.space/api/execute_diffdock_prediction" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "PROTEIN_PDB_CONTENT_HERE",
      "LIGAND_SMILES_STRING_HERE"
    ]
  }'
```

### Response Format
```json
{
  "data": [{
    "success": true,
    "diffdock_confidence_score": 0.85,
    "hardware_allocation": "HF_FREE_CPU_TIER"
  }]
}
```
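
The request and response formats above can also be exercised from Python. The following is a minimal sketch using only the standard library, not part of the deployed Space: the helper names (`build_request_body`, `parse_prediction`, `request_prediction`) and the 600-second timeout are assumptions for illustration, and the URL is the placeholder from the curl example.

```python
import json
import urllib.request

# Placeholder URL copied from the curl example; replace YOUR-USERNAME before use.
SPACE_URL = (
    "https://YOUR-USERNAME-gss-diffdock-engine.hf.space"
    "/api/execute_diffdock_prediction"
)


def build_request_body(protein_pdb: str, ligand_smiles: str) -> bytes:
    """Encode both inputs in the {"data": [...]} envelope from the request format."""
    return json.dumps({"data": [protein_pdb, ligand_smiles]}).encode("utf-8")


def parse_prediction(response_text: str) -> dict:
    """Return the result object, i.e. the first element of the "data" list."""
    result = json.loads(response_text)["data"][0]
    if not result.get("success"):
        # app.py reports failures under the "error_log" key
        raise RuntimeError(result.get("error_log", "prediction failed"))
    return result


def request_prediction(
    protein_pdb: str, ligand_smiles: str, timeout_s: float = 600.0
) -> dict:
    """POST to the Space and return the parsed result (performs a network call)."""
    req = urllib.request.Request(
        SPACE_URL,
        data=build_request_body(protein_pdb, ligand_smiles),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return parse_prediction(resp.read().decode("utf-8"))
```

Keeping the encoding and parsing separate from the network call makes both easy to check against the documented JSON without a running Space.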

## Performance Optimizations

### Memory Management
- **Inference steps**: Limited to 10 (vs default 20)
- **Samples per complex**: 1 (vs default 40)
- **Cleanup**: Automatic removal of temporary files

### CPU Constraints
- Thread count capped at 2
- Single pose generation
- Aggressive memory cleanup

## Integration with Cloudflare Worker

The next step is to create a Cloudflare Worker handler that:
1. Receives drug development requests from Window 8
2. Formats protein/ligand data
3. Calls this Hugging Face API
4. Stores results in D1 database
5. Returns predictions to frontend
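
The five steps above can be prototyped locally before the Worker exists. The sketch below is only an illustration of the control flow in Python: a real Worker would use a D1 binding, whereas this uses the standard-library `sqlite3` module as a local stand-in (D1 is built on SQLite). The request fields `protein_pdb` and `ligand_smiles`, the `predictions` table, and the injected `call_space` function are all hypothetical names, not part of the existing code.

```python
import json
import sqlite3
from typing import Callable


def init_schema(db: sqlite3.Connection) -> None:
    """Create a hypothetical table for stored predictions."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS predictions ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "ligand_smiles TEXT NOT NULL, "
        "confidence REAL, "
        "success INTEGER NOT NULL)"
    )
    db.commit()


def handle_docking_request(
    request_json: str,
    call_space: Callable[[str, str], dict],
    db: sqlite3.Connection,
) -> str:
    """Run steps 1-5 for one request and return the JSON sent to the frontend."""
    # 1. Receive the request (field names are assumptions)
    request = json.loads(request_json)
    # 2. Format the protein/ligand data
    protein_pdb = request["protein_pdb"].strip()
    ligand_smiles = request["ligand_smiles"].strip()
    # 3. Call the Hugging Face API through the injected function
    result = call_space(protein_pdb, ligand_smiles)
    # 4. Store the result
    db.execute(
        "INSERT INTO predictions (ligand_smiles, confidence, success) "
        "VALUES (?, ?, ?)",
        (
            ligand_smiles,
            result.get("diffdock_confidence_score"),
            int(bool(result.get("success"))),
        ),
    )
    db.commit()
    # 5. Return the prediction
    return json.dumps(result)
```

Passing `call_space` as a parameter keeps the flow testable with a fake function, so the storage step can be checked without contacting the Space.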

## Troubleshooting

### Build Failures
- Check logs for missing dependencies
- Verify file names are exact (case-sensitive)
- Ensure no extra whitespace in files

### Timeout Errors
- Inference is limited to 10 steps for speed
- Consider upgrading to paid tier for faster processing
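
On the calling side, a retry with exponential backoff is one way to tolerate occasional timeouts from slow CPU inference. This is a sketch under assumptions: `call_with_retry` is a hypothetical helper, and the attempt count and delays are illustrative values, not tuned for this Space.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on timeout or connection errors with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[OSError] = None
    for attempt in range(attempts):
        try:
            return fn()
        except OSError as err:
            # TimeoutError and urllib.error.URLError are both OSError subclasses
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay_s * (2 ** attempt))
    assert last_error is not None
    raise last_error
```

The `sleep` parameter exists so the delay schedule can be verified without actually waiting.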

### Memory Issues
- The current configuration targets the 16 GB RAM limit
- Reduce inference steps if needed

## Next Steps

1. ✅ Deploy to Hugging Face Spaces
2. ⏳ Create Cloudflare Worker integration
3. ⏳ Add D1 database schema for drug predictions
4. ⏳ Build Window 8 frontend interface
5. ⏳ Implement result visualization

## Support

For issues or questions:
- Hugging Face Docs: https://huggingface.co/docs/hub/spaces
- DiffDock Paper: https://arxiv.org/abs/2210.01776
- DiffDock Repo: https://github.com/gcorso/DiffDock

---

**Gaston Software Solutions LLP**
Window 8: Drug Development & Molecular Docking Engine
app.py ADDED
@@ -0,0 +1,115 @@
import os
import shutil
import subprocess
import sys
import uuid

import gradio as gr
import pandas as pd
import torch

# 🛠️ Step 1: Cap thread usage to match the 2 vCPUs of Hugging Face's free tier.
# The environment variables are also inherited by the inference subprocess below.
torch.set_num_threads(2)
os.environ["OMP_NUM_THREADS"] = "2"
os.environ["MKL_NUM_THREADS"] = "2"

# 🧬 Step 2: One-time setup: clone DiffDock and fetch the pretrained weights
if not os.path.exists("DiffDock"):
    print("[GSS LOG] Cloning the DiffDock repository...")
    # check=True stops startup if a setup command fails instead of continuing silently
    subprocess.run(["git", "clone", "https://github.com/gcorso/DiffDock.git"], check=True)

    print("[GSS LOG] Downloading the public pretrained weights from Zenodo...")
    subprocess.run(["wget", "https://zenodo.org/record/7651515/files/workdir.zip"], check=True)
    subprocess.run(["unzip", "workdir.zip", "-d", "DiffDock/"], check=True)

sys.path.append(os.path.abspath("DiffDock"))


def run_diffdock_inference(protein_pdb_content, ligand_smiles_string):
    """
    Takes the target protein as PDB text (from Window 7) and the candidate
    ligand as a SMILES string, runs DiffDock inference on CPU, and returns
    a dict with the confidence score.
    """
    # Use a unique ID per request: os.getpid() is the same for every request
    # handled by this process, so overlapping requests could share file names.
    run_id = uuid.uuid4().hex
    protein_path = f"target_pathogen_{run_id}.pdb"
    csv_path = f"input_manifest_{run_id}.csv"
    output_dir = f"results_{run_id}"

    try:
        # 1. Write the protein structure to a file on disk
        with open(protein_path, "w") as f:
            f.write(protein_pdb_content)

        # 2. Build the input manifest CSV in the layout DiffDock expects
        manifest_df = pd.DataFrame([{
            "complex_name": "gss_candidate",
            "protein_path": protein_path,
            "ligand_description": ligand_smiles_string,
            "protein_sequence": ""
        }])
        manifest_df.to_csv(csv_path, index=False)

        # 3. Build the inference command with reduced settings for CPU:
        # 10 inference steps and 1 sample per complex to limit memory and runtime
        cmd = [
            sys.executable, "DiffDock/inference.py",
            "--config", "DiffDock/default_inference_args.yaml",
            "--protein_ligand_csv", csv_path,
            "--out_dir", output_dir,
            "--inference_steps", "10",
            "--samples_per_complex", "1",
            "--actual_steps", "10"
        ]

        # Run inference in a separate Python process
        execution_run = subprocess.run(cmd, capture_output=True, text=True)
        if execution_run.returncode != 0:
            # Report the failure instead of returning success with the fallback score
            raise RuntimeError(execution_run.stderr.strip() or "DiffDock inference failed")

        # 4. Read the confidence value from the summary table, if present
        confidence_metric = -1.0  # Fallback when no confidence value is found
        summary_sheet = os.path.join(output_dir, "summary.csv")

        if os.path.exists(summary_sheet):
            summary_df = pd.read_csv(summary_sheet)
            if "confidence" in summary_df.columns and not summary_df.empty:
                confidence_metric = float(summary_df.iloc[0]["confidence"])

        return {
            "success": True,
            "diffdock_confidence_score": confidence_metric,
            "hardware_allocation": "HF_FREE_CPU_TIER"
        }

    except Exception as runtime_fault:
        return {
            "success": False,
            "error_log": str(runtime_fault)
        }

    finally:
        # Remove the temporary input files and the output directory from disk
        for temp_file in [protein_path, csv_path]:
            if os.path.exists(temp_file):
                os.remove(temp_file)
        if os.path.exists(output_dir):
            shutil.rmtree(output_dir)


# 🌐 Step 3: Build the Gradio app and expose the named API endpoint
with gr.Blocks() as demo:
    gr.Markdown("# Gaston Software Solutions LLP — Window 8 Engine")
    gr.Markdown("Active Mode: Decentralized Independent CPU Inference Matrix.")

    # Hidden input fields that receive data programmatically from Cloudflare
    protein_input_field = gr.Textbox(visible=False, label="Protein Data Stream")
    ligand_input_field = gr.Textbox(visible=False, label="Ligand SMILES Chain")
    json_output_response = gr.JSON()

    # Hidden button whose click event registers the named API route
    api_trigger_node = gr.Button("Execute Processing", visible=False)
    api_trigger_node.click(
        run_diffdock_inference,
        inputs=[protein_input_field, ligand_input_field],
        outputs=json_output_response,
        api_name="execute_diffdock_prediction"
    )

demo.launch()

# Made with Bob
packages.txt ADDED
@@ -0,0 +1,3 @@
unzip
wget
libgl1-mesa-glx
requirements.txt ADDED
@@ -0,0 +1,10 @@
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.2.1
torch-geometric==2.5.2
biopython==1.83
rdkit==2023.9.5
gradio==4.19.2
pandas==2.2.1
pyyaml==6.0.1
scipy==1.12.0
networkx==3.2.1