Space: gsstec/gss-diffdock-engine

gsstec committed on May 29

Commit 57fd2f0 (verified), 1 parent: e970142

Upload 4 files

Files changed (4):

README.md +165 -13
app.py +115 -0
packages.txt +3 -0
requirements.txt +10 -0

README.md CHANGED

@@ -1,13 +1,165 @@
----
-title: Gss Diffdock Engine
-emoji: 🚀
-colorFrom: gray
-colorTo: gray
-sdk: gradio
-sdk_version: 6.15.2
-python_version: '3.13'
-app_file: app.py
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# DiffDock API Layer for Window 8 Drug Development
+## Overview
+This directory contains the DiffDock molecular docking engine, optimized to run on Hugging Face's **free CPU Basic tier** (2 vCPUs). It predicts how a ligand binds to a protein (the binding pose) and reports DiffDock's confidence score for that pose, for use in drug development analysis.
+## Architecture
+- **Platform**: Hugging Face Spaces (Gradio SDK)
+- **Hardware**: CPU Basic (Free Tier - 2 vCPUs)
+- **Framework**: DiffDock neural network for molecular docking
+- **API**: RESTful endpoint for Cloudflare Worker integration
+## Files
+### 1. `packages.txt`
+System-level dependencies installed before Python setup:
+- `unzip` - Archive extraction
+- `wget` - File downloads
+- `libgl1-mesa-glx` - OpenGL support for molecular visualization
+### 2. `requirements.txt`
+Python dependencies optimized for CPU execution:
+- **PyTorch 2.2.1** (CPU-only build)
+- **torch-geometric 2.5.2** - Graph neural networks
+- **biopython 1.83** - Biological computation
+- **rdkit 2023.9.5** - Chemical informatics
+- **gradio 4.19.2** - Web interface and API
+- **pandas 2.2.1** - Data manipulation
+- **pyyaml 6.0.1** - Configuration parsing
+- **scipy 1.12.0** - Scientific computing
+- **networkx 3.2.1** - Graph algorithms
+### 3. `app.py`
+Main application with three key components:
+#### CPU Optimization
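+```python
+import os
+os.environ["OMP_NUM_THREADS"] = "2"
+os.environ["MKL_NUM_THREADS"] = "2"
+import torch  # imported after the thread caps are set
+torch.set_num_threads(2)
+```
+Limits thread usage to the free tier's 2 vCPUs. The environment variables are set before `torch` is imported so its OpenMP/MKL thread pools honour them.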
+#### Automated Setup
+- Clones DiffDock repository
+- Downloads pre-trained weights from Zenodo
+- Configures inference pipeline
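The setup steps above can be sketched as an idempotent routine. This is an illustrative sketch, not the code in `app.py`: the helper names `setup_commands` and `run_setup` and the `dry_run` switch are hypothetical, while the repository and Zenodo URLs are copied from `app.py`. Passing `check=True` to `subprocess.run` makes a failed clone, download, or extraction raise `CalledProcessError` instead of being silently ignored.

```python
import os
import subprocess

# URLs copied from app.py; the helper names below are hypothetical.
REPO_URL = "https://github.com/gcorso/DiffDock.git"
WEIGHTS_URL = "https://zenodo.org/record/7651515/files/workdir.zip"


def setup_commands(repo_dir: str = "DiffDock") -> list[list[str]]:
    """Return the commands that fetch DiffDock and its pretrained weights."""
    return [
        ["git", "clone", REPO_URL, repo_dir],
        ["wget", "-O", "workdir.zip", WEIGHTS_URL],
        ["unzip", "-o", "workdir.zip", "-d", repo_dir],
    ]


def run_setup(repo_dir: str = "DiffDock", dry_run: bool = False) -> list[list[str]]:
    """Run the setup once; do nothing if the repository folder already exists."""
    if os.path.exists(repo_dir):
        return []  # already initialised on a previous start
    commands = setup_commands(repo_dir)
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit code
            subprocess.run(cmd, check=True)
    return commands
```

The existence check only looks for the repository folder, so a start-up that cloned DiffDock but failed to download the weights will skip setup on the next restart unless that folder is removed.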
+#### API Endpoint
+- **Function**: `run_diffdock_inference(protein_pdb_content, ligand_smiles_string)`
+- **Input**:
+  - Protein structure (PDB format)
+  - Ligand molecule (SMILES string)
+- **Output**: JSON with a `success` flag and the confidence score, or `success: false` with an `error_log` message on failure
+- **API Name**: `execute_diffdock_prediction`
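The inputs above reach DiffDock through a one-row CSV manifest. Here is a minimal sketch of that step using only the standard library. The column names are the ones `app.py` writes with pandas; `write_manifest` and its `complex_name` default are illustrative.

```python
import csv

# Column order matches the manifest app.py builds with pandas.
MANIFEST_COLUMNS = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]


def write_manifest(csv_path: str, protein_path: str, smiles: str,
                   complex_name: str = "gss_candidate") -> None:
    """Write a single-complex manifest. protein_sequence stays empty because
    the structure is supplied as a PDB file."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=MANIFEST_COLUMNS)
        writer.writeheader()
        writer.writerow({
            "complex_name": complex_name,
            "protein_path": protein_path,
            "ligand_description": smiles,
            "protein_sequence": "",
        })
```

For example, `write_manifest("input_manifest.csv", "target.pdb", "CCO")` produces a header row plus one data row.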
+## Deployment Steps
+### 1. Create Hugging Face Space
+1. Go to https://huggingface.co/spaces
+2. Click **"Create a New Space"**
+3. Name: `gss-diffdock-engine` (or your preferred name)
+4. SDK: **Gradio**
+5. Hardware: **CPU Basic** (Free)
+6. Visibility: Public or Private
+### 2. Upload Files
+Upload these three files to your Space repository:
+- `packages.txt`
+- `requirements.txt`
+- `app.py`
+### 3. Wait for Build
+Hugging Face will:
+1. Install system packages (1-2 minutes)
+2. Install Python dependencies (3-5 minutes)
+3. Clone DiffDock and download weights when `app.py` first starts (5-10 minutes)
+4. Start the application
+Total build time: **10-15 minutes**
+### 4. Verify Deployment
+Once status shows **"Running"**:
+- The Space URL will be active
+- API endpoint will be available at: `https://YOUR-USERNAME-gss-diffdock-engine.hf.space/api/execute_diffdock_prediction`
+## API Usage
+### Request Format
+```bash
+curl -X POST "https://YOUR-USERNAME-gss-diffdock-engine.hf.space/api/execute_diffdock_prediction" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "data": [
+      "PROTEIN_PDB_CONTENT_HERE",
+      "LIGAND_SMILES_STRING_HERE"
+    ]
+  }'
+```
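The same request can be built from Python with the standard library. This sketch only constructs the request and does not send it. `build_prediction_request` is a hypothetical helper, and the URL pattern is the one shown in the curl example, which depends on the Space owner's username.

```python
import json
import urllib.request


def build_prediction_request(space_host: str, pdb_text: str, smiles: str) -> urllib.request.Request:
    """Build the POST request from the curl example (hypothetical helper)."""
    url = f"https://{space_host}/api/execute_diffdock_prediction"
    body = json.dumps({"data": [pdb_text, smiles]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(req, timeout=...)` and decode the body with `json.load`. Choose a generous timeout, because CPU inference is slow.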
+### Response Format
+```json
+{
+  "data": [{
+    "success": true,
+    "diffdock_confidence_score": 0.85,
+    "hardware_allocation": "HF_FREE_CPU_TIER"
+  }]
+}
+```
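On the client side, the response above can be unpacked as follows. The helper name is hypothetical. The sketch treats `-1.0` as "no score" because that is the fallback `app.py` returns when it cannot read a confidence value. DiffDock's confidence is an unbounded score that can legitimately be negative, so a real score of exactly `-1.0` would be indistinguishable from that fallback.

```python
from typing import Optional

SENTINEL = -1.0  # fallback value app.py returns when no score could be read


def extract_confidence(response: dict) -> Optional[float]:
    """Return the confidence score, None if none was produced, or raise on failure."""
    result = response["data"][0]
    if not result.get("success", False):
        raise RuntimeError(result.get("error_log", "DiffDock inference failed"))
    score = float(result["diffdock_confidence_score"])
    return None if score == SENTINEL else score
```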
+## Performance Optimizations
+### Memory Management
+- **Inference steps**: Limited to 10 (vs default 20)
+- **Samples per complex**: 1 (vs default 40)
+- **Cleanup**: Automatic removal of temporary files
+### CPU Constraints
+- Thread count capped at 2
+- Single pose generation
+- Aggressive memory cleanup
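The scratch-file handling can also be expressed with `tempfile.TemporaryDirectory`. That context manager deletes its directory when the `with` block exits, including when inference raises, and it gives each request a unique path. This is a sketch rather than the shipped code. `diffdock_args` and `with_scratch_dir` are hypothetical helpers, and the flags are the ones `app.py` passes to `DiffDock/inference.py`.

```python
import os
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")

# Values from the Memory Management list above.
INFERENCE_STEPS = 10
SAMPLES_PER_COMPLEX = 1


def diffdock_args(csv_path: str, out_dir: str) -> list[str]:
    """Inference arguments mirroring the ones app.py passes to DiffDock."""
    return [
        "--protein_ligand_csv", csv_path,
        "--out_dir", out_dir,
        "--inference_steps", str(INFERENCE_STEPS),
        "--samples_per_complex", str(SAMPLES_PER_COMPLEX),
        "--actual_steps", str(INFERENCE_STEPS),
    ]


def with_scratch_dir(work: Callable[[str], T]) -> T:
    """Run work(scratch_dir), then delete the directory, even if work raises."""
    with tempfile.TemporaryDirectory(prefix="diffdock_") as scratch:
        return work(scratch)
```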
+## Integration with Cloudflare Worker
+The next step is to create a Cloudflare Worker handler that:
+1. Receives drug development requests from Window 8
+2. Formats protein/ligand data
+3. Calls this Hugging Face API
+4. Stores results in D1 database
+5. Returns predictions to frontend
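This sketch writes the Worker flow above in Python to match the rest of this repository. A deployed Cloudflare Worker would implement the same steps in its own runtime. Every name in the sketch is hypothetical, including the request fields `protein_pdb` and `ligand_smiles`. The Hugging Face call and the D1 insert are passed in as plain functions, so the orchestration can be tested without either service.

```python
from typing import Callable


def handle_drug_request(
    request: dict,
    call_engine: Callable[[str, str], dict],
    store_result: Callable[[dict], None],
) -> dict:
    """Hypothetical orchestration of the five Worker steps listed above."""
    # Steps 1-2: receive the request and normalise the protein/ligand inputs
    pdb_text = request.get("protein_pdb", "").strip()
    smiles = request.get("ligand_smiles", "").strip()
    if not pdb_text or not smiles:
        return {"ok": False, "error": "protein_pdb and ligand_smiles are required"}
    # Step 3: call the DiffDock Space (stand-in for the HTTP request)
    result = call_engine(pdb_text, smiles)
    record = {"ligand_smiles": smiles, **result}
    # Step 4: persist the prediction (stand-in for the D1 insert)
    store_result(record)
    # Step 5: hand the prediction back to the frontend
    return {"ok": bool(result.get("success")), "prediction": record}
```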
+## Troubleshooting
+### Build Failures
+- Check logs for missing dependencies
+- Verify file names are exact (case-sensitive)
+- Ensure no extra whitespace in files
+### Timeout Errors
+- Inference is limited to 10 steps for speed
+- Consider upgrading to paid tier for faster processing
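`app.py` calls `subprocess.run` without a `timeout`, so a slow inference blocks the request until it finishes. A sketch of a bounded call follows. The helper name and the 2000-character stderr tail are illustrative choices. When the limit is reached, `subprocess.run` kills the child process before raising `TimeoutExpired`.

```python
import subprocess


def run_with_timeout(cmd: list[str], timeout_s: float) -> dict:
    """Run cmd, returning a failure dict instead of waiting past timeout_s."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"success": False, "error_log": f"inference exceeded {timeout_s} s"}
    if proc.returncode != 0:
        # Keep only the tail of stderr so the JSON response stays small
        return {"success": False, "error_log": proc.stderr[-2000:]}
    return {"success": True, "stdout": proc.stdout}
```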
+### Memory Issues
+- Current config optimized for 16GB RAM limit
+- Reduce inference steps if needed
+## Next Steps
+1. ✅ Deploy to Hugging Face Spaces
+2. ⏳ Create Cloudflare Worker integration
+3. ⏳ Add D1 database schema for drug predictions
+4. ⏳ Build Window 8 frontend interface
+5. ⏳ Implement result visualization
+## Support
+For issues or questions:
+- Hugging Face Docs: https://huggingface.co/docs/hub/spaces
+- DiffDock Paper: https://arxiv.org/abs/2210.01776
+- DiffDock Repo: https://github.com/gcorso/DiffDock
+---
+**Gaston Software Solutions LLP**
+Window 8: Drug Development & Molecular Docking Engine

app.py ADDED

	@@ -0,0 +1,115 @@

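+import gradio as gr
+import os
+import sys
+import subprocess
+# 🛠️ Step 1: Cap thread usage at the 2 vCPUs of Hugging Face's free CPU tier.
+# The environment variables are set before torch is imported so its OpenMP/MKL
+# thread pools pick them up; the DiffDock subprocess inherits them as well.
+os.environ["OMP_NUM_THREADS"] = "2"
+os.environ["MKL_NUM_THREADS"] = "2"
+import torch
+torch.set_num_threads(2)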
+# 🧬 Step 2: Automated setup for the core DiffDock Neural Architecture layers
+if not os.path.exists("DiffDock"):
+    print("[GSS LOG] Cloning the DiffDock repository...")
+    # check=True stops start-up with an error instead of continuing after a failed step
+    subprocess.run(["git", "clone", "https://github.com/gcorso/DiffDock.git"], check=True)
+    print("[GSS LOG] Downloading pretrained DiffDock weights...")
+    # Pulling the public, pre-computed scoring weights
+    subprocess.run(["wget", "https://zenodo.org/record/7651515/files/workdir.zip"], check=True)
+    subprocess.run(["unzip", "workdir.zip", "-d", "DiffDock/"], check=True)
+sys.path.append(os.path.abspath("DiffDock"))
+def run_diffdock_inference(protein_pdb_content, ligand_smiles_string):
+    """
+    Dock a candidate ligand (SMILES string) against a target pathogen protein
+    (PDB text received from Window 7) on CPU and return DiffDock's confidence
+    score as a dict.
+    """
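+    # Per-request file names: os.getpid() alone is shared by every request this
+    # process handles, so overlapping requests would otherwise reuse the same files
+    run_id = f"{os.getpid()}_{os.urandom(8).hex()}"
+    protein_path = f"target_pathogen_{run_id}.pdb"
+    csv_path = f"input_manifest_{run_id}.csv"
+    output_dir = f"results_{run_id}"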
+    try:
+        # 1. Write the incoming protein structure to a temporary PDB file
+        with open(protein_path, "w") as f:
+            f.write(protein_pdb_content)
+        # 2. Build the task manifest table file matching DiffDock's intake expectations
+        import pandas as pd
+        manifest_df = pd.DataFrame([{
+            "complex_name": "gss_candidate",
+            "protein_path": protein_path,
+            "ligand_description": ligand_smiles_string,
+            "protein_sequence": ""
+        }])
+        manifest_df.to_csv(csv_path, index=False)
+        # 3. Build the DiffDock command with reduced settings for the CPU tier:
+        # 10 inference steps and a single sampled pose to limit runtime and memory use
+        cmd = [
+            sys.executable, "DiffDock/inference.py",
+            "--config", "DiffDock/default_inference_args.yaml",
+            "--protein_ligand_csv", csv_path,
+            "--out_dir", output_dir,
+            "--inference_steps", "10",
+            "--samples_per_complex", "1",
+            "--actual_steps", "10"
+        ]
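+        # Run DiffDock in a child process; report its stderr if it exits with an error
+        execution_run = subprocess.run(cmd, capture_output=True, text=True)
+        if execution_run.returncode != 0:
+            return {
+                "success": False,
+                "error_log": execution_run.stderr[-2000:]
+            }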
+        # 4. Parse the output results table to locate the match confidence metric
+        confidence_metric = -1.0  # Fallback default value
+        summary_sheet = os.path.join(output_dir, "summary.csv")
+        if os.path.exists(summary_sheet):
+            summary_df = pd.read_csv(summary_sheet)
+            if "confidence" in summary_df.columns and not summary_df.empty:
+                confidence_metric = float(summary_df.iloc[0]["confidence"])
+        return {
+            "success": True,
+            "diffdock_confidence_score": confidence_metric,
+            "hardware_allocation": "HF_FREE_CPU_TIER"
+        }
+    except Exception as runtime_fault:
+        return {
+            "success": False,
+            "error_log": str(runtime_fault)
+        }
+    finally:
+        # Remove the temporary input files and output directory from disk
+        for temp_file in [protein_path, csv_path]:
+            if os.path.exists(temp_file):
+                os.remove(temp_file)
+        if os.path.exists(output_dir):
+            import shutil
+            shutil.rmtree(output_dir)
+# 🌐 Step 3: Instantiate the App Dashboard and expose the API schema
+with gr.Blocks() as demo:
+    gr.Markdown("# Gaston Software Solutions LLP — Window 8 Engine")
+    gr.Markdown("Active Mode: Decentralized Independent CPU Inference Matrix.")
+    # Hidden input components that the Cloudflare Worker fills through the API
+    protein_input_field = gr.Textbox(visible=False, label="Protein Data Stream")
+    ligand_input_field = gr.Textbox(visible=False, label="Ligand SMILES Chain")
+    json_output_response = gr.JSON()
+    # Hidden button whose click handler is exposed as the named API endpoint
+    api_trigger_node = gr.Button("Execute Processing", visible=False)
+    api_trigger_node.click(
+        run_diffdock_inference,
+        inputs=[protein_input_field, ligand_input_field],
+        outputs=json_output_response,
+        api_name="execute_diffdock_prediction"
+    )
+demo.launch()
+# Made with Bob
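`app.py` reads the score from `summary.csv`. If the DiffDock version in use does not write that file, the score could instead be recovered from the ranked pose filenames. This sketch assumes that DiffDock writes names such as `rank1_confidence-0.52.sdf` into a per-complex subfolder of `--out_dir`. Check the files your DiffDock checkout actually produces before relying on it. `read_top_confidence` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed DiffDock pose filename pattern, e.g. "rank1_confidence-0.52.sdf".
RANK_PATTERN = re.compile(r"rank(\d+)_confidence(-?\d+(?:\.\d+)?)\.sdf$")


def read_top_confidence(output_dir: str) -> Optional[float]:
    """Return the confidence of the best-ranked pose found under output_dir."""
    best = None  # (rank, confidence) of the lowest rank seen so far
    for path in Path(output_dir).rglob("rank*_confidence*.sdf"):
        match = RANK_PATTERN.search(path.name)
        if match:
            rank, conf = int(match.group(1)), float(match.group(2))
            if best is None or rank < best[0]:
                best = (rank, conf)
    return None if best is None else best[1]
```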

packages.txt ADDED

	@@ -0,0 +1,3 @@

+unzip
+wget
+libgl1-mesa-glx

requirements.txt ADDED

	@@ -0,0 +1,10 @@

+--extra-index-url https://download.pytorch.org/whl/cpu
+torch==2.2.1
+torch-geometric==2.5.2
+biopython==1.83
+rdkit==2023.9.5
+gradio==4.19.2
+pandas==2.2.1
+pyyaml==6.0.1
+scipy==1.12.0
+networkx==3.2.1