metadata
title: GSS DiffDock Engine
emoji: 🧬
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 4.36.1
python_version: '3.10'
app_file: app.py
pinned: false

DiffDock API Layer for Window 8 Drug Development

Overview

This directory contains a DiffDock molecular docking engine configured to run on Hugging Face's free CPU Basic tier (2 vCPUs). It predicts protein-ligand binding poses and returns DiffDock's confidence score for drug development analysis. The confidence score ranks predicted poses; it is not a binding affinity value.

Architecture

  • Platform: Hugging Face Spaces (Gradio SDK)
  • Hardware: CPU Basic (Free Tier - 2 vCPUs)
  • Framework: DiffDock neural network for molecular docking
  • API: RESTful endpoint for Cloudflare Worker integration

Files

1. packages.txt

System-level dependencies installed before Python setup:

  • unzip - Archive extraction
  • wget - File downloads
  • libgl1-mesa-glx - OpenGL support for molecular visualization

2. requirements.txt

Python dependencies optimized for CPU execution:

  • PyTorch 2.2.1 (CPU-only build)
  • torch-geometric 2.5.2 - Graph neural networks
  • biopython 1.83 - Biological computation
  • rdkit 2023.9.5 - Chemical informatics
  • gradio 4.19.2 - Web interface and API (the Space metadata sets sdk_version 4.36.1; keep the two versions in agreement)
  • pandas 2.2.1 - Data manipulation
  • pyyaml 6.0.1 - Configuration parsing
  • scipy 1.12.0 - Scientific computing
  • networkx 3.2.1 - Graph algorithms

3. app.py

Main application with three key components:

CPU Optimization

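import os
# Set the thread limits before importing torch so OpenMP and MKL read them at load time.
os.environ["OMP_NUM_THREADS"] = "2"
os.environ["MKL_NUM_THREADS"] = "2"
import torch
torch.set_num_threads(2)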

Caps thread usage at 2 to match the two vCPUs of the free tier, which avoids oversubscribing the CPU.

Automated Setup

  • Clones DiffDock repository
  • Downloads pre-trained weights from Zenodo
  • Configures inference pipeline
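
The setup steps above can be sketched as a small planner that only schedules work that has not been done yet, so a restarted Space does not clone or download twice. This is a minimal sketch, not the code in app.py: the function names, the `DiffDock` and `weights.zip` paths, and both URL parameters are assumptions, and the real repository and Zenodo URLs must be supplied by the caller. It relies on the `wget` and `unzip` packages listed in packages.txt.

```python
import subprocess
from pathlib import Path


def plan_setup(workdir: Path, repo_url: str, weights_url: str) -> list[list[str]]:
    """Return the commands still needed to prepare DiffDock inside workdir."""
    repo_dir = workdir / "DiffDock"        # hypothetical checkout location
    weights_zip = workdir / "weights.zip"  # hypothetical archive name
    commands: list[list[str]] = []
    if not repo_dir.exists():
        # A shallow clone keeps the download small on the free tier.
        commands.append(["git", "clone", "--depth", "1", repo_url, str(repo_dir)])
    if not weights_zip.exists():
        commands.append(["wget", "-q", "-O", str(weights_zip), weights_url])
        commands.append(["unzip", "-q", "-o", str(weights_zip), "-d", str(repo_dir)])
    return commands


def run_setup(commands: list[list[str]]) -> None:
    """Execute the planned commands, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_setup(plan_setup(...))` once at startup would perform the clone and download; on later starts with the files already present, the plan is empty and nothing runs.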

API Endpoint

  • Function: run_diffdock_inference(protein_pdb_content, ligand_smiles_string)
  • Input:
    • Protein structure (PDB format)
    • Ligand molecule (SMILES string)
  • Output: JSON with confidence score
  • API Name: execute_diffdock_prediction
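
To make the input and output contract concrete, the following sketch validates the two inputs and builds a result in the shape shown under Response Format. It is an illustration only: `validate_inputs`, `build_response`, and the `errors` key are hypothetical and may not match what `run_diffdock_inference` does in app.py. The checks are deliberately shallow (no RDKit parsing) so the example stays dependency-free.

```python
HARDWARE_LABEL = "HF_FREE_CPU_TIER"  # value taken from the documented response


def validate_inputs(protein_pdb_content: str, ligand_smiles_string: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the inputs look usable."""
    errors = []
    has_atoms = any(
        line.startswith(("ATOM", "HETATM"))
        for line in protein_pdb_content.splitlines()
    )
    if not has_atoms:
        errors.append("protein_pdb_content has no ATOM/HETATM records")
    smiles = ligand_smiles_string.strip()
    if not smiles or any(ch.isspace() for ch in smiles):
        errors.append("ligand_smiles_string must be a single non-empty token")
    return errors


def build_response(confidence=None, errors=None) -> dict:
    """Build the JSON-serializable result returned to the caller."""
    if errors:
        # The "errors" key is an assumption; the README only documents the success case.
        return {"success": False, "errors": errors, "hardware_allocation": HARDWARE_LABEL}
    return {
        "success": True,
        "diffdock_confidence_score": confidence,
        "hardware_allocation": HARDWARE_LABEL,
    }
```

Rejecting malformed inputs before inference starts avoids spending the limited CPU budget on requests that cannot succeed.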

Deployment Steps

1. Create Hugging Face Space

  1. Go to https://huggingface.co/spaces
  2. Click "Create a New Space"
  3. Name: gss-diffdock-engine (or your preferred name)
  4. SDK: Gradio
  5. Hardware: CPU Basic (Free)
  6. Visibility: Public or Private

2. Upload Files

Upload these three files to your Space repository:

  • packages.txt
  • requirements.txt
  • app.py

3. Wait for Build

Hugging Face will:

  1. Install system packages (1-2 minutes)
  2. Install Python dependencies (3-5 minutes)
  3. Clone DiffDock and download weights (5-10 minutes)
  4. Start the application

Total build time: typically 10-15 minutes, and up to about 17 minutes if every step runs at its upper estimate

4. Verify Deployment

Once status shows "Running":

  • The Space URL will be active
  • API endpoint will be available at: https://YOUR-USERNAME-gss-diffdock-engine.hf.space/api/execute_diffdock_prediction

API Usage

Request Format

curl -X POST "https://YOUR-USERNAME-gss-diffdock-engine.hf.space/api/execute_diffdock_prediction" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "PROTEIN_PDB_CONTENT_HERE",
      "LIGAND_SMILES_STRING_HERE"
    ]
  }'
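
The same request can be built from Python with only the standard library. This is a sketch under the assumption that the endpoint path matches the curl example above; `build_prediction_request` is a hypothetical helper name, and `space_url` stands for your own Space URL.

```python
import json
import urllib.request


def build_prediction_request(space_url: str, pdb_text: str, smiles: str) -> urllib.request.Request:
    """Build the POST request shown in the curl example, without sending it."""
    # Gradio expects positional inputs in a "data" array: protein first, ligand second.
    payload = json.dumps({"data": [pdb_text, smiles]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{space_url.rstrip('/')}/api/execute_diffdock_prediction",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen(request, timeout=...)` sends it. If the route differs in the Gradio version your Space runs, the "Use via API" link at the bottom of the Space page lists the exact path.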

Response Format

{
  "data": [{
    "success": true,
    "diffdock_confidence_score": 0.85,
    "hardware_allocation": "HF_FREE_CPU_TIER"
  }]
}
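
A client can unpack this response as follows. The sketch assumes the body has exactly the shape shown above; `extract_confidence` is a hypothetical helper name, and how a failed prediction is reported is an assumption (here, any result without a truthy `success` field raises an error).

```python
import json


def extract_confidence(response_body: str) -> float:
    """Return the confidence score from a response body, or raise if the run failed."""
    payload = json.loads(response_body)
    result = payload["data"][0]  # the single JSON output of the endpoint
    if not result.get("success"):
        raise RuntimeError(f"DiffDock prediction failed: {result}")
    return float(result["diffdock_confidence_score"])
```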

Performance Optimizations

Memory Management

  • Inference steps: Limited to 10 (vs default 20)
  • Samples per complex: 1 (vs default 40)
  • Cleanup: Automatic removal of temporary files
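
The reduced settings and the temporary-file cleanup could be combined roughly as below. This is a sketch, not the code in app.py: the helper names and the `runner` callback are hypothetical, and the flag names follow the DiffDock `inference` module as an assumption (the ligand flag in particular has changed between DiffDock releases, so check it against the version you clone).

```python
import sys
import tempfile
from pathlib import Path


def build_inference_command(protein_path: Path, smiles: str, out_dir: Path) -> list[str]:
    """Assemble a DiffDock inference command with the reduced CPU-tier settings."""
    return [
        sys.executable, "-m", "inference",
        "--protein_path", str(protein_path),
        "--ligand_description", smiles,  # assumed flag name; older releases differ
        "--out_dir", str(out_dir),
        "--inference_steps", "10",       # limited to 10 steps
        "--samples_per_complex", "1",    # a single pose per complex
    ]


def run_with_cleanup(protein_pdb_content: str, smiles: str, runner):
    """Write inputs to a scratch directory, run inference, and always delete the scratch files."""
    with tempfile.TemporaryDirectory(prefix="diffdock_") as tmp:
        protein_path = Path(tmp) / "protein.pdb"
        protein_path.write_text(protein_pdb_content)
        command = build_inference_command(protein_path, smiles, Path(tmp) / "out")
        # runner executes the command and returns whatever the caller needs,
        # for example a parsed confidence score; the directory is removed afterwards.
        return runner(command)
```

Using a context manager for the scratch directory means the files are removed even when inference raises, which matters on a Space with limited disk.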

CPU Constraints

  • Thread count capped at 2
  • Single pose generation
  • Aggressive memory cleanup

Integration with Cloudflare Worker

The next step is to create a Cloudflare Worker handler that:

  1. Receives drug development requests from Window 8
  2. Formats protein/ligand data
  3. Calls this Hugging Face API
  4. Stores results in D1 database
  5. Returns predictions to frontend
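
Step 4 can be prototyped locally before the Worker exists. Cloudflare D1 is built on SQLite, so the sketch below uses Python's `sqlite3` module as a stand-in; the `drug_predictions` table, its columns, and `store_prediction` are all hypothetical, since no schema has been defined yet.

```python
import sqlite3

# Hypothetical schema; adjust the columns once the real D1 schema is designed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS drug_predictions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ligand_smiles TEXT NOT NULL,
    confidence_score REAL,
    success INTEGER NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def store_prediction(conn: sqlite3.Connection, ligand_smiles: str, result: dict) -> int:
    """Insert one API result (the object inside "data") and return its row id."""
    conn.execute(SCHEMA)
    cursor = conn.execute(
        "INSERT INTO drug_predictions (ligand_smiles, confidence_score, success) "
        "VALUES (?, ?, ?)",
        (
            ligand_smiles,
            result.get("diffdock_confidence_score"),  # NULL when the run failed
            int(bool(result.get("success"))),
        ),
    )
    conn.commit()
    return cursor.lastrowid
```

The same `CREATE TABLE` and parameterized `INSERT` statements should carry over to D1 with little change, because D1 accepts SQLite syntax.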

Troubleshooting

Build Failures

  • Check logs for missing dependencies
  • Verify file names are exact (case-sensitive)
  • Ensure no extra whitespace in files

Timeout Errors

  • Inference is already limited to 10 steps to keep each request short
  • If requests still time out, consider upgrading to a paid tier for faster processing

Memory Issues

  • The current configuration targets the 16 GB RAM limit of the CPU Basic tier
  • Reduce inference steps if needed

Next Steps

  1. ✅ Deploy to Hugging Face Spaces
  2. ⏳ Create Cloudflare Worker integration
  3. ⏳ Add D1 database schema for drug predictions
  4. ⏳ Build Window 8 frontend interface
  5. ⏳ Implement result visualization

Support

For issues or questions:


Gaston Software Solutions LLP
Window 8: Drug Development & Molecular Docking Engine