---
title: GSS DiffDock Engine
emoji: 🧬
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: "4.36.1"
python_version: "3.10"
app_file: app.py
pinned: false
---
# DiffDock API Layer for Window 8 Drug Development
## Overview
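This directory contains the DiffDock molecular docking engine, configured to run on Hugging Face's **free CPU Basic tier** (2 vCPUs). It predicts protein-ligand binding poses and returns a confidence score for use in drug development analysis. Note that DiffDock's confidence score ranks the quality of a predicted pose; it is not a binding affinity.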
## Architecture
- **Platform**: Hugging Face Spaces (Gradio SDK)
- **Hardware**: CPU Basic (Free Tier - 2 vCPUs)
- **Framework**: DiffDock neural network for molecular docking
- **API**: RESTful endpoint for Cloudflare Worker integration
## Files
### 1. `packages.txt`
System-level dependencies installed before Python setup:
- `unzip` - Archive extraction
- `wget` - File downloads
- `libgl1-mesa-glx` - OpenGL support for molecular visualization
### 2. `requirements.txt`
Python dependencies optimized for CPU execution:
- **PyTorch 2.2.1** (CPU-only build)
- **torch-geometric 2.5.2** - Graph neural networks
- **biopython 1.83** - Biological computation
- **rdkit 2023.9.5** - Chemical informatics
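- **gradio 4.19.2** - Web interface and API (the front matter above sets `sdk_version: "4.36.1"`; keep the two versions in sync to avoid a conflicting install)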
- **pandas 2.2.1** - Data manipulation
- **pyyaml 6.0.1** - Configuration parsing
- **scipy 1.12.0** - Scientific computing
- **networkx 3.2.1** - Graph algorithms
### 3. `app.py`
Main application with three key components:
#### CPU Optimization
```python
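import os
os.environ["OMP_NUM_THREADS"] = "2"  # set before importing torch so OpenMP sees it
os.environ["MKL_NUM_THREADS"] = "2"
import torch
torch.set_num_threads(2)  # match the 2 vCPUs of the free tier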
```
Limits thread usage to match free tier allocation.
#### Automated Setup
- Clones DiffDock repository
- Downloads pre-trained weights from Zenodo
- Configures inference pipeline
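The first two setup steps can be sketched as below. This is a minimal illustration, not the actual contents of `app.py`: the function name `ensure_diffdock`, the `weights.zip` file name, and the `weights_url` parameter are hypothetical, and the Zenodo link must be supplied by the caller. The `run` and `fetch` parameters exist only so the logic can be exercised without network access.

```python
import subprocess
from pathlib import Path
from urllib.request import urlretrieve

DIFFDOCK_REPO = "https://github.com/gcorso/DiffDock"  # repo linked in this README


def ensure_diffdock(repo_dir, weights_url=None, run=subprocess.run, fetch=urlretrieve):
    """Clone DiffDock and fetch weights once; return the actions performed."""
    repo_dir = Path(repo_dir)
    actions = []
    if not repo_dir.exists():
        # Shallow clone keeps the download small on a free-tier Space.
        run(["git", "clone", "--depth", "1", DIFFDOCK_REPO, str(repo_dir)], check=True)
        actions.append("clone")
    weights_zip = repo_dir / "weights.zip"  # hypothetical file name
    if weights_url and not weights_zip.exists():
        # weights_url is a placeholder: pass the Zenodo link your app.py uses.
        fetch(weights_url, str(weights_zip))
        actions.append("download")
    return actions
```

Because both steps are skipped when their outputs already exist, calling the function again after a restart is cheap as long as the files are still on disk.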
#### API Endpoint
- **Function**: `run_diffdock_inference(protein_pdb_content, ligand_smiles_string)`
- **Input**:
  - Protein structure (PDB format)
  - Ligand molecule (SMILES string)
- **Output**: JSON with confidence score
- **API Name**: `execute_diffdock_prediction`
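The endpoint described above could be wired up roughly as follows. This is a sketch under assumptions rather than the real `app.py`: `score_fn` is a hypothetical hook standing in for the actual DiffDock call, `build_demo` is an illustrative helper, and the `error` field is not part of the documented response. The success fields match the Response Format section of this README.

```python
def run_diffdock_inference(protein_pdb_content, ligand_smiles_string, score_fn=None):
    """Validate inputs and return the JSON-style dict the API documents."""
    if not protein_pdb_content.strip() or not ligand_smiles_string.strip():
        return {"success": False, "error": "protein and ligand are both required"}
    # score_fn stands in for the real DiffDock call; it is a hypothetical hook.
    if score_fn is None:
        return {"success": False, "error": "no inference backend configured"}
    try:
        score = float(score_fn(protein_pdb_content, ligand_smiles_string))
    except Exception as exc:  # report failures as JSON instead of crashing the Space
        return {"success": False, "error": str(exc)}
    return {
        "success": True,
        "diffdock_confidence_score": score,
        "hardware_allocation": "HF_FREE_CPU_TIER",
    }


def build_demo(score_fn):
    """Expose the function through Gradio under the documented API name."""
    import gradio as gr  # imported lazily so the helper above has no dependencies

    return gr.Interface(
        fn=lambda pdb, smiles: run_diffdock_inference(pdb, smiles, score_fn),
        inputs=[gr.Textbox(label="Protein (PDB)"), gr.Textbox(label="Ligand (SMILES)")],
        outputs=gr.JSON(),
        api_name="execute_diffdock_prediction",
    )
```

Returning an error dict instead of raising keeps the response shape predictable for the Cloudflare Worker that consumes it.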
## Deployment Steps
### 1. Create Hugging Face Space
1. Go to https://huggingface.co/spaces
2. Click **"Create a New Space"**
3. Name: `gss-diffdock-engine` (or your preferred name)
4. SDK: **Gradio**
5. Hardware: **CPU Basic** (Free)
6. Visibility: Public or Private
### 2. Upload Files
Upload these three files to your Space repository:
- `packages.txt`
- `requirements.txt`
- `app.py`
### 3. Wait for Build
Hugging Face will:
1. Install system packages (1-2 minutes)
2. Install Python dependencies (3-5 minutes)
3. Clone DiffDock and download weights (5-10 minutes)
4. Start the application
Total time until the app is running: **about 9-17 minutes** (the sum of the estimates above)
### 4. Verify Deployment
Once status shows **"Running"**:
- The Space URL will be active
- API endpoint will be available at: `https://YOUR-USERNAME-gss-diffdock-engine.hf.space/api/execute_diffdock_prediction`
## API Usage
### Request Format
```bash
curl -X POST "https://YOUR-USERNAME-gss-diffdock-engine.hf.space/api/execute_diffdock_prediction" \
  -H "Content-Type: application/json" \
  -d '{
    "data": [
      "PROTEIN_PDB_CONTENT_HERE",
      "LIGAND_SMILES_STRING_HERE"
    ]
  }'
```
### Response Format
```json
{
  "data": [{
    "success": true,
    "diffdock_confidence_score": 0.85,
    "hardware_allocation": "HF_FREE_CPU_TIER"
  }]
}
```
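For callers that prefer Python over `curl`, the same request and response handling might look like the sketch below. It uses only the standard library. The helper names (`build_request`, `parse_response`, `predict`) and the 300-second timeout are illustrative assumptions; the URL and JSON shapes are the ones shown in this README.

```python
import json
from urllib.request import Request, urlopen

API_URL = "https://YOUR-USERNAME-gss-diffdock-engine.hf.space/api/execute_diffdock_prediction"


def build_request(protein_pdb, ligand_smiles, url=API_URL):
    """Build the POST request with the two-element "data" array."""
    body = json.dumps({"data": [protein_pdb, ligand_smiles]}).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def parse_response(raw):
    """Return the confidence score, or raise if the call did not succeed."""
    result = json.loads(raw)["data"][0]
    if not result.get("success"):
        raise RuntimeError(f"DiffDock call failed: {result}")
    return result["diffdock_confidence_score"]


def predict(protein_pdb, ligand_smiles, timeout=300):
    """Send one docking request; the timeout value is a guess for CPU inference."""
    with urlopen(build_request(protein_pdb, ligand_smiles), timeout=timeout) as resp:
        return parse_response(resp.read())
```

Splitting request building from response parsing lets both be checked offline before the Space is deployed.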
## Performance Optimizations
### Memory Management
- **Inference steps**: Limited to 10 (vs default 20)
- **Samples per complex**: 1 (vs default 40)
- **Cleanup**: Automatic removal of temporary files
### CPU Constraints
- Thread count capped at 2
- Single pose generation
- Aggressive memory cleanup
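The reduced settings and temporary-file cleanup listed above can be illustrated with the sketch below. It only assembles the command and the working directory; it does not run DiffDock. The function names are hypothetical, and the CLI flag names are assumptions that differ between DiffDock revisions, so check them against the inference script in your clone.

```python
import tempfile
from pathlib import Path


def build_inference_command(protein_path, ligand_smiles, out_dir, steps=10, samples=1):
    """Assemble a DiffDock CLI call with the reduced settings."""
    # Flag names are assumptions; they vary between DiffDock releases.
    return [
        "python", "-m", "inference",
        "--protein_path", str(protein_path),
        "--ligand_description", ligand_smiles,
        "--out_dir", str(out_dir),
        "--inference_steps", str(steps),
        "--samples_per_complex", str(samples),
    ]


def prepare_job(protein_pdb_content, ligand_smiles):
    """Write the protein to a temp dir and return (tempdir handle, command)."""
    workdir = tempfile.TemporaryDirectory(prefix="diffdock_")
    protein_path = Path(workdir.name) / "protein.pdb"
    protein_path.write_text(protein_pdb_content)
    command = build_inference_command(protein_path, ligand_smiles, Path(workdir.name) / "out")
    return workdir, command
```

A caller would run the returned command (for example with `subprocess.run`) inside a `try` block and call `workdir.cleanup()` in the `finally` clause, so temporary files are removed even when inference fails.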
## Integration with Cloudflare Worker
The next step is to create a Cloudflare Worker handler that:
1. Receives drug development requests from Window 8
2. Formats protein/ligand data
3. Calls this Hugging Face API
4. Stores results in D1 database
5. Returns predictions to frontend
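The data flow in steps 2 to 5 can be prototyped locally before the Worker exists. The sketch below is a Python stand-in, not Worker code: it uses the standard-library `sqlite3` module in place of D1 (which is SQLite-based), and the table name, columns, and the `call_api` parameter are all hypothetical.

```python
import sqlite3
import time

# Hypothetical schema; adapt the columns to the real D1 design.
SCHEMA = """
CREATE TABLE IF NOT EXISTS docking_predictions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ligand_smiles TEXT NOT NULL,
    confidence_score REAL,
    success INTEGER NOT NULL,
    created_at REAL NOT NULL
)
"""


def handle_request(conn, protein_pdb, ligand_smiles, call_api):
    """Format the inputs, call the Space, store the result, and return it."""
    smiles = ligand_smiles.strip()
    result = call_api(protein_pdb.strip(), smiles)
    conn.execute(
        "INSERT INTO docking_predictions"
        " (ligand_smiles, confidence_score, success, created_at)"
        " VALUES (?, ?, ?, ?)",
        (smiles, result.get("diffdock_confidence_score"),
         int(bool(result.get("success"))), time.time()),
    )
    conn.commit()
    return result
```

Passing `call_api` in as a parameter keeps the storage logic testable with a stub before the Space is reachable.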
## Troubleshooting
### Build Failures
- Check logs for missing dependencies
- Verify file names are exact (case-sensitive)
- Ensure no extra whitespace in files
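The file-name and whitespace checks above can be automated before uploading. The following is a small sketch using only the standard library; the function name `check_space_files` is illustrative, and the whitespace check is applied only to the two `.txt` files, since indentation is meaningful in `app.py`.

```python
from pathlib import Path

REQUIRED_FILES = ("packages.txt", "requirements.txt", "app.py")


def check_space_files(directory):
    """Return a list of problems found in the files to upload."""
    directory = Path(directory)
    present = {p.name for p in directory.iterdir() if p.is_file()}
    problems = []
    for name in REQUIRED_FILES:
        if name not in present:
            # A case-insensitive match points to a wrongly cased file name.
            lookalikes = [n for n in present if n.lower() == name.lower()]
            hint = f" (found {lookalikes[0]!r})" if lookalikes else ""
            problems.append(f"missing {name}{hint}")
            continue
        if name.endswith(".txt"):
            lines = (directory / name).read_text().splitlines()
            for lineno, line in enumerate(lines, 1):
                if line != line.strip():
                    problems.append(f"{name}:{lineno}: leading or trailing whitespace")
    return problems
```

An empty list means the three files are present with exact names and the dependency lists contain no stray whitespace.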
### Timeout Errors
- Inference is limited to 10 steps for speed
- Consider upgrading to paid tier for faster processing
### Memory Issues
- The current configuration is tuned for the 16 GB RAM limit of the CPU Basic tier
- Reduce inference steps if needed
## Next Steps
1. ✅ Deploy to Hugging Face Spaces
2. ⏳ Create Cloudflare Worker integration
3. ⏳ Add D1 database schema for drug predictions
4. ⏳ Build Window 8 frontend interface
5. ⏳ Implement result visualization
## Support
For issues or questions:
- Hugging Face Docs: https://huggingface.co/docs/hub/spaces
- DiffDock Paper: https://arxiv.org/abs/2210.01776
- DiffDock Repo: https://github.com/gcorso/DiffDock
---
**Gaston Software Solutions LLP**
Window 8: Drug Development & Molecular Docking Engine