Space: vignt97867896/bioflow

vignt97867896 committed on Jan 29

Commit fbcbc07 (verified), 1 parent: 54592ce

Upload folder using huggingface_hub

Files changed (1)

README.md +20 -236

@@ -1,236 +1,20 @@
-# BioFlow
-> **Multimodal Biological Design & Discovery Intelligence Engine**
-> A low-code workflow platform for unified biological discovery pipelines
-![Python](https://img.shields.io/badge/Python-3.10-blue)
-![Next.js](https://img.shields.io/badge/Next.js-16-black)
-![Qdrant](https://img.shields.io/badge/Qdrant-Vector_DB-red)
-![CUDA](https://img.shields.io/badge/CUDA-11.8-green)
-![Team](https://img.shields.io/badge/Team-Lacoste-purple)
----
-## Problem Statement
-Biological R&D knowledge is fragmented across disconnected silos:
-- **Textual literature** (papers, lab notes)
-- **3D structural data** (PDB files)
-- **Chemical sequences** (SMILES)
-Researchers must manually navigate incompatible formats, creating bottlenecks and "blind spots" where critical connections are missed.
-## Our Solution
-**BioFlow** is a visual workflow engine that unifies biological discovery pipelines. Rather than a single "black box" model, we function as an **intelligent platform** — allowing researchers to chain state-of-the-art open-source biological models into coherent discovery workflows.
-### Key Features
-| Feature | Description |
-|---------|-------------|
-| **Visual Pipeline Builder** | Drag-and-drop node editor for constructing discovery workflows |
-| **DeepPurpose Integration** | Drug-Target Interaction prediction with Morgan + CNN encoding |
-| **Molecule & Protein Visualization** | Interactive 2D SMILES and 3D PDB structure viewing (powered by 3Dmol.js and SmilesDrawer) |
-| **Qdrant Vector Search** | High-dimensional similarity search across 23,531+ compounds |
-| **3D Embedding Explorer** | Real PCA projections of drug-target chemical space |
-| **Validator Agents** | Automated toxicity and novelty checking |
----
-## Architecture
-```
-                         ┌──────────────────────────────────────────┐
-                         │                 BioFlow                  │
-                         │      Visual Pipeline Builder (UI)        │
-                         └─────────────────┬────────────────────────┘
-                                           │
-         ┌─────────────────────────────────┼─────────────────────────────────┐
-         │                                 │                                 │
-         ▼                                 ▼                                 ▼
-┌─────────────────┐             ┌─────────────────┐             ┌─────────────────┐
-│   Data Input    │             │   DeepPurpose   │             │   OpenBioMed    │
-│  SMILES/Protein │────────────▶│   DTI Model     │────────────▶│   Multimodal    │
-│   Sequences     │             │  Morgan + CNN   │             │   Embeddings    │
-└─────────────────┘             └────────┬────────┘             └────────┬────────┘
-                                         │                               │
-                                         └───────────────┬───────────────┘
-                                                         │
-                                                         ▼
-                                              ┌─────────────────┐
-                                              │     Qdrant      │
-                                              │   Vector DB     │
-                                              │  HNSW Indexing  │
-                                              │  23,531 vectors │
-                                              └────────┬────────┘
-                                                       │
-                         ┌─────────────────────────────┼─────────────────────────────┐
-                         │                             │                             │
-                         ▼                             ▼                             ▼
-              ┌─────────────────┐          ┌─────────────────┐          ┌─────────────────┐
-              │ Similarity      │          │   Validator     │          │    Results      │
-              │ Search Agent    │          │   Agent         │          │    Output       │
-              │ Top-K Retrieval │          │ Toxicity/Novelty│          │   Candidates    │
-              └─────────────────┘          └─────────────────┘          └─────────────────┘
-```
----
-## Model Performance
-| Dataset | Concordance Index | Pearson | MSE |
-|---------|-------------------|---------|-----|
-| **KIBA** | 0.7003 | 0.5219 | 0.0008 |
-| **BindingDB_Kd** | 0.8083 | 0.7679 | 0.6668 |
-| **DAVIS** | 0.7914 | 0.5446 | 0.4684 |
----
-## Quick Start
-### Prerequisites
-- Python 3.10+
-- Node.js 18+
-- Docker Desktop
-- CUDA 11.8 (optional, for GPU acceleration)
-### 1. Clone & Setup
-```bash
-git clone https://github.com/hamzasammoud11-dotcom/lacoste001.git
-cd lacoste001
-# Python environment
-python -m venv .venv
-.venv\Scripts\activate  # Windows
-pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
-pip install DeepPurpose qdrant-client fastapi uvicorn scikit-learn
-```
-### 2. Start Qdrant Vector Database
-```bash
-docker run -d --name qdrant -p 6333:6333 -p 6334:6334 qdrant/qdrant:latest
-```
-### 3. Ingest Data (One-time)
-```bash
-python ingest_qdrant.py
-# Loads KIBA dataset → DeepPurpose embeddings → Qdrant
-# ~23,531 drug-target pairs indexed
-```
-### 4. Start Backend API
-```bash
-python -m uvicorn bioflow.api.server:app --host 0.0.0.0 --port 8001
-```
-### 5. Start Frontend
-```bash
-cd ui
-pnpm install
-pnpm dev
-# Open http://localhost:3000
-```
-### 6. Start Langflow (Visual Workflow Builder)
-```bash
-# You can use the provided script
-./run_langflow.bat
-# Or manually:
-pip install langflow
-langflow run --host 0.0.0.0 --port 7860
-# Access via http://localhost:3000/workflow (embedded)
-# Or directly at http://localhost:7860
-```
----
-## Visual Workflow Builder (Langflow Integration)
-BioFlow integrates **Langflow** as the visual workflow engine, providing a full-screen drag-and-drop pipeline builder accessible from `/workflow`.
-### Building a DTI Pipeline in Langflow
-1. **Import the Template Flow**:
-   - Open Langflow (`/workflow` or `localhost:7860`)
-   - Click "New Project" → "Import"
-   - Load `langflow/bioflow_dti_pipeline.json`
-2. **Configure the Pipeline**:
-   - **Drug Input**: Enter SMILES string (e.g., `CC(=O)Nc1ccc(O)cc1`)
-   - **Target Input**: Enter protein sequence
-   - **API Nodes**: Point to `http://localhost:8001/api/*`
-3. **Run the Flow**:
-   - Click "Run" to execute DeepPurpose encoding → Qdrant search → Results
----
-## Project Structure
-```
-├── config.py              # Shared configuration
-├── ingest_qdrant.py       # ETL: TDC → DeepPurpose → Qdrant
-├── deeppurpose002.py      # Model training script
-├── bioflow/
-│   └── api/
-│       └── server.py      # FastAPI backend
-├── runs/
-│   └── 20260125_104915_KIBA/
-│       ├── model.pt       # Trained model weights
-│       └── config.pkl     # Model configuration
-├── ui/
-│   ├── app/
-│   │   ├── workflow/      # Visual Pipeline Builder
-│   │   ├── explorer/      # 3D Embedding Visualization
-│   │   ├── discovery/     # Drug Discovery Interface
-│   │   └── data/          # Data Browser
-│   └── components/
-└── data/
-    └── kiba.tab           # Cached TDC dataset
-```
----
-## API Endpoints
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/health` | GET | Service health + model metrics |
-| `/api/points` | GET | Get 3D PCA points for visualization |
-| `/api/search` | POST | Similarity search by SMILES/sequence |
-### Example: Search Similar Compounds
-```bash
-curl -X POST "http://localhost:8001/api/search" \
-  -H "Content-Type: application/json" \
-  -d '{"smiles": "CC(=O)Nc1ccc(O)cc1", "top_k": 10}'
-```
----
-## Qdrant Integration Strategy
-### 1. Multimodal Bridge
-Using OpenBioMed for joint embeddings across proteins, molecules, and text — enabling **cross-modal retrieval**.
-### 2. Dynamic Workflow Memory
-Pipeline nodes store intermediate results in Qdrant collections, enabling agent-to-agent communication.
-### 3. High-Dimensional Scalability
-HNSW indexing handles bio-embeddings at scale, keeping similarity searches interactive and real-time.
-## Resources
-- [DeepPurpose](https://github.com/kexinhuang12345/DeepPurpose) — DTI Prediction Toolkit
-- [OpenBioMed](https://github.com/PharMolix/OpenBioMed) — Multimodal AI Framework
-- [Qdrant](https://qdrant.tech/) — Vector Database
-- [TDC](https://tdcommons.ai/) — Therapeutics Data Commons
----
-## License
-MIT License - See [LICENSE](LICENSE) for details.

+---
+title: BioFlow
+emoji: 🧬
+colorFrom: blue
+colorTo: green
+sdk: docker
+pinned: false
+---
+# BioFlow API
+FastAPI backend for BioFlow - Drug-Target Interaction (DTI) discovery platform.
+## Endpoints
+- `/api/health` - Health check
+- `/api/molecules` - Molecule search
+- `/api/proteins` - Protein search
+- `/api/points` - 3D visualization data
+- `/api/search` - Semantic search
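
The endpoint list added in this commit gives paths only, with no methods or payloads. As a rough sketch, a client call to `/api/search` might look like the following, assuming the endpoint still accepts the POST body shown in the removed README's curl example (`smiles` and `top_k`). That carry-over is an assumption, as are the helper names `build_search_request` and `search` and the base URL passed in; the commit does not confirm any of them.

```python
import json
import urllib.request


def build_search_request(base_url: str, smiles: str, top_k: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/search.

    The payload shape mirrors the curl example in the removed README;
    whether the new API still accepts it is an assumption.
    """
    payload = json.dumps({"smiles": smiles, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url: str, smiles: str, top_k: int = 10, timeout: float = 30.0):
    """Send the request and decode the JSON response (schema not documented here)."""
    request = build_search_request(base_url, smiles, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `search("http://localhost:8001", "CC(=O)Nc1ccc(O)cc1")` would reproduce the old curl example against a locally running backend; the host and port of the deployed Docker Space are not stated in this diff.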