vignt97867896 committed on
Commit fbcbc07 · verified · 1 Parent(s): 54592ce

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +20 -236
README.md CHANGED
@@ -1,236 +1,20 @@
- # BioFlow
-
- > **Multimodal Biological Design & Discovery Intelligence Engine**
- > A low-code workflow platform for unified biological discovery pipelines
-
- ![Python](https://img.shields.io/badge/Python-3.10-blue)
- ![Next.js](https://img.shields.io/badge/Next.js-16-black)
- ![Qdrant](https://img.shields.io/badge/Qdrant-Vector_DB-red)
- ![CUDA](https://img.shields.io/badge/CUDA-11.8-green)
- ![Team](https://img.shields.io/badge/Team-Lacoste-purple)
-
- ---
-
- ## Problem Statement
-
- Biological R&D knowledge is fragmented across disconnected silos:
- - **Textual literature** (papers, lab notes)
- - **3D structural data** (PDB files)
- - **Chemical sequences** (SMILES)
-
- Researchers must manually navigate incompatible formats, creating bottlenecks and "blind spots" where critical connections are missed.
-
- ## Our Solution
-
- **BioFlow** is a visual workflow engine that unifies biological discovery pipelines. Rather than a single "black box" model, we function as an **intelligent platform** — allowing researchers to chain state-of-the-art open-source biological models into coherent discovery workflows.
-
- ### Key Features
-
- | Feature | Description |
- |---------|-------------|
- | **Visual Pipeline Builder** | Drag-and-drop node editor for constructing discovery workflows |
- | **DeepPurpose Integration** | Drug-Target Interaction prediction with Morgan + CNN encoding |
- | **Molecule & Protein Visualization** | Interactive 2D SMILES and 3D PDB structure viewing (powered by 3Dmol.js and SmilesDrawer) |
- | **Qdrant Vector Search** | High-dimensional similarity search across 23,531+ compounds |
- | **3D Embedding Explorer** | Real PCA projections of drug-target chemical space |
- | **Validator Agents** | Automated toxicity and novelty checking |
-
- ---
-
- ## Architecture
-
- ```
- ┌──────────────────────────────────────────┐
- │ BioFlow │
- │ Visual Pipeline Builder (UI) │
- └─────────────────┬────────────────────────┘
-
- ┌─────────────────────────────────┼─────────────────────────────────┐
- │ │ │
- ▼ ▼ ▼
- ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
- │ Data Input │ │ DeepPurpose │ │ OpenBioMed │
- │ SMILES/Protein │────────────▶│ DTI Model │────────────▶│ Multimodal │
- │ Sequences │ │ Morgan + CNN │ │ Embeddings │
- └─────────────────┘ └────────┬────────┘ └────────┬────────┘
- │ │
- └───────────────┬───────────────┘
-
-
- ┌─────────────────┐
- │ Qdrant │
- │ Vector DB │
- │ HNSW Indexing │
- │ 23,531 vectors │
- └────────┬────────┘
-
- ┌─────────────────────────────┼─────────────────────────────┐
- │ │ │
- ▼ ▼ ▼
- ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
- │ Similarity │ │ Validator │ │ Results │
- │ Search Agent │ │ Agent │ │ Output │
- │ Top-K Retrieval │ │ Toxicity/Novelty│ │ Candidates │
- └─────────────────┘ └─────────────────┘ └─────────────────┘
- ```
-
- ---
-
- ## Model Performance
-
- | Dataset | Concordance Index | Pearson | MSE |
- |---------|-------------------|---------|-----|
- | **KIBA** | 0.7003 | 0.5219 | 0.0008 |
- | **BindingDB_Kd** | 0.8083 | 0.7679 | 0.6668 |
- | **DAVIS** | 0.7914 | 0.5446 | 0.4684 |
-
- ---
-
- ## Quick Start
-
- ### Prerequisites
- - Python 3.10+
- - Node.js 18+
- - Docker Desktop
- - CUDA 11.8 (optional, for GPU acceleration)
-
- ### 1. Clone & Setup
- ```bash
- git clone https://github.com/hamzasammoud11-dotcom/lacoste001.git
- cd lacoste001
-
- # Python environment
- python -m venv .venv
- .venv\Scripts\activate # Windows
- pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
- pip install DeepPurpose qdrant-client fastapi uvicorn scikit-learn
- ```
-
- ### 2. Start Qdrant Vector Database
- ```bash
- docker run -d --name qdrant -p 6333:6333 -p 6334:6334 qdrant/qdrant:latest
- ```
-
- ### 3. Ingest Data (One-time)
- ```bash
- python ingest_qdrant.py
- # Loads KIBA dataset → DeepPurpose embeddings → Qdrant
- # ~23,531 drug-target pairs indexed
- ```
-
- ### 4. Start Backend API
- ```bash
- python -m uvicorn bioflow.api.server:app --host 0.0.0.0 --port 8001
- ```
-
- ### 5. Start Frontend
- ```bash
- cd ui
- pnpm install
- pnpm dev
- # Open http://localhost:3000
- ```
-
- ### 6. Start Langflow (Visual Workflow Builder)
- ```bash
- # You can use the provided script
- ./run_langflow.bat
-
- # Or manually:
- pip install langflow
- langflow run --host 0.0.0.0 --port 7860
- # Access via http://localhost:3000/workflow (embedded)
- # Or directly at http://localhost:7860
- ```
-
- ---
-
- ## Visual Workflow Builder (Langflow Integration)
-
- BioFlow integrates **Langflow** as the visual workflow engine, providing a full-screen drag-and-drop pipeline builder accessible from `/workflow`.
-
- ### Building a DTI Pipeline in Langflow
-
- 1. **Import the Template Flow**:
- - Open Langflow (`/workflow` or `localhost:7860`)
- - Click "New Project" → "Import"
- - Load `langflow/bioflow_dti_pipeline.json`
-
- 2. **Configure the Pipeline**:
- - **Drug Input**: Enter SMILES string (e.g., `CC(=O)Nc1ccc(O)cc1`)
- - **Target Input**: Enter protein sequence
- - **API Nodes**: Point to `http://localhost:8001/api/*`
-
- 3. **Run the Flow**:
- - Click "Run" to execute DeepPurpose encoding → Qdrant search → Results
-
- ---
-
- ## Project Structure
-
- ```
- ├── config.py # Shared configuration
- ├── ingest_qdrant.py # ETL: TDC → DeepPurpose → Qdrant
- ├── deeppurpose002.py # Model training script
- ├── bioflow/
- │ └── api/
- │ └── server.py # FastAPI backend
- ├── runs/
- │ └── 20260125_104915_KIBA/
- │ ├── model.pt # Trained model weights
- │ └── config.pkl # Model configuration
- ├── ui/
- │ ├── app/
- │ │ ├── workflow/ # Visual Pipeline Builder
- │ │ ├── explorer/ # 3D Embedding Visualization
- │ │ ├── discovery/ # Drug Discovery Interface
- │ │ └── data/ # Data Browser
- │ └── components/
- └── data/
- └── kiba.tab # Cached TDC dataset
- ```
-
- ---
-
- ## API Endpoints
-
- | Endpoint | Method | Description |
- |----------|--------|-------------|
- | `/health` | GET | Service health + model metrics |
- | `/api/points` | GET | Get 3D PCA points for visualization |
- | `/api/search` | POST | Similarity search by SMILES/sequence |
-
- ### Example: Search Similar Compounds
- ```bash
- curl -X POST "http://localhost:8001/api/search" \
- -H "Content-Type: application/json" \
- -d '{"smiles": "CC(=O)Nc1ccc(O)cc1", "top_k": 10}'
- ```
-
- ---
-
- ## Qdrant Integration Strategy
-
- ### 1. Multimodal Bridge
- Using OpenBioMed for joint embeddings across proteins, molecules, and text — enabling **cross-modal retrieval**.
-
- ### 2. Dynamic Workflow Memory
- Pipeline nodes store intermediate results in Qdrant collections, enabling agent-to-agent communication.
-
- ### 3. High-Dimensional Scalability
- HNSW indexing handles bio-embeddings at scale, keeping similarity searches interactive and real-time.
-
-
-
- ## Resources
-
- - [DeepPurpose](https://github.com/kexinhuang12345/DeepPurpose) — DTI Prediction Toolkit
- - [OpenBioMed](https://github.com/PharMolix/OpenBioMed) — Multimodal AI Framework
- - [Qdrant](https://qdrant.tech/) — Vector Database
- - [TDC](https://tdcommons.ai/) — Therapeutics Data Commons
-
- ---
-
- ## License
-
- MIT License - See [LICENSE](LICENSE) for details.
 
+ ---
+ title: BioFlow
+ emoji: 🧬
+ colorFrom: blue
+ colorTo: green
+ sdk: docker
+ pinned: false
+ ---
+
+ # BioFlow API
+
+ FastAPI backend for BioFlow - Drug-Target Interaction (DTI) discovery platform.
+
+ ## Endpoints
+
+ - `/api/health` - Health check
+ - `/api/molecules` - Molecule search
+ - `/api/proteins` - Protein search
+ - `/api/points` - 3D visualization data
+ - `/api/search` - Semantic search
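
The new README lists `/api/search` but no longer shows how to call it. As a rough sketch only, a client might build the request as follows. The JSON fields (`smiles`, `top_k`) and the base URL `http://localhost:8001` come from the curl example in the removed README; whether the deployed backend still accepts that payload or listens on that port is an assumption. `build_search_request` is a hypothetical helper, not something found in the repository.

```python
import json
import urllib.request


def build_search_request(base_url: str, smiles: str, top_k: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /api/search endpoint."""
    # Payload shape copied from the removed README's curl example (assumed still valid).
    payload = json.dumps({"smiles": smiles, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # The base URL is an assumption; replace it with the actual Space or local address.
    req = build_search_request("http://localhost:8001", "CC(=O)Nc1ccc(O)cc1")
    print(req.get_method(), req.full_url)
    # To actually send it against a running server:
    # urllib.request.urlopen(req, timeout=10)
```

The send step is left commented out so the sketch runs without a live server; the response schema is not documented in this commit, so no parsing is shown.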