# 🗺️ BioFlow Orchestrator Development Roadmap
This roadmap outlines the collaborative development of a unified R&D platform for biological discovery using **fully open-source** tools and models.
---
## 🏗️ Phase 1: Infrastructure & Core Framework ✅ COMPLETE
**Goal:** Establish the "modality-agnostic" foundation so tools can be plugged in without rewriting core logic.
- [x] **Core Abstractions** (`bioflow/core/base.py`):
  - `BioEncoder`: Interface for vectorization (ESM-2, ChemBERTa, PubMedBERT, CLIP)
  - `BioPredictor`: Interface for predictions (DeepPurpose, ADMET)
  - `BioGenerator`: Interface for candidate generation
  - `BioRetriever`: Interface for vector DB operations
  - Data containers: `EmbeddingResult`, `PredictionResult`, `RetrievalResult`
- [x] **Tool Registry** (`bioflow/core/registry.py`):
  - Central hub to manage multiple tools
  - Register/unregister by name
  - Default tool fallbacks
  - Utility methods for listing and summary
- [x] **Configuration Schema** (`bioflow/core/config.py`):
  - `NodeConfig`: Single pipeline node definition
  - `WorkflowConfig`: Complete workflow definition
  - `BioFlowConfig`: Master system configuration
  - YAML-compatible dataclasses
- [x] **Stateful Pipeline Engine** (`bioflow/core/orchestrator.py`):
  - `BioFlowOrchestrator`: DAG-based workflow execution
  - Topological sort for dependency resolution
  - `ExecutionContext` for state passing
  - Custom handler support
  - Error handling and traceability
- [x] **Sample Workflows** (`bioflow/workflows/`):
  - `drug_discovery.yaml`: Encode → Retrieve → Predict → Filter
  - `literature_mining.yaml`: Cross-modal literature search
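
The dependency resolution listed for the pipeline engine can be illustrated with Kahn's algorithm. This is a minimal sketch, not the code in `bioflow/core/orchestrator.py`: the `Node` dataclass, the `topo_order` and `run` helpers, and the handler signature are hypothetical names chosen for illustration, and a plain dict stands in for `ExecutionContext`. Python's standard `graphlib.TopologicalSorter` would give an equivalent ordering.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a single pipeline node definition."""
    id: str
    handler: Callable[[Dict[str, Any]], Any]
    depends_on: List[str] = field(default_factory=list)


def topo_order(nodes: List[Node]) -> List[str]:
    """Return node ids in dependency order using Kahn's algorithm."""
    indegree = {n.id: len(n.depends_on) for n in nodes}
    children: Dict[str, List[str]] = {n.id: [] for n in nodes}
    for n in nodes:
        for dep in n.depends_on:
            if dep not in children:
                raise KeyError(f"unknown dependency {dep!r} for node {n.id!r}")
            children[dep].append(n.id)

    # Start with nodes that have no unmet dependencies.
    ready = deque(nid for nid, deg in indegree.items() if deg == 0)
    order: List[str] = []
    while ready:
        nid = ready.popleft()
        order.append(nid)
        for child in children[nid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Any node left unscheduled is part of a cycle.
    if len(order) != len(nodes):
        raise ValueError("workflow contains a cycle")
    return order


def run(nodes: List[Node]) -> Dict[str, Any]:
    """Run handlers in dependency order; the dict carries state between nodes."""
    by_id = {n.id: n for n in nodes}
    context: Dict[str, Any] = {}
    for nid in topo_order(nodes):
        context[nid] = by_id[nid].handler(context)
    return context


# Nodes are declared out of order on purpose; the sort recovers a valid order.
workflow = [
    Node("filter", lambda ctx: [s for s in ctx["predict"] if s >= 0.5], ["predict"]),
    Node("encode", lambda ctx: [0.1, 0.9]),
    Node("retrieve", lambda ctx: ctx["encode"], ["encode"]),
    Node("predict", lambda ctx: list(ctx["retrieve"]), ["retrieve"]),
]
result = run(workflow)
```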
---
## 🧪 Phase 2: Parallel Tool Implementation ✅ COMPLETE
The team works on their respective modules using the core interfaces.
### **1. OBM Integration** ✅
- [x] `OBMEncoder` - Unified multimodal encoder (`bioflow/plugins/obm_encoder.py`)
- [x] `TextEncoder` - PubMedBERT/SciBERT (`bioflow/plugins/encoders/text_encoder.py`)
- [x] `MoleculeEncoder` - ChemBERTa/RDKit (`bioflow/plugins/encoders/molecule_encoder.py`)
- [x] `ProteinEncoder` - ESM-2/ProtBERT (`bioflow/plugins/encoders/protein_encoder.py`)
- [x] Lazy loading for efficient memory usage
- [x] Dimension projection for cross-modal compatibility
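
Lazy loading and dimension projection, as listed above, can be sketched as follows. This is an illustration under stated assumptions rather than the plugin code: `LazyEncoder`, its `loader` argument, and the toy loaders are hypothetical, and a fixed random projection stands in for whatever projection the real encoders apply.

```python
import random
from typing import Callable, List, Optional

Model = Callable[[str], List[float]]


class LazyEncoder:
    """Hypothetical wrapper that loads its model on first use only."""

    def __init__(self, loader: Callable[[], Model], in_dim: int, out_dim: int,
                 seed: int = 0) -> None:
        self._loader = loader
        self._model: Optional[Model] = None
        self._in_dim = in_dim
        rng = random.Random(seed)
        scale = 1.0 / (in_dim ** 0.5)
        # Fixed random projection (out_dim x in_dim); an assumption for this sketch.
        self._proj = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
                      for _ in range(out_dim)]

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def encode(self, item: str) -> List[float]:
        if self._model is None:
            # The expensive load is deferred until the first encode() call.
            self._model = self._loader()
        vec = self._model(item)
        if len(vec) != self._in_dim:
            raise ValueError(f"expected {self._in_dim} dims, got {len(vec)}")
        # Project the native embedding into the shared space.
        return [sum(w * x for w, x in zip(row, vec)) for row in self._proj]


def load_toy_protein_model() -> Model:
    # Stand-in for a large model: amino-acid counts as a 20-dim vector.
    return lambda seq: [float(seq.count(aa)) for aa in "ACDEFGHIKLMNPQRSTVWY"]


def load_toy_text_model() -> Model:
    # Stand-in for a text model: two trivial features.
    return lambda text: [float(len(text)), float(text.count(" "))]


protein = LazyEncoder(load_toy_protein_model, in_dim=20, out_dim=8)
text = LazyEncoder(load_toy_text_model, in_dim=2, out_dim=8)
```

Both encoders emit 8-dimensional vectors despite different native sizes, which is what lets one vector collection hold several modalities.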
### **2. Qdrant Retriever** ✅
- [x] `QdrantRetriever` implements `BioRetriever` interface (`bioflow/plugins/qdrant_retriever.py`)
- [x] HNSW indexing with cosine/euclidean/dot distance
- [x] Payload filtering (species, experiment type, modality)
- [x] Batch ingestion support
- [x] In-memory, local, or remote Qdrant connections
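
To show the retrieval contract without requiring a running Qdrant instance, the sketch below uses a brute-force in-memory stand-in. It is not `QdrantRetriever` and does no HNSW indexing; `InMemoryRetriever`, `ingest`, `search`, and the payload keys are hypothetical names that only mirror the features listed above (batch ingestion, cosine scoring, payload filtering).

```python
import math
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple

Point = Tuple[str, Sequence[float], Dict[str, Any]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryRetriever:
    """Hypothetical brute-force stand-in for a vector DB retriever."""

    def __init__(self) -> None:
        self._points: List[Tuple[str, List[float], Dict[str, Any]]] = []

    def ingest(self, items: Iterable[Point]) -> int:
        """Add a batch of (id, vector, payload) points; return the total count."""
        for item_id, vector, payload in items:
            self._points.append((item_id, list(vector), dict(payload)))
        return len(self._points)

    def search(self, query: Sequence[float], limit: int = 5,
               filters: Optional[Dict[str, Any]] = None) -> List[Tuple[str, float]]:
        """Return up to `limit` (id, score) pairs whose payload matches every filter."""
        filters = filters or {}
        hits = [
            (item_id, cosine(query, vec))
            for item_id, vec, payload in self._points
            if all(payload.get(k) == v for k, v in filters.items())
        ]
        hits.sort(key=lambda h: h[1], reverse=True)
        return hits[:limit]


store = InMemoryRetriever()
store.ingest([
    ("mol-1", [1.0, 0.0], {"modality": "molecule", "species": "human"}),
    ("mol-2", [0.0, 1.0], {"modality": "molecule", "species": "mouse"}),
    ("prot-1", [0.9, 0.1], {"modality": "protein", "species": "human"}),
])
hits = store.search([1.0, 0.0], limit=2, filters={"species": "human"})
```

Here the `species` filter excludes `mol-2` before scoring, so only the two human entries are ranked.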
### **3. DeepPurpose Predictor** ✅
- [x] `DeepPurposePredictor` implements `BioPredictor` (`bioflow/plugins/deeppurpose_predictor.py`)
- [x] DTI prediction with Transformer+CNN architecture
- [x] Graceful fallback when DeepPurpose unavailable
- [x] Batch prediction support
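
The graceful-fallback behavior can be sketched as below. This is one possible pattern, not the logic in `deeppurpose_predictor.py`: `FallbackPredictor`, `fallback_score`, and the injected `model_scorer` callable are hypothetical, and the placeholder score carries no biological meaning. The sketch only checks whether the `DeepPurpose` package is importable and deliberately does not call its API.

```python
import importlib.util
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (drug SMILES, target amino-acid sequence)
Scorer = Callable[[List[Pair]], List[float]]


def deeppurpose_available() -> bool:
    """Check for the package without importing it."""
    return importlib.util.find_spec("DeepPurpose") is not None


def fallback_score(smiles: str, sequence: str) -> float:
    """Deterministic placeholder in [0, 1); not a real affinity estimate."""
    return (len(smiles) * 31 + len(sequence) * 17) % 100 / 100.0


class FallbackPredictor:
    """Hypothetical predictor that degrades to a placeholder scorer."""

    def __init__(self, model_scorer: Optional[Scorer] = None) -> None:
        self._scorer = model_scorer
        self.backend = "model" if model_scorer is not None else "fallback"

    def predict_batch(self, pairs: List[Pair]) -> List[float]:
        if self._scorer is not None:
            try:
                return list(self._scorer(pairs))
            except Exception:
                # Record the degradation so callers can see which backend answered.
                self.backend = "fallback"
        return [fallback_score(s, t) for s, t in pairs]


# A real scorer wrapping DeepPurpose would be passed in when the package is
# available; that wiring is project-specific and omitted from this sketch.
predictor = FallbackPredictor(model_scorer=None)
scores = predictor.predict_batch([("CCO", "MKT"), ("c1ccccc1", "MKTAYIAK")])
```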
---
## Phase 3: The Unified Workflow ✅ COMPLETE
**Goal:** Connect the tools into a coherent discovery loop.
- [x] **Typed Node System** (`bioflow/core/nodes.py`):
  - `EncodeNode`: Vectorize inputs via `BioEncoder`
  - `RetrieveNode`: Query vector DB for similar items
  - `PredictNode`: Run DTI predictions on candidates
  - `IngestNode`: Add new data to vector DB
  - `FilterNode`: Score-based filtering and ranking
  - `TraceabilityNode`: Link results to evidence sources
- [x] **Discovery Pipelines** (`bioflow/workflows/discovery.py`):
  - `DiscoveryPipeline`: Full drug discovery workflow (encode → retrieve → predict → filter → trace)
  - `LiteratureMiningPipeline`: Cross-modal literature search
  - `ProteinDesignPipeline`: Protein homolog discovery
  - Batch ingestion and simple search APIs
- [x] **Data Ingestion Utilities** (`bioflow/workflows/ingestion.py`):
  - JSON/CSV file loaders
  - SMILES/FASTA file parsers
  - Sample data generators for testing
- [x] **Evidence Traceability**:
  - Automatic PubMed/UniProt/PubChem/DrugBank link generation
  - Metadata preservation through pipeline
**Verification:** `python scripts/verify_phase3.py` - All 5 tests pass ✅
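
Evidence link generation with metadata preservation might look like the sketch below. The function names, the `*_id` metadata keys, and the result layout are assumptions made for illustration, and the identifiers in the example are placeholders. The URL templates reflect the public record pages of each database at the time of writing and should be checked before use.

```python
from typing import Any, Dict

# URL templates for public record pages (assumed; verify before relying on them).
EVIDENCE_URLS = {
    "pubmed": "https://pubmed.ncbi.nlm.nih.gov/{id}/",
    "uniprot": "https://www.uniprot.org/uniprotkb/{id}",
    "pubchem": "https://pubchem.ncbi.nlm.nih.gov/compound/{id}",
    "drugbank": "https://go.drugbank.com/drugs/{id}",
}


def evidence_links(metadata: Dict[str, Any]) -> Dict[str, str]:
    """Build one link per source whose '<source>_id' key is present and non-empty."""
    links: Dict[str, str] = {}
    for source, template in EVIDENCE_URLS.items():
        value = metadata.get(f"{source}_id")  # hypothetical key convention
        if value:
            links[source] = template.format(id=value)
    return links


def attach_evidence(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the result with links added and metadata left intact."""
    enriched = dict(result)
    enriched["evidence"] = evidence_links(result.get("metadata", {}))
    return enriched


# Placeholder identifiers, for illustration only.
hit = {
    "id": "mol-1",
    "score": 0.87,
    "metadata": {"pubmed_id": "12345678", "uniprot_id": "P00000"},
}
traced = attach_evidence(hit)
```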
---
## Phase 4: UI/UX & Deployment ✅ COMPLETE
**Goal:** Build an intuitive, modern interface for the BioFlow platform.
- [x] **Next.js Frontend** (`ui/`):
  - Next.js 16 App Router + Tailwind + shadcn/ui
  - Dashboard pages: Discovery, 3D Visualization, Workflow Builder
  - `/app/api/*` proxy routes to the FastAPI backend
  - Optional mock fallbacks for molecules/proteins list routes

**Launch:**
- Full stack (Windows): `launch_bioflow_full.bat`
- Manual:
  - Backend: `python -m uvicorn bioflow.api.server:app --host 0.0.0.0 --port 8000`
  - UI: `cd ui && pnpm dev`
---
## Phase 5: Open-Source Alignment
- **Strict Open-Source Compliance**: remove proprietary integrations and keep only OSS models/tools.
- **Open Protein/Peptide Options**: integrate open models (e.g., ESM-2 / ProGen2) behind `BioGenerator`.
- **Open Retrieval + Evidence**: improve evidence traceability (PubMed/UniProt/ChEMBL) and evaluation.