🗺️ BioFlow Orchestrator Development Roadmap
This roadmap outlines the collaborative development of a unified R&D platform for biological discovery using fully open-source tools and models.
🏗️ Phase 1: Infrastructure & Core Framework ✅ COMPLETE
Goal: Establish the "modality-agnostic" foundation so tools can be plugged in without rewriting core logic.
Core Abstractions (bioflow/core/base.py):
- BioEncoder: Interface for vectorization (ESM-2, ChemBERTa, PubMedBERT, CLIP)
- BioPredictor: Interface for predictions (DeepPurpose, ADMET)
- BioGenerator: Interface for candidate generation
- BioRetriever: Interface for vector DB operations
- Data containers: EmbeddingResult, PredictionResult, RetrievalResult
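One way to picture these interfaces is as abstract base classes. The sketch below is a minimal, hypothetical shape for BioEncoder and EmbeddingResult. The method names (encode, batch_encode), the dataclass fields, and the toy CharCountEncoder are assumptions, not the actual contents of bioflow/core/base.py. It only illustrates how a plug-in can satisfy the interface without touching core logic.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EmbeddingResult:
    # Hypothetical fields; the real container may carry more or different data.
    vector: List[float]
    modality: str
    dimension: int
    metadata: Dict[str, Any] = field(default_factory=dict)


class BioEncoder(ABC):
    """Modality-agnostic encoder interface (illustrative shape only)."""

    @abstractmethod
    def encode(self, content: Any, modality: str) -> EmbeddingResult:
        """Turn one input (text, SMILES, sequence, ...) into a vector."""

    def batch_encode(self, contents: List[Any], modality: str) -> List[EmbeddingResult]:
        # Default batching simply loops; concrete encoders may override it.
        return [self.encode(c, modality) for c in contents]


class CharCountEncoder(BioEncoder):
    """Toy plug-in: amino-acid counts, used only to show the interface in action."""

    ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

    def encode(self, content: str, modality: str) -> EmbeddingResult:
        counts = [float(content.count(aa)) for aa in self.ALPHABET]
        return EmbeddingResult(vector=counts, modality=modality, dimension=len(counts))


encoder = CharCountEncoder()
result = encoder.encode("MKTAYIAK", modality="protein")
print(result.dimension)  # 20
```

Because BioEncoder is abstract, the orchestrator can depend on it alone, and a concrete encoder (ESM-2, ChemBERTa, ...) only has to implement encode.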
Tool Registry (bioflow/core/registry.py):
- Central hub for managing multiple tools
- Register/unregister tools by name
- Default tool fallbacks
- Utility methods for listing and summarizing registered tools
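Here is a minimal sketch of a name-keyed registry with per-category defaults. The class name ToolRegistry, the category strings, and the rule that the first tool registered in a category becomes its default are assumptions for illustration. The real registry.py may resolve defaults differently.

```python
from typing import Any, Dict, List, Optional


class ToolRegistry:
    """Hypothetical registry: tools grouped by category, looked up by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, Any]] = {}  # category -> {name -> tool}
        self._defaults: Dict[str, str] = {}  # category -> default tool name

    def register(self, category: str, name: str, tool: Any, default: bool = False) -> None:
        self._tools.setdefault(category, {})[name] = tool
        # The first tool in a category becomes its default unless overridden.
        if default or category not in self._defaults:
            self._defaults[category] = name

    def unregister(self, category: str, name: str) -> None:
        tools = self._tools.get(category, {})
        tools.pop(name, None)
        if self._defaults.get(category) == name:
            # Fall back to any remaining tool, or clear the default entirely.
            remaining = next(iter(tools), None)
            if remaining is None:
                self._defaults.pop(category, None)
            else:
                self._defaults[category] = remaining

    def get(self, category: str, name: Optional[str] = None) -> Any:
        # Without an explicit name, use the category default.
        key = name or self._defaults.get(category)
        if key is None or key not in self._tools.get(category, {}):
            raise KeyError(f"no {category!r} tool named {key!r}")
        return self._tools[category][key]

    def list_tools(self) -> Dict[str, List[str]]:
        return {cat: sorted(tools) for cat, tools in self._tools.items()}


registry = ToolRegistry()
registry.register("encoder", "esm2", "ESM-2 encoder")
registry.register("encoder", "chemberta", "ChemBERTa encoder")
print(registry.get("encoder"))  # ESM-2 encoder
registry.unregister("encoder", "esm2")
print(registry.get("encoder"))  # ChemBERTa encoder
```

Keeping the default-fallback logic inside unregister means callers that never pass a tool name keep working after a tool is removed.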
Configuration Schema (bioflow/core/config.py):
- NodeConfig: Single pipeline node definition
- WorkflowConfig: Complete workflow definition
- BioFlowConfig: Master system configuration
- YAML-compatible dataclasses
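"YAML-compatible dataclasses" can mean plain dataclasses that are built from the dict a YAML loader returns and turned back into one with dataclasses.asdict. The fields shown here (id, type, tool, inputs, params) and the from_dict helper are hypothetical. A plain dict stands in for the output of yaml.safe_load, so the sketch has no third-party dependency.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class NodeConfig:
    # Hypothetical fields for one pipeline node.
    id: str
    type: str  # e.g. "encode", "retrieve", "predict", "filter"
    tool: str = "default"
    inputs: List[str] = field(default_factory=list)  # ids of upstream nodes
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class WorkflowConfig:
    name: str
    nodes: List[NodeConfig] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "WorkflowConfig":
        # `data` has the shape yaml.safe_load() would return for a workflow file.
        return cls(name=data["name"], nodes=[NodeConfig(**n) for n in data.get("nodes", [])])


raw = {
    "name": "drug_discovery",
    "nodes": [
        {"id": "encode", "type": "encode", "tool": "obm"},
        {"id": "retrieve", "type": "retrieve", "inputs": ["encode"], "params": {"limit": 10}},
    ],
}
wf = WorkflowConfig.from_dict(raw)
print(wf.nodes[1].inputs)  # ['encode']
print(asdict(wf)["nodes"][0]["tool"])  # obm
```

Since asdict produces only dicts, lists, and scalars, the result can be passed back to a YAML dumper, which is what makes such a schema round-trippable.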
Stateful Pipeline Engine (bioflow/core/orchestrator.py):
- BioFlowOrchestrator: DAG-based workflow execution
- Topological sort for dependency resolution
- ExecutionContext for passing state between nodes
- Custom handler support
- Error handling and traceability
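The execution model described above (topological ordering, a shared context, per-node handlers) can be sketched with the standard library's graphlib.TopologicalSorter. The run_workflow function, the handler signature, and the trace format are assumptions for illustration, not BioFlowOrchestrator's real API. The usage wires up the Encode → Retrieve → Predict → Filter chain with toy handlers.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


@dataclass
class ExecutionContext:
    # Outputs of completed nodes (keyed by node id) plus a simple trace log.
    results: Dict[str, Any] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


Handler = Callable[[List[Any], ExecutionContext], Any]


def run_workflow(deps: Dict[str, List[str]], handlers: Dict[str, Handler]) -> ExecutionContext:
    """Run nodes in dependency order; each handler receives its parents' outputs."""
    ctx = ExecutionContext()
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    for node in TopologicalSorter(deps).static_order():
        parent_outputs = [ctx.results[p] for p in deps.get(node, [])]
        try:
            ctx.results[node] = handlers[node](parent_outputs, ctx)
            ctx.trace.append(f"{node}: ok")
        except Exception as exc:
            # Record which node failed before propagating the error.
            ctx.trace.append(f"{node}: failed ({exc!r})")
            raise
    return ctx


# Toy handlers for an encode -> retrieve -> predict -> filter chain.
deps = {"encode": [], "retrieve": ["encode"], "predict": ["retrieve"], "filter": ["predict"]}
handlers: Dict[str, Handler] = {
    "encode": lambda parents, ctx: [0.1, 0.9],
    "retrieve": lambda parents, ctx: ["mol_a", "mol_b", "mol_c"],
    "predict": lambda parents, ctx: dict(zip(parents[0], [0.8, 0.3, 0.6])),
    "filter": lambda parents, ctx: [m for m, s in parents[0].items() if s >= 0.5],
}
ctx = run_workflow(deps, handlers)
print(ctx.results["filter"])  # ['mol_a', 'mol_c']
```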
Sample Workflows (bioflow/workflows/):
- drug_discovery.yaml: Encode → Retrieve → Predict → Filter
- literature_mining.yaml: Cross-modal literature search
🧪 Phase 2: Parallel Tool Implementation ✅ COMPLETE
The team works on their respective modules using the core interfaces.
1. OBM Integration ✅
- OBMEncoder: Unified multimodal encoder (bioflow/plugins/obm_encoder.py)
- TextEncoder: PubMedBERT/SciBERT (bioflow/plugins/encoders/text_encoder.py)
- MoleculeEncoder: ChemBERTa/RDKit (bioflow/plugins/encoders/molecule_encoder.py)
- ProteinEncoder: ESM-2/ProtBERT (bioflow/plugins/encoders/protein_encoder.py)
- Lazy loading for efficient memory usage
- Dimension projection for cross-modal compatibility
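The two encoder features above (lazy loading and dimension projection) can be sketched together. Here functools.cached_property defers the expensive "model" load until first use, and a fixed Gaussian random projection maps each encoder's native size onto a shared size. ProjectedEncoder, the toy character-histogram "model", and the choice of a random projection are all assumptions. The actual plugins may use a learned projection, padding/truncation, or another scheme. A random projection only harmonizes dimensions and does not align meaning across modalities.

```python
import math
import random
from functools import cached_property
from typing import Callable, List


class ProjectedEncoder:
    """Hypothetical wrapper: lazy model loading plus projection to a shared size."""

    def __init__(self, native_dim: int, shared_dim: int = 8, seed: int = 0) -> None:
        self.native_dim = native_dim
        self.shared_dim = shared_dim
        self.seed = seed
        self.load_count = 0  # counts how many times the "model" was actually built

    @cached_property
    def model(self) -> Callable[[str], List[float]]:
        # Stand-in for an expensive load (e.g. fetching ESM-2 weights).
        # cached_property runs this body on first access only.
        self.load_count += 1

        def embed(text: str) -> List[float]:
            # Toy "model": hashed character histogram of the native size.
            vec = [0.0] * self.native_dim
            for ch in text:
                vec[ord(ch) % self.native_dim] += 1.0
            return vec

        return embed

    @cached_property
    def projection(self) -> List[List[float]]:
        # Fixed Gaussian random matrix of shape (shared_dim x native_dim).
        rng = random.Random(self.seed)
        scale = 1.0 / math.sqrt(self.shared_dim)
        return [
            [rng.gauss(0.0, 1.0) * scale for _ in range(self.native_dim)]
            for _ in range(self.shared_dim)
        ]

    def encode(self, text: str) -> List[float]:
        native = self.model(text)
        # Matrix-vector product: every modality ends up with shared_dim values.
        return [sum(w * x for w, x in zip(row, native)) for row in self.projection]


protein = ProjectedEncoder(native_dim=20)
molecule = ProjectedEncoder(native_dim=32, seed=1)
print(protein.load_count)  # 0 (nothing loaded yet)
v_prot = protein.encode("MKTAYIAK")
v_mol = molecule.encode("CCO")
print(protein.load_count, len(v_prot), len(v_mol))  # 1 8 8
```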
2. Qdrant Retriever ✅
- QdrantRetriever implements the BioRetriever interface (bioflow/plugins/qdrant_retriever.py)
- HNSW indexing with cosine/Euclidean/dot-product distance
- Payload filtering (species, experiment type, modality)
- Batch ingestion support
- In-memory, local, or remote Qdrant connections
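The sketch below shows what a filtered similarity query returns: points whose payload matches every required key/value, ranked by cosine similarity. It is an exact brute-force stand-in written in plain Python. It is not the qdrant_client API, and Qdrant itself answers such queries approximately through its HNSW index, expressing filters with models such as Filter, FieldCondition, and MatchValue. The search function and the point layout are hypothetical.

```python
import math
from typing import Any, Dict, List, Optional, Tuple

Point = Tuple[str, List[float], Dict[str, Any]]  # (id, vector, payload)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(points: List[Point], query: List[float], limit: int = 3,
           must: Optional[Dict[str, Any]] = None) -> List[Tuple[str, float]]:
    """Exact top-k cosine search restricted by equality payload filters."""
    must = must or {}
    hits = [
        (pid, cosine(query, vec))
        for pid, vec, payload in points
        # Keep only points whose payload matches every required key/value.
        if all(payload.get(k) == v for k, v in must.items())
    ]
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits[:limit]


points: List[Point] = [
    ("p1", [1.0, 0.0], {"modality": "protein", "species": "human"}),
    ("p2", [0.9, 0.1], {"modality": "protein", "species": "mouse"}),
    ("m1", [1.0, 0.0], {"modality": "molecule"}),
]
hits = search(points, [1.0, 0.0], must={"modality": "protein"})
print([pid for pid, _ in hits])  # ['p1', 'p2']
```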
3. DeepPurpose Predictor ✅
- DeepPurposePredictor implements the BioPredictor interface (bioflow/plugins/deeppurpose_predictor.py)
- Drug-target interaction (DTI) prediction with a Transformer+CNN architecture
- Graceful fallback when DeepPurpose is unavailable
- Batch prediction support
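"Graceful fallback" here is sketched as a predictor that checks whether the DeepPurpose package is importable (via importlib.util.find_spec, which does not import it) and, when no model is available, returns clearly labelled neutral scores instead of failing. The neutral score of 0.5, the "fallback" backend tag, and the PredictionResult fields are assumptions. The real plugin's fallback behaviour is not specified in this roadmap, and loading an actual DeepPurpose model is deliberately left out.

```python
import importlib.util
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class PredictionResult:
    # Hypothetical container; field names are illustrative.
    drug: str
    target: str
    score: float
    backend: str  # "deeppurpose" or "fallback"


def deeppurpose_available() -> bool:
    # find_spec locates the package without importing its heavy dependencies.
    return importlib.util.find_spec("DeepPurpose") is not None


def predict_batch(
    pairs: List[Tuple[str, str]],
    model: Optional[Callable[[str, str], float]] = None,
) -> List[PredictionResult]:
    """Score (SMILES, protein sequence) pairs, degrading gracefully without a model."""
    if model is None:
        # Fallback: a neutral, clearly labelled score instead of crashing the pipeline.
        return [PredictionResult(d, t, 0.5, "fallback") for d, t in pairs]
    return [PredictionResult(d, t, float(model(d, t)), "deeppurpose") for d, t in pairs]


# In the real plugin, `model` would wrap a loaded DeepPurpose DTI model when
# deeppurpose_available() is True; here it stays None to exercise the fallback.
results = predict_batch([("CCO", "MKTAYIAK"), ("c1ccccc1", "MVLSPADK")])
print([r.backend for r in results])  # ['fallback', 'fallback']
```

Tagging each result with its backend lets downstream nodes such as a filter step down-weight or drop placeholder scores instead of treating them as real predictions.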
Phase 3: The Unified Workflow ✅ COMPLETE
Goal: Connect the tools into a coherent discovery loop.
Typed Node System (bioflow/core/nodes.py):
- EncodeNode: Vectorize inputs via BioEncoder
- RetrieveNode: Query the vector DB for similar items
- PredictNode: Run DTI predictions on candidates
- IngestNode: Add new data to the vector DB
- FilterNode: Score-based filtering and ranking
- TraceabilityNode: Link results to evidence sources
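A FilterNode-style step reduces to "drop below a threshold, sort by score, keep the top k." The sketch below assumes hypothetical names (Candidate, filter_and_rank, threshold, top_k). The actual node's parameters may differ. Python's sort stays stable even with reverse=True, so candidates with equal scores keep their retrieval order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    id: str
    score: float


def filter_and_rank(candidates: List[Candidate], threshold: float = 0.5,
                    top_k: Optional[int] = None) -> List[Candidate]:
    """Drop candidates below `threshold`, sort the rest by score, keep the best `top_k`."""
    kept = [c for c in candidates if c.score >= threshold]
    # sorted() is stable, so ties keep their input (retrieval) order.
    kept = sorted(kept, key=lambda c: c.score, reverse=True)
    return kept if top_k is None else kept[:top_k]


ranked = filter_and_rank(
    [Candidate("a", 0.42), Candidate("b", 0.91), Candidate("c", 0.67), Candidate("d", 0.67)],
    threshold=0.5,
    top_k=2,
)
print([c.id for c in ranked])  # ['b', 'c']
```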
Discovery Pipelines (bioflow/workflows/discovery.py):
- DiscoveryPipeline: Full drug discovery workflow (encode → retrieve → predict → filter → trace)
- LiteratureMiningPipeline: Cross-modal literature search
- ProteinDesignPipeline: Protein homolog discovery
- Batch ingestion and simple search APIs
Data Ingestion Utilities (bioflow/workflows/ingestion.py):
- JSON/CSV file loaders
- SMILES/FASTA file parsers
- Sample data generators for testing
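FASTA and SMILES parsing can be done with the standard library alone, as in the minimal sketch below. The function names and the "SMILES [name]" line format (whitespace-separated, "#" comments) are assumptions modelled on the common .smi convention. No chemical validation is performed; a real loader would likely check each SMILES with RDKit before ingestion.

```python
import io
from typing import Dict, List, TextIO


def parse_fasta(handle: TextIO) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: Dict[str, List[str]] = {}
    header = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []  # a repeated header overwrites the earlier record
        elif header is not None:
            records[header].append(line)  # sequence lines before any header are ignored
    return {h: "".join(parts) for h, parts in records.items()}


def parse_smiles(handle: TextIO) -> List[Dict[str, str]]:
    """Parse 'SMILES [name]' lines, skipping blank lines and '#' comments."""
    out = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(maxsplit=1)
        # Without an explicit name, the SMILES string doubles as the name.
        out.append({"smiles": parts[0], "name": parts[1] if len(parts) > 1 else parts[0]})
    return out


fasta = io.StringIO(">example_protein\nMKTAYIAK\nQRQISFVK\n>short_peptide\nMKV\n")
print(parse_fasta(fasta))  # {'example_protein': 'MKTAYIAKQRQISFVK', 'short_peptide': 'MKV'}
smi = io.StringIO("# ligands\nCCO ethanol\nc1ccccc1\n")
print(parse_smiles(smi))  # [{'smiles': 'CCO', 'name': 'ethanol'}, {'smiles': 'c1ccccc1', 'name': 'c1ccccc1'}]
```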
Evidence Traceability:
- Automatic PubMed/UniProt/PubChem/DrugBank link generation
- Metadata preservation through pipeline
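Link generation for evidence traceability can be as simple as filling URL templates from identifiers carried in each result's metadata. The templates below follow the public record-page patterns of PubMed, UniProt, PubChem, and DrugBank. The metadata key names (pubmed_id, uniprot_id, ...) and the evidence_links function are hypothetical. The example IDs are UniProt P69905 (human hemoglobin alpha) and PubChem CID 2244 (aspirin).

```python
from typing import Dict

# URL templates for public record pages, keyed by evidence source.
LINK_TEMPLATES = {
    "pubmed": "https://pubmed.ncbi.nlm.nih.gov/{id}/",
    "uniprot": "https://www.uniprot.org/uniprotkb/{id}",
    "pubchem": "https://pubchem.ncbi.nlm.nih.gov/compound/{id}",
    "drugbank": "https://go.drugbank.com/drugs/{id}",
}


def evidence_links(metadata: Dict[str, str]) -> Dict[str, str]:
    """Build a link for every '<source>_id' key present in a result's metadata."""
    links = {}
    for source, template in LINK_TEMPLATES.items():
        identifier = metadata.get(f"{source}_id")
        if identifier:
            links[source] = template.format(id=identifier)
    return links


links = evidence_links({"uniprot_id": "P69905", "pubchem_id": "2244"})
print(links["uniprot"])  # https://www.uniprot.org/uniprotkb/P69905
print(links["pubchem"])  # https://pubchem.ncbi.nlm.nih.gov/compound/2244
```

Because the function reads only metadata, it works at any pipeline stage as long as earlier nodes preserve the identifier fields.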
Verification: python scripts/verify_phase3.py (all 5 tests pass ✅)
Phase 4: UI/UX & Deployment ✅ COMPLETE
Goal: Build an intuitive, modern interface for the BioFlow platform.
Next.js Frontend (ui/):
- Next.js 16 App Router + Tailwind + shadcn/ui
- Dashboard pages: Discovery, 3D Visualization, Workflow Builder
- /app/api/* proxy routes to the FastAPI backend
- Optional mock fallbacks for the molecules/proteins list routes
Launch:
- Full stack (Windows): launch_bioflow_full.bat
- Manual:
  - Backend: python -m uvicorn bioflow.api.server:app --host 0.0.0.0 --port 8000
  - UI: cd ui && pnpm dev
Phase 5: Open-Source Alignment
- Strict Open-Source Compliance: Remove proprietary integrations and keep only OSS models/tools.
- Open Protein/Peptide Options: Integrate open models (e.g., ESM-2 / ProGen2) behind BioGenerator.
- Open Retrieval + Evidence: Improve evidence traceability (PubMed/UniProt/ChEMBL) and evaluation.