
πŸ—ΊοΈ BioFlow Orchestrator Development Roadmap

This roadmap outlines the collaborative development of a unified R&D platform for biological discovery using fully open-source tools and models.


πŸ—οΈ Phase 1: Infrastructure & Core Framework βœ… COMPLETE

Goal: Establish the "modality-agnostic" foundation so tools can be plugged in without rewriting core logic.

  • Core Abstractions (bioflow/core/base.py):

    • BioEncoder: Interface for vectorization (ESM-2, ChemBERTa, PubMedBERT, CLIP)
    • BioPredictor: Interface for predictions (DeepPurpose, ADMET)
    • BioGenerator: Interface for candidate generation
    • BioRetriever: Interface for vector DB operations
    • Data containers: EmbeddingResult, PredictionResult, RetrievalResult
  • Tool Registry (bioflow/core/registry.py):

    • Central hub to manage multiple tools
    • Register/unregister by name
    • Default tool fallbacks
    • Utility methods for listing and summary
  • Configuration Schema (bioflow/core/config.py):

    • NodeConfig: Single pipeline node definition
    • WorkflowConfig: Complete workflow definition
    • BioFlowConfig: Master system configuration
    • YAML-compatible dataclasses
  • Stateful Pipeline Engine (bioflow/core/orchestrator.py):

    • BioFlowOrchestrator: DAG-based workflow execution
    • Topological sort for dependency resolution
    • ExecutionContext for state passing
    • Custom handler support
    • Error handling and traceability
  • Sample Workflows (bioflow/workflows/):

    • drug_discovery.yaml: Encode → Retrieve → Predict → Filter
    • literature_mining.yaml: Cross-modal literature search
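
The dependency resolution described for the pipeline engine can be illustrated with a minimal sketch. It is not the code in bioflow/core/orchestrator.py: the `workflow` table, `execution_order`, `run`, and the stand-in handlers are hypothetical names, and the standard-library `graphlib` module (Python 3.9+) is assumed to do the topological sort.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical node table: node name -> names of the nodes it depends on.
workflow = {
    "encode": [],
    "retrieve": ["encode"],
    "predict": ["retrieve"],
    "filter": ["predict"],
}


def execution_order(nodes):
    """Return node names ordered so every node follows its dependencies."""
    try:
        return list(TopologicalSorter(nodes).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"workflow contains a cycle: {exc.args[1]}") from exc


def run(nodes, handlers, initial):
    """Run one handler per node, passing state through a shared context dict."""
    context = dict(initial)
    for name in execution_order(nodes):
        context[name] = handlers[name](context)
    return context


# Stand-in handlers; real nodes would call encoders, retrievers, and predictors.
handlers = {
    "encode": lambda ctx: [float(len(ctx["query"]))],
    "retrieve": lambda ctx: ["cand_1", "cand_2"],
    "predict": lambda ctx: {c: 0.5 for c in ctx["retrieve"]},
    "filter": lambda ctx: [c for c, s in ctx["predict"].items() if s >= 0.5],
}

result = run(workflow, handlers, {"query": "CCO"})
```

Storing each node's output under its own name in the context is one simple way to pass state between nodes; a cyclic workflow is rejected before any handler runs.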

🧪 Phase 2: Parallel Tool Implementation ✅ COMPLETE

Goal: Implement each tool module in parallel against the core interfaces.

1. OBM Integration ✅

  • OBMEncoder - Unified multimodal encoder (bioflow/plugins/obm_encoder.py)
  • TextEncoder - PubMedBERT/SciBERT (bioflow/plugins/encoders/text_encoder.py)
  • MoleculeEncoder - ChemBERTa/RDKit (bioflow/plugins/encoders/molecule_encoder.py)
  • ProteinEncoder - ESM-2/ProtBERT (bioflow/plugins/encoders/protein_encoder.py)
  • Lazy loading for efficient memory usage
  • Dimension projection for cross-modal compatibility
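
As an illustration of the lazy-loading idea, the sketch below defers an expensive model load until the first call to `encode`. `LazyEncoder` and its `loader` argument are hypothetical and do not reflect the actual plugin classes; the stand-in "model" is a plain function.

```python
from functools import cached_property


class LazyEncoder:
    """Hypothetical wrapper that loads its model only on first use."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable that builds the model
        self.load_count = 0    # counts how often the loader actually ran

    @cached_property
    def model(self):
        # Runs once; cached_property stores the result on the instance.
        self.load_count += 1
        return self._loader()

    def encode(self, text):
        return self.model(text)


# Stand-in loader: a real one would load ESM-2, ChemBERTa, or similar weights.
encoder = LazyEncoder(lambda: (lambda text: [float(len(text))]))
```

Constructing the encoder costs nothing; memory is only used once `encode` is first called, and later calls reuse the cached model.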

2. Qdrant Retriever ✅

  • QdrantRetriever implements BioRetriever interface (bioflow/plugins/qdrant_retriever.py)
  • HNSW indexing with cosine/euclidean/dot distance
  • Payload filtering (species, experiment type, modality)
  • Batch ingestion support
  • In-memory, local, or remote Qdrant connections
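
To make the combination of similarity search and payload filtering concrete, here is a small pure-Python sketch. It uses an exhaustive cosine scan rather than HNSW and does not call Qdrant at all; the `search` function, the point layout, and the payload keys are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(points, query, payload_filter=None, limit=3):
    """Return (score, id) pairs for points whose payload matches every filter key."""
    payload_filter = payload_filter or {}
    hits = [
        (cosine(p["vector"], query), p["id"])
        for p in points
        if all(p["payload"].get(k) == v for k, v in payload_filter.items())
    ]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:limit]


# Toy collection with hypothetical payload fields.
points = [
    {"id": "a", "vector": [1.0, 0.0], "payload": {"modality": "molecule", "species": "human"}},
    {"id": "b", "vector": [0.0, 1.0], "payload": {"modality": "protein", "species": "human"}},
    {"id": "c", "vector": [1.0, 1.0], "payload": {"modality": "molecule", "species": "mouse"}},
]

molecule_hits = search(points, [1.0, 0.0], {"modality": "molecule"})
```

A vector database applies the same filter-then-rank logic, but replaces the exhaustive scan with an approximate index so that queries stay fast on large collections.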

3. DeepPurpose Predictor ✅

  • DeepPurposePredictor implements BioPredictor (bioflow/plugins/deeppurpose_predictor.py)
  • DTI prediction with Transformer+CNN architecture
  • Graceful fallback when DeepPurpose is unavailable
  • Batch prediction support
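
One way to implement a graceful fallback is to check whether the optional package can be imported and, if not, return clearly marked unscored results instead of raising. The sketch below assumes that pattern; `deeppurpose_available`, `predict_pairs`, and the result keys are hypothetical, and no DeepPurpose API is called.

```python
import importlib.util


def deeppurpose_available():
    """True if the optional DeepPurpose package can be found for import."""
    return importlib.util.find_spec("DeepPurpose") is not None


def predict_pairs(pairs, backend=None):
    """Score (drug, target) pairs in one batch.

    `backend` is any callable (drug, target) -> score. When it is None,
    every pair is returned unscored and flagged, so callers can continue.
    """
    results = []
    for drug, target in pairs:
        if backend is None:
            results.append({"drug": drug, "target": target, "score": None, "fallback": True})
        else:
            score = float(backend(drug, target))
            results.append({"drug": drug, "target": target, "score": score, "fallback": False})
    return results


# Without a backend the batch still completes, with explicit fallback markers.
unscored = predict_pairs([("CCO", "MKTAYIAK")])
```

Flagging fallback results explicitly keeps downstream filtering honest: a missing score is never mistaken for a low one.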

🔗 Phase 3: The Unified Workflow ✅ COMPLETE

Goal: Connect the tools into a coherent discovery loop.

  • Typed Node System (bioflow/core/nodes.py):

    • EncodeNode: Vectorize inputs via BioEncoder
    • RetrieveNode: Query vector DB for similar items
    • PredictNode: Run DTI predictions on candidates
    • IngestNode: Add new data to vector DB
    • FilterNode: Score-based filtering and ranking
    • TraceabilityNode: Link results to evidence sources
  • Discovery Pipelines (bioflow/workflows/discovery.py):

    • DiscoveryPipeline: Full drug discovery workflow (encode → retrieve → predict → filter → trace)
    • LiteratureMiningPipeline: Cross-modal literature search
    • ProteinDesignPipeline: Protein homolog discovery
    • Batch ingestion and simple search APIs
  • Data Ingestion Utilities (bioflow/workflows/ingestion.py):

    • JSON/CSV file loaders
    • SMILES/FASTA file parsers
    • Sample data generators for testing
  • Evidence Traceability:

    • Automatic PubMed/UniProt/PubChem/DrugBank link generation
    • Metadata preservation through pipeline
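
Evidence link generation can be sketched as a lookup from metadata identifiers to source URLs. The metadata key names (`pmid`, `uniprot_id`, `pubchem_cid`, `drugbank_id`), the `attach_links` helper, and the exact URL patterns are assumptions for illustration and may differ from what TraceabilityNode does.

```python
# Assumed public URL patterns, keyed by a hypothetical metadata field name.
URL_TEMPLATES = {
    "pmid": "https://pubmed.ncbi.nlm.nih.gov/{id}/",
    "uniprot_id": "https://www.uniprot.org/uniprotkb/{id}",
    "pubchem_cid": "https://pubchem.ncbi.nlm.nih.gov/compound/{id}",
    "drugbank_id": "https://go.drugbank.com/drugs/{id}",
}


def attach_links(result):
    """Return a copy of `result` with a `links` dict built from its metadata.

    The original metadata is kept untouched so it survives later pipeline steps.
    """
    metadata = result.get("metadata", {})
    links = {
        key: template.format(id=metadata[key])
        for key, template in URL_TEMPLATES.items()
        if metadata.get(key)
    }
    return {**result, "links": links}


# Illustrative identifiers only.
hit = {"id": "cand_1", "score": 0.91, "metadata": {"pubchem_cid": 2244, "uniprot_id": "P00533"}}
traced = attach_links(hit)
```

Returning a new dict instead of mutating the input keeps each result's metadata intact as it moves through the pipeline.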

Verification: python scripts/verify_phase3.py (all 5 tests pass ✅)


📊 Phase 4: UI/UX & Deployment ✅ COMPLETE

Goal: Build an intuitive, modern interface for the BioFlow platform.

  • Next.js Frontend (ui/):
    • Next.js 16 App Router + Tailwind CSS + shadcn/ui
    • Dashboard pages: Discovery, 3D Visualization, Workflow Builder
    • /app/api/* proxy routes to the FastAPI backend
    • Optional mock fallbacks for molecules/proteins list routes

Launch:

  • Full stack (Windows): launch_bioflow_full.bat
  • Manual:
    • Backend: python -m uvicorn bioflow.api.server:app --host 0.0.0.0 --port 8000
    • UI: cd ui && pnpm dev

🚀 Phase 5: Open-Source Alignment

  • Strict Open-Source Compliance: remove proprietary integrations and keep only OSS models/tools.
  • Open Protein/Peptide Options: integrate open models (e.g., ESM-2 / ProGen2) behind BioGenerator.
  • Open Retrieval + Evidence: improve evidence traceability (PubMed/UniProt/ChEMBL) and evaluation.