# 🗺️ BioFlow Orchestrator Development Roadmap
This roadmap outlines the collaborative development of a unified R&D platform for biological discovery using **fully open-source** tools and models.
---
## 🏗️ Phase 1: Infrastructure & Core Framework ✅ COMPLETE
**Goal:** Establish the "modality-agnostic" foundation so tools can be plugged in without rewriting core logic.
- [x] **Core Abstractions** (`bioflow/core/base.py`):
  - `BioEncoder`: Interface for vectorization (ESM-2, ChemBERTa, PubMedBERT, CLIP)
  - `BioPredictor`: Interface for predictions (DeepPurpose, ADMET)
  - `BioGenerator`: Interface for candidate generation
  - `BioRetriever`: Interface for vector DB operations
  - Data containers: `EmbeddingResult`, `PredictionResult`, `RetrievalResult`
- [x] **Tool Registry** (`bioflow/core/registry.py`):
  - Central hub to manage multiple tools
  - Register/unregister by name
  - Default tool fallbacks
  - Utility methods for listing and summary
- [x] **Configuration Schema** (`bioflow/core/config.py`):
  - `NodeConfig`: Single pipeline node definition
  - `WorkflowConfig`: Complete workflow definition
  - `BioFlowConfig`: Master system configuration
  - YAML-compatible dataclasses
- [x] **Stateful Pipeline Engine** (`bioflow/core/orchestrator.py`):
  - `BioFlowOrchestrator`: DAG-based workflow execution
  - Topological sort for dependency resolution
  - `ExecutionContext` for state passing
  - Custom handler support
  - Error handling and traceability
- [x] **Sample Workflows** (`bioflow/workflows/`):
  - `drug_discovery.yaml`: Encode → Retrieve → Predict → Filter
  - `literature_mining.yaml`: Cross-modal literature search
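
The DAG execution and state passing listed above can be illustrated with a minimal sketch. It uses the standard-library `graphlib.TopologicalSorter` rather than the project's own `BioFlowOrchestrator`, and a plain dictionary stands in for `ExecutionContext`. The `WORKFLOW` mapping, `run_workflow`, and the toy handlers are hypothetical names chosen for illustration; only the node order (encode, retrieve, predict, filter) comes from `drug_discovery.yaml` as described here.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List

# Hypothetical workflow definition: node name -> nodes it depends on.
# The node names follow drug_discovery.yaml; the data layout is an assumption.
WORKFLOW: Dict[str, List[str]] = {
    "encode": [],
    "retrieve": ["encode"],
    "predict": ["retrieve"],
    "filter": ["predict"],
}

Handler = Callable[[Dict[str, Any]], Any]


def run_workflow(
    deps: Dict[str, List[str]],
    handlers: Dict[str, Handler],
    context: Dict[str, Any],
) -> List[str]:
    """Run handlers in dependency order, storing each result in the context."""
    # static_order() yields every node after all of its dependencies and
    # raises graphlib.CycleError if the graph is not a DAG.
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        context[name] = handlers[name](context)
    return order


# Toy handlers standing in for real tools (all values are made up).
handlers: Dict[str, Handler] = {
    "encode": lambda ctx: [0.1, 0.2, 0.3],
    "retrieve": lambda ctx: ["cand_a", "cand_b"],
    "predict": lambda ctx: dict(zip(ctx["retrieve"], (0.9, 0.4))),
    "filter": lambda ctx: [c for c, s in ctx["predict"].items() if s >= 0.5],
}

context: Dict[str, Any] = {}
order = run_workflow(WORKFLOW, handlers, context)
print(order)              # execution order of the nodes
print(context["filter"])  # candidates that passed the score threshold
```

Each handler reads upstream results from the shared context, which is one simple way to pass state between nodes; the real engine may structure this differently.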
---
## 🧪 Phase 2: Parallel Tool Implementation ✅ COMPLETE
Team members implement their respective modules in parallel against the core interfaces.
### **1. OBM Integration** ✅
- [x] `OBMEncoder` - Unified multimodal encoder (`bioflow/plugins/obm_encoder.py`)
- [x] `TextEncoder` - PubMedBERT/SciBERT (`bioflow/plugins/encoders/text_encoder.py`)
- [x] `MoleculeEncoder` - ChemBERTa/RDKit (`bioflow/plugins/encoders/molecule_encoder.py`)
- [x] `ProteinEncoder` - ESM-2/ProtBERT (`bioflow/plugins/encoders/protein_encoder.py`)
- [x] Lazy loading for efficient memory usage
- [x] Dimension projection for cross-modal compatibility
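
As a rough illustration of the last two items, the sketch below defers an expensive model load until the first `encode` call and then maps the embedding to a shared width with a fixed linear projection. `LazyEncoder`, `load_toy_model`, and `project` are hypothetical names, the "model" is a stand-in function, and the projection matrix is hand-written; the actual encoders may load and project quite differently.

```python
from typing import Callable, List, Optional, Sequence

Model = Callable[[str], List[float]]


class LazyEncoder:
    """Loads the underlying model on first use instead of at construction."""

    def __init__(self, loader: Callable[[], Model]) -> None:
        self._loader = loader
        self._model: Optional[Model] = None
        self.load_count = 0  # exposed only to make the laziness observable

    def encode(self, item: str) -> List[float]:
        if self._model is None:
            # First call: pay the loading cost once and cache the model.
            self._model = self._loader()
            self.load_count += 1
        return self._model(item)


def load_toy_model() -> Model:
    # Stand-in for an expensive load such as reading model weights.
    return lambda text: [float(len(text)), float(text.count("C")), 1.0, 0.0]


def project(vec: Sequence[float], matrix: Sequence[Sequence[float]]) -> List[float]:
    """Linear projection: one output value per matrix row (row . vec)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


encoder = LazyEncoder(load_toy_model)
loads_before = encoder.load_count      # nothing loaded yet
embedding = encoder.encode("CCO")      # triggers the single load
encoder.encode("CC")                   # reuses the cached model

# Hand-written 2x4 matrix mapping a 4-d embedding to a shared 2-d space.
shared = project(embedding, [[1, 0, 0, 0], [0, 1, 0, 0]])
print(loads_before, encoder.load_count, embedding, shared)
```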
### **2. Qdrant Retriever** ✅
- [x] `QdrantRetriever` implements `BioRetriever` interface (`bioflow/plugins/qdrant_retriever.py`)
- [x] HNSW indexing with cosine, Euclidean, or dot-product distance
- [x] Payload filtering (species, experiment type, modality)
- [x] Batch ingestion support
- [x] In-memory, local, or remote Qdrant connections
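
To make the three distance options and payload filtering concrete, here is a small pure-Python sketch. It does an exhaustive scan rather than HNSW (an approximate index) and does not use the Qdrant client at all; `search`, the point tuples, and the `species` payload values are illustrative assumptions.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[str, List[float], Dict[str, str]]  # (id, vector, payload)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(
    query: Sequence[float],
    points: Sequence[Point],
    metric: str = "cosine",
    top_k: int = 2,
    where: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Brute-force nearest neighbours with an exact-match payload filter."""
    # Keep only points whose payload matches every key/value in `where`.
    kept = [
        p for p in points
        if all(p[2].get(k) == v for k, v in (where or {}).items())
    ]
    if metric == "euclidean":
        # Smaller distance means more similar.
        ranked = sorted(kept, key=lambda p: euclidean(query, p[1]))
    else:
        # Cosine and dot product are similarities: larger is better.
        score = cosine if metric == "cosine" else dot
        ranked = sorted(kept, key=lambda p: score(query, p[1]), reverse=True)
    return [p[0] for p in ranked[:top_k]]


points: List[Point] = [
    ("p1", [1.0, 0.0], {"species": "human"}),
    ("p2", [0.0, 1.0], {"species": "mouse"}),
    ("p3", [2.0, 0.1], {"species": "human"}),
]
query = [1.0, 0.0]

print(search(query, points, metric="cosine"))
print(search(query, points, metric="dot"))
print(search(query, points, metric="euclidean"))
print(search(query, points, where={"species": "mouse"}))
```

Note that the dot product favours the longer vector `p3`, while cosine and Euclidean rank `p1` first, which is why the choice of metric matters for unnormalized embeddings.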
### **3. DeepPurpose Predictor** ✅
- [x] `DeepPurposePredictor` implements `BioPredictor` (`bioflow/plugins/deeppurpose_predictor.py`)
- [x] Drug-target interaction (DTI) prediction with a Transformer+CNN architecture
- [x] Graceful fallback when DeepPurpose is unavailable
- [x] Batch prediction support
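
One common way to get a graceful fallback is to attempt the import once and switch to a placeholder path if it fails. The sketch below shows that pattern only; `load_backend`, `FallbackPredictor`, and the result dictionary layout are hypothetical, and the branch that would call the real library is deliberately left unimplemented rather than guessed.

```python
import importlib
from types import ModuleType
from typing import Dict, List, Optional, Tuple


def load_backend(module_name: str = "DeepPurpose") -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


class FallbackPredictor:
    """Batch predictor that degrades to placeholder results without a backend."""

    def __init__(self, backend: Optional[ModuleType]) -> None:
        self.backend = backend

    @property
    def available(self) -> bool:
        return self.backend is not None

    def predict_batch(self, pairs: List[Tuple[str, str]]) -> List[Dict[str, object]]:
        if not self.available:
            # No model: return explicit placeholders instead of raising,
            # so downstream nodes can still run and see the missing scores.
            return [
                {"drug": d, "target": t, "score": None, "source": "fallback"}
                for d, t in pairs
            ]
        # Assumption: the real model call would go here; it is not sketched.
        raise NotImplementedError("wire the real predictor call here")


# Use a module name that cannot exist so the demo always takes the fallback path.
predictor = FallbackPredictor(load_backend("bioflow_missing_backend_demo"))
results = predictor.predict_batch([("CCO", "MKTAYIAK"), ("CCN", "MKTAYIAK")])
print(predictor.available, results)
```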
---
## 🔗 Phase 3: The Unified Workflow ✅ COMPLETE
**Goal:** Connect the tools into a coherent discovery loop.
- [x] **Typed Node System** (`bioflow/core/nodes.py`):
  - `EncodeNode`: Vectorize inputs via BioEncoder
  - `RetrieveNode`: Query vector DB for similar items
  - `PredictNode`: Run DTI predictions on candidates
  - `IngestNode`: Add new data to vector DB
  - `FilterNode`: Score-based filtering and ranking
  - `TraceabilityNode`: Link results to evidence sources
- [x] **Discovery Pipelines** (`bioflow/workflows/discovery.py`):
  - `DiscoveryPipeline`: Full drug discovery workflow (encode → retrieve → predict → filter → trace)
  - `LiteratureMiningPipeline`: Cross-modal literature search
  - `ProteinDesignPipeline`: Protein homolog discovery
  - Batch ingestion and simple search APIs
- [x] **Data Ingestion Utilities** (`bioflow/workflows/ingestion.py`):
  - JSON/CSV file loaders
  - SMILES/FASTA file parsers
  - Sample data generators for testing
- [x] **Evidence Traceability**:
  - Automatic PubMed/UniProt/PubChem/DrugBank link generation
  - Metadata preservation through pipeline
**Verification:** `python scripts/verify_phase3.py` - All 5 tests pass ✅
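
The evidence-link step in the traceability item above could look roughly like the following. The URL templates follow the public URL patterns of each database as commonly used at the time of writing; the metadata keys (`pmid`, `uniprot_id`, `pubchem_cid`, `drugbank_id`) and the `evidence_links` function are assumptions, not the project's actual schema.

```python
from typing import Dict

# Metadata key -> (source label, URL template). Keys are hypothetical.
LINK_TEMPLATES: Dict[str, tuple] = {
    "pmid": ("pubmed", "https://pubmed.ncbi.nlm.nih.gov/{id}/"),
    "uniprot_id": ("uniprot", "https://www.uniprot.org/uniprotkb/{id}/entry"),
    "pubchem_cid": ("pubchem", "https://pubchem.ncbi.nlm.nih.gov/compound/{id}"),
    "drugbank_id": ("drugbank", "https://go.drugbank.com/drugs/{id}"),
}


def evidence_links(metadata: Dict[str, object]) -> Dict[str, str]:
    """Build one link per recognised identifier; ignore all other keys."""
    links: Dict[str, str] = {}
    for key, (source, template) in LINK_TEMPLATES.items():
        value = metadata.get(key)
        if value:  # skip missing or empty identifiers
            links[source] = template.format(id=value)
    return links


def attach_links(metadata: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of the metadata with links added, leaving the input intact."""
    return {**metadata, "evidence": evidence_links(metadata)}


# Illustrative record: P00533 is the UniProt accession for human EGFR and
# 2244 is the PubChem CID for aspirin; the score is made up.
record = {"uniprot_id": "P00533", "pubchem_cid": 2244, "score": 0.87}
traced = attach_links(record)
print(traced["evidence"])
```

Returning a copy instead of mutating the input is one way to keep the original metadata intact as it moves through the pipeline.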
---
## 📊 Phase 4: UI/UX & Deployment ✅ COMPLETE
**Goal:** Build an intuitive, modern interface for the BioFlow platform.
- [x] **Next.js Frontend** (`ui/`):
  - Next.js 16 App Router + Tailwind + shadcn/ui
  - Dashboard pages: Discovery, 3D Visualization, Workflow Builder
  - `/app/api/*` proxy routes to the FastAPI backend
  - Optional mock fallbacks for molecules/proteins list routes
**Launch:**
- Full stack (Windows): `launch_bioflow_full.bat`
- Manual:
  - Backend: `python -m uvicorn bioflow.api.server:app --host 0.0.0.0 --port 8000`
  - UI: `cd ui && pnpm dev`
---
## 🚀 Phase 5: Open-Source Alignment
- **Strict Open-Source Compliance**: remove proprietary integrations and keep only OSS models/tools.
- **Open Protein/Peptide Options**: integrate open models (e.g., ESM-2 / ProGen2) behind `BioGenerator`.
- **Open Retrieval + Evidence**: improve evidence traceability (PubMed/UniProt/ChEMBL) and evaluation.
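
To show what placing a model "behind `BioGenerator`" might mean in practice, the sketch below defines an abstract interface and a trivial implementation that callers use without knowing which model sits underneath. The `generate(seed, n)` signature is an assumption (the real interface in `bioflow/core/base.py` may differ), and `PointMutationGenerator` is a toy stand-in for an open model such as ESM-2 or ProGen2, not an integration of either.

```python
import random
from abc import ABC, abstractmethod
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


class BioGenerator(ABC):
    """Hypothetical shape of the generator interface (signature assumed)."""

    @abstractmethod
    def generate(self, seed: str, n: int) -> List[str]:
        """Return n candidate sequences derived from the seed."""


class PointMutationGenerator(BioGenerator):
    """Toy generator: each candidate is the seed with one residue changed."""

    def __init__(self, rng_seed: int = 0) -> None:
        self._rng = random.Random(rng_seed)  # seeded for reproducibility

    def generate(self, seed: str, n: int) -> List[str]:
        candidates = []
        for _ in range(n):
            pos = self._rng.randrange(len(seed))
            # Exclude the current residue so the candidate always differs.
            choices = [aa for aa in AMINO_ACIDS if aa != seed[pos]]
            candidates.append(seed[:pos] + self._rng.choice(choices) + seed[pos + 1:])
        return candidates


def propose(generator: BioGenerator, seed: str, n: int) -> List[str]:
    # Callers depend only on the interface, so models can be swapped freely.
    return generator.generate(seed, n)


seed_sequence = "MKTAYIAK"  # made-up peptide used only for the demo
candidates = propose(PointMutationGenerator(rng_seed=0), seed_sequence, 3)
print(candidates)
```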