🗺️ BioFlow Orchestrator Development Roadmap

This roadmap outlines the collaborative development of a unified R&D platform for biological discovery using fully open-source tools and models.

Goal: Establish the "modality-agnostic" foundation so tools can be plugged in without rewriting core logic.

Core Abstractions (bioflow/core/base.py):
- BioEncoder: Interface for vectorization (ESM-2, ChemBERTa, PubMedBERT, CLIP)
- BioPredictor: Interface for predictions (DeepPurpose, ADMET)
- BioGenerator: Interface for candidate generation
- BioRetriever: Interface for vector DB operations
- Data containers: EmbeddingResult, PredictionResult, RetrievalResult
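
As a rough illustration of what such an interface could look like, the sketch below defines an abstract encoder and a result container. The field names (`vector`, `modality`, `metadata`), the `encode` signature, and `ToyEncoder` are assumptions made for this example, not the contents of bioflow/core/base.py.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EmbeddingResult:
    """Container for one embedding (field names are hypothetical)."""
    vector: List[float]
    modality: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class BioEncoder(ABC):
    """Interface every encoder plugin would implement."""

    @abstractmethod
    def encode(self, item: str, modality: str) -> EmbeddingResult:
        ...


class ToyEncoder(BioEncoder):
    """Stand-in encoder: counts a few characters. Not a real model."""

    ALPHABET = "CNO"

    def encode(self, item: str, modality: str) -> EmbeddingResult:
        vector = [float(item.count(ch)) for ch in self.ALPHABET]
        return EmbeddingResult(vector=vector, modality=modality)


result = ToyEncoder().encode("CCO", "molecule")
print(result.vector)  # [2.0, 0.0, 1.0]
```

Because the core only depends on the abstract class, a real ESM-2 or ChemBERTa wrapper could replace `ToyEncoder` without touching calling code.
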
Tool Registry (bioflow/core/registry.py):
- Central hub to manage multiple tools
- Register/unregister by name
- Default tool fallbacks
- Utility methods for listing and summary
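
The registry behaviour listed above might be sketched as follows. The class name `ToolRegistry`, its method names, and the "first registered tool becomes the default" rule are assumptions for illustration; the actual bioflow/core/registry.py may differ.

```python
class ToolRegistry:
    """Minimal name-based registry with a per-kind default (hypothetical API)."""

    def __init__(self):
        self._tools = {}     # kind -> {name: tool}
        self._defaults = {}  # kind -> default tool name

    def register(self, kind, name, tool, default=False):
        self._tools.setdefault(kind, {})[name] = tool
        # The first tool of a kind becomes the default unless overridden.
        if default or kind not in self._defaults:
            self._defaults[kind] = name

    def unregister(self, kind, name):
        remaining = self._tools.get(kind, {})
        remaining.pop(name, None)
        if self._defaults.get(kind) == name:
            # Fall back to any remaining tool, or drop the default entirely.
            if remaining:
                self._defaults[kind] = next(iter(remaining))
            else:
                self._defaults.pop(kind, None)

    def get(self, kind, name=None):
        name = name or self._defaults.get(kind)
        try:
            return self._tools[kind][name]
        except KeyError:
            raise KeyError(f"no {kind} tool named {name!r}") from None

    def list_tools(self, kind):
        return sorted(self._tools.get(kind, {}))


registry = ToolRegistry()
registry.register("encoder", "esm2", "esm2-tool")
registry.register("encoder", "chemberta", "chemberta-tool")
print(registry.get("encoder"))         # esm2-tool
print(registry.list_tools("encoder"))  # ['chemberta', 'esm2']
```
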
Configuration Schema (bioflow/core/config.py):
- NodeConfig: Single pipeline node definition
- WorkflowConfig: Complete workflow definition
- BioFlowConfig: Master system configuration
- YAML-compatible dataclasses
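
"YAML-compatible" here can be read as: the dataclasses convert to and from the plain dicts and lists that a YAML loader produces. The sketch below assumes that reading. The field names (`id`, `type`, `tool`, `inputs`, `params`) and the `from_dict` helper are hypothetical, and the `raw` dict stands in for the output of a YAML parser.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class NodeConfig:
    id: str
    type: str
    tool: Optional[str] = None
    inputs: List[str] = field(default_factory=list)
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class WorkflowConfig:
    name: str
    nodes: List[NodeConfig] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "WorkflowConfig":
        nodes = [NodeConfig(**node) for node in data.get("nodes", [])]
        return cls(name=data["name"], nodes=nodes)


# Shape a YAML loader would return for a small workflow file.
raw = {
    "name": "drug_discovery",
    "nodes": [
        {"id": "encode", "type": "encode", "tool": "chemberta"},
        {"id": "retrieve", "type": "retrieve", "inputs": ["encode"],
         "params": {"limit": 10}},
    ],
}

workflow = WorkflowConfig.from_dict(raw)
# asdict() recurses into nested dataclasses, so the result is YAML-serialisable.
assert WorkflowConfig.from_dict(asdict(workflow)) == workflow
```
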
Stateful Pipeline Engine (bioflow/core/orchestrator.py):
- BioFlowOrchestrator: DAG-based workflow execution
- Topological sort for dependency resolution
- ExecutionContext for state passing
- Custom handler support
- Error handling and traceability
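
A minimal sketch of DAG execution with topological ordering, using the standard-library `graphlib` module (Python 3.9+). The `run_workflow` function, the node dict layout, and the toy handlers are assumptions for illustration and do not reproduce `BioFlowOrchestrator`; the plain `context` dict plays the role of the execution context.

```python
from graphlib import TopologicalSorter


def run_workflow(nodes, handlers, initial_input):
    """Run nodes in dependency order; return (context, execution trace)."""
    # Map each node id to the set of node ids it depends on.
    graph = {nid: set(spec.get("inputs", [])) for nid, spec in nodes.items()}
    unknown = set().union(*graph.values()) - set(nodes)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    context, trace = {}, []
    # static_order() raises graphlib.CycleError if the graph is not a DAG.
    for nid in TopologicalSorter(graph).static_order():
        spec = nodes[nid]
        deps = spec.get("inputs", [])
        # Root nodes receive the workflow input; others get upstream outputs.
        args = [context[d] for d in deps] if deps else [initial_input]
        try:
            context[nid] = handlers[spec["type"]](*args)
        except Exception as exc:
            raise RuntimeError(f"node {nid!r} failed: {exc}") from exc
        trace.append(nid)
    return context, trace


# Toy handlers standing in for real encode/retrieve/predict/filter tools.
handlers = {
    "encode": lambda smiles: len(smiles),
    "retrieve": lambda n: [n, n + 1, n + 2],
    "predict": lambda items: [x * 2 for x in items],
    "filter": lambda scores: [s for s in scores if s > 8],
}
nodes = {
    "filter": {"type": "filter", "inputs": ["predict"]},
    "encode": {"type": "encode"},
    "retrieve": {"type": "retrieve", "inputs": ["encode"]},
    "predict": {"type": "predict", "inputs": ["retrieve"]},
}

context, trace = run_workflow(nodes, handlers, "CCO")
print(trace)              # ['encode', 'retrieve', 'predict', 'filter']
print(context["filter"])  # [10]
```

Nodes are declared out of order on purpose: the sort, not the declaration order, decides when each node runs.
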
Sample Workflows (bioflow/workflows/):
- drug_discovery.yaml: Encode → Retrieve → Predict → Filter
- literature_mining.yaml: Cross-modal literature search

Team members build their respective modules against the core interfaces.

- OBMEncoder: Unified multimodal encoder (bioflow/plugins/obm_encoder.py)
- TextEncoder: PubMedBERT/SciBERT (bioflow/plugins/encoders/text_encoder.py)
- MoleculeEncoder: ChemBERTa/RDKit (bioflow/plugins/encoders/molecule_encoder.py)
- ProteinEncoder: ESM-2/ProtBERT (bioflow/plugins/encoders/protein_encoder.py)
- Lazy loading for efficient memory usage
- Dimension projection for cross-modal compatibility
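
Lazy loading and dimension projection could be combined roughly as below. This is a sketch under assumptions: `LazyProjectedEncoder` and its `loader` argument are hypothetical, and the projection is a fixed random matrix chosen only to keep the example dependency-free (a real encoder would more likely use a learned linear layer).

```python
import random
from functools import cached_property


class LazyProjectedEncoder:
    """Loads its model on first use and projects outputs to a shared size."""

    def __init__(self, loader, in_dim, out_dim, seed=0):
        self._loader = loader  # callable that builds the (expensive) model
        self.in_dim = in_dim
        self.out_dim = out_dim
        self._seed = seed
        self.load_count = 0

    @cached_property
    def model(self):
        # Runs once, on first access; the result is cached on the instance.
        self.load_count += 1
        return self._loader()

    @cached_property
    def projection(self):
        # Fixed random out_dim x in_dim matrix (assumption, see lead-in).
        rng = random.Random(self._seed)
        scale = self.in_dim ** -0.5
        return [[rng.gauss(0.0, scale) for _ in range(self.in_dim)]
                for _ in range(self.out_dim)]

    def encode(self, item):
        raw = self.model(item)
        if len(raw) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} dims, got {len(raw)}")
        return [sum(w * x for w, x in zip(row, raw)) for row in self.projection]


def load_toy_model():
    # Stand-in for loading model weights from disk.
    return lambda text: [float(len(text)), 1.0, 0.0]


encoder = LazyProjectedEncoder(load_toy_model, in_dim=3, out_dim=2)
print(encoder.load_count)           # 0: nothing loaded yet
print(len(encoder.encode("MKTA")))  # 2
print(encoder.load_count)           # 1
```
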

- QdrantRetriever implements BioRetriever interface (bioflow/plugins/qdrant_retriever.py)
- HNSW indexing with cosine/euclidean/dot distance
- Payload filtering (species, experiment type, modality)
- Batch ingestion support
- In-memory, local, or remote Qdrant connections
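
To show what "cosine distance plus payload filtering" means at the query level, here is a brute-force stand-in written in plain Python. It deliberately does not use Qdrant or HNSW: it scans every point, so it only illustrates the semantics of a filtered similarity search. The `search` function and the point layout are hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(points, query, must=None, limit=3):
    """Return ids of the closest points whose payload matches every `must` key."""
    must = must or {}
    candidates = [
        p for p in points
        if all(p["payload"].get(key) == value for key, value in must.items())
    ]
    ranked = sorted(candidates, key=lambda p: cosine(p["vector"], query),
                    reverse=True)
    return [p["id"] for p in ranked[:limit]]


points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"species": "human"}},
    {"id": 2, "vector": [0.0, 1.0], "payload": {"species": "human"}},
    {"id": 3, "vector": [1.0, 0.1], "payload": {"species": "mouse"}},
]

print(search(points, [1.0, 0.0]))                             # [1, 3, 2]
print(search(points, [1.0, 0.0], must={"species": "human"}))  # [1, 2]
```
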

- DeepPurposePredictor implements BioPredictor (bioflow/plugins/deeppurpose_predictor.py)
- DTI prediction with Transformer+CNN architecture
- Graceful fallback when DeepPurpose unavailable
- Batch prediction support
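
One common way to degrade gracefully when an optional dependency is missing is to attempt the import once and switch to a placeholder path. The sketch below assumes that pattern; `OptionalDTIPredictor`, `load_optional`, and the result dict layout are hypothetical, and the real model call is intentionally left unimplemented rather than guessed.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


class OptionalDTIPredictor:
    def __init__(self, module_name="DeepPurpose"):
        self.backend = load_optional(module_name)

    @property
    def available(self):
        return self.backend is not None

    def predict_batch(self, pairs):
        """pairs: iterable of (drug_smiles, target_sequence) tuples."""
        if not self.available:
            # Fallback: keep the pipeline running, but mark scores as missing.
            return [
                {"drug": drug, "target": target, "score": None,
                 "source": "unavailable"}
                for drug, target in pairs
            ]
        # Assumption: the real model call would be wired in here.
        raise NotImplementedError("connect the installed backend here")


# A module name that is certainly absent, to exercise the fallback path.
predictor = OptionalDTIPredictor(module_name="bioflow_missing_backend_demo")
print(predictor.available)  # False
print(predictor.predict_batch([("CCO", "MKTAYIAK")])[0]["source"])  # unavailable
```

Returning `None` scores with an explicit `source` marker lets downstream filter nodes distinguish "not predicted" from "predicted low".
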

Goal: Connect the tools into a coherent discovery loop.

Typed Node System (bioflow/core/nodes.py):
- EncodeNode: Vectorize inputs via BioEncoder
- RetrieveNode: Query vector DB for similar items
- PredictNode: Run DTI predictions on candidates
- IngestNode: Add new data to vector DB
- FilterNode: Score-based filtering and ranking
- TraceabilityNode: Link results to evidence sources
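
Score-based filtering and ranking, as in the FilterNode entry, might look like the sketch below. The `threshold` and `top_k` parameters, the `run` method, and the candidate dict layout are assumptions for illustration rather than the contents of bioflow/core/nodes.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilterNode:
    threshold: float = 0.5
    top_k: Optional[int] = None

    def run(self, candidates):
        """Keep candidates scoring at or above the threshold, best first."""
        kept = [c for c in candidates if c["score"] >= self.threshold]
        kept.sort(key=lambda c: c["score"], reverse=True)
        return kept if self.top_k is None else kept[: self.top_k]


candidates = [
    {"id": "mol-a", "score": 0.42},
    {"id": "mol-b", "score": 0.91},
    {"id": "mol-c", "score": 0.77},
]

node = FilterNode(threshold=0.5, top_k=1)
print([c["id"] for c in node.run(candidates)])  # ['mol-b']
```
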
Discovery Pipelines (bioflow/workflows/discovery.py):
- DiscoveryPipeline: Full drug discovery workflow (encode → retrieve → predict → filter → trace)
- LiteratureMiningPipeline: Cross-modal literature search
- ProteinDesignPipeline: Protein homolog discovery
- Batch ingestion and simple search APIs
Data Ingestion Utilities (bioflow/workflows/ingestion.py):
- JSON/CSV file loaders
- SMILES/FASTA file parsers
- Sample data generators for testing
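
For reference, minimal FASTA and SMILES file parsers can be written with the standard library alone. The function names are hypothetical, and the SMILES parser assumes the common one-record-per-line layout (`SMILES`, optional whitespace, optional name). Neither function validates chemistry or sequence alphabets.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def parse_smiles(text):
    """Return a list of (smiles, name) tuples; name is '' when absent."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        name = parts[1].strip() if len(parts) > 1 else ""
        records.append((parts[0], name))
    return records


fasta_text = ">sp|P1|demo protein\nMKTAYI\nAKQR\n\n>P2\nGAVL\n"
print(parse_fasta(fasta_text))
# [('sp|P1|demo protein', 'MKTAYIAKQR'), ('P2', 'GAVL')]

smiles_text = "# comment\nCCO ethanol\nc1ccccc1\n"
print(parse_smiles(smiles_text))
# [('CCO', 'ethanol'), ('c1ccccc1', '')]
```
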
Evidence Traceability:
- Automatic PubMed/UniProt/PubChem/DrugBank link generation
- Metadata preservation through pipeline
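
Link generation of this kind usually reduces to filling identifier values into URL templates. The sketch below assumes the metadata keys `pmid`, `uniprot_id`, `pubchem_cid`, and `drugbank_id` (hypothetical names) and uses the public URL patterns of each site as commonly seen at the time of writing; both should be checked against the real pipeline and the providers' current documentation.

```python
from urllib.parse import quote

# metadata key -> (source label, URL template); key names are assumptions.
LINK_TEMPLATES = {
    "pmid": ("pubmed", "https://pubmed.ncbi.nlm.nih.gov/{id}/"),
    "uniprot_id": ("uniprot", "https://www.uniprot.org/uniprotkb/{id}/entry"),
    "pubchem_cid": ("pubchem", "https://pubchem.ncbi.nlm.nih.gov/compound/{id}"),
    "drugbank_id": ("drugbank", "https://go.drugbank.com/drugs/{id}"),
}


def evidence_links(metadata):
    """Build {source: url} for every known identifier present in metadata."""
    links = {}
    for key, (source, template) in LINK_TEMPLATES.items():
        value = metadata.get(key)
        if value not in (None, ""):
            # quote() guards against stray characters in identifiers.
            links[source] = template.format(id=quote(str(value), safe=""))
    return links


hit = {"pmid": 12345, "uniprot_id": "P00533", "score": 0.87}
print(evidence_links(hit))
# {'pubmed': 'https://pubmed.ncbi.nlm.nih.gov/12345/',
#  'uniprot': 'https://www.uniprot.org/uniprotkb/P00533/entry'}
```

Unknown keys such as `score` are ignored, so the function can be applied directly to a retrieval payload without stripping it first.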

Verification: python scripts/verify_phase3.py - All 5 tests pass ✅

Goal: Build an intuitive, modern interface for the BioFlow platform.

Next.js Frontend (ui/):
- Next.js 16 App Router + Tailwind + shadcn/ui
- Dashboard pages: Discovery, 3D Visualization, Workflow Builder
- /app/api/* proxy routes to the FastAPI backend
- Optional mock fallbacks for molecules/proteins list routes

Launch:

Full stack (Windows): launch_bioflow_full.bat
Manual:
- Backend: python -m uvicorn bioflow.api.server:app --host 0.0.0.0 --port 8000
- UI: cd ui && pnpm dev

Strict Open-Source Compliance: remove proprietary integrations and keep only OSS models/tools.
Open Protein/Peptide Options: integrate open models (e.g., ESM-2 / ProGen2) behind BioGenerator.
Open Retrieval + Evidence: improve evidence traceability (PubMed/UniProt/ChEMBL) and evaluation.