# Mosaic Architecture

This document describes the internal architecture and module organization of the Mosaic application.

## Overview

Mosaic is a deep learning pipeline for analyzing H&E whole slide images (WSIs) to predict:

1. **Cancer subtypes** using the Aeon model
2. **Biomarkers** using the Paladin model

The application is organized into several focused modules with a clear separation of concerns.

## Module Structure

The Mosaic application has been refactored for readability and maintainability. The codebase is organized into the following modules:

### Core Modules

#### `mosaic.gradio_app` (Main Entry Point)

- **Location**: `src/mosaic/gradio_app.py`
- **Purpose**: CLI entry point and command-line argument parsing
- **Responsibilities**:
  - Command-line argument parsing
  - Model downloading and initialization
  - Single-slide and batch-processing CLI modes
  - Launching the Gradio web UI

#### `mosaic.analysis`

- **Location**: `src/mosaic/analysis.py`
- **Purpose**: Core slide analysis logic
- **Responsibilities**:
  - Tissue segmentation
  - Feature extraction (CTransPath and Optimus)
  - Feature filtering with the marker classifier
  - Aeon inference (cancer subtype prediction)
  - Paladin inference (biomarker prediction)
- **Key Function**: `analyze_slide()`

#### `mosaic.ui` Package

- **Location**: `src/mosaic/ui/`
- **Purpose**: Gradio web interface components
- **Submodules**:
  - **`ui/__init__.py`**: Exports the main `launch_gradio` function
  - **`ui.app`**: Gradio interface definition
    - UI layout and component definitions
    - Event handlers for user interactions
    - Multi-slide analysis workflow
    - Key Functions: `launch_gradio()`, `analyze_slides()`, `set_cancer_subtype_maps()`
  - **`ui.utils`**: UI utility functions
    - Settings validation
    - CSV file handling
    - OncoTree API integration
    - User session directory management
    - Key Functions: `validate_settings()`, `load_settings()`, `get_oncotree_code_name()`, `create_user_directory()`

### Inference Modules

#### `mosaic.inference`

- **Location**: `src/mosaic/inference/`
- **Purpose**: ML model inference implementations
- **Submodules**:
  - `aeon.py`: Cancer subtype inference
  - `paladin.py`: Biomarker inference
  - `data.py`: Data structures and utilities

## Code Organization Benefits

1. **Separation of Concerns**: UI, analysis, and CLI logic are clearly separated
2. **Improved Maintainability**: Each module has a single, well-defined responsibility
3. **Better Testability**: Individual modules can be tested independently
4. **Enhanced Readability**: Smaller files and clear module boundaries
5. **Reusability**: Analysis functions can be imported and used without UI dependencies

## Import Flow

```
gradio_app.main()
├── download_and_process_models()
│   ├── set_cancer_subtype_maps() [from ui.app]
│   └── get_oncotree_code_name() [from ui.utils]
├── analyze_slide() [from analysis]
│   ├── segment_tissue() [from mussel]
│   ├── get_features() [from mussel]
│   ├── filter_features() [from mussel]
│   ├── run_aeon() [from inference]
│   └── run_paladin() [from inference]
└── launch_gradio() [from ui]
    ├── analyze_slides() [from ui.app]
    │   └── analyze_slide() [from analysis]
    └── validate_settings() [from ui.utils]
```

## File Size Comparison

Component | Original | Refactored | Change
----------|----------|------------|-------
`gradio_app.py` | 843 lines | 230 lines | -73%
UI components | - | 474 lines | +474 lines
Analysis logic | - | 200 lines | +200 lines

The refactoring split the original monolithic file into focused, maintainable modules while preserving all functionality.
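The Import Flow tree shows `analyze_slide()` orchestrating the pipeline stages with no UI dependency. That independence is what lets the CLI and the Gradio app share it. The following is a minimal sketch of that orchestration, using the stage order from the Data Flow diagram later in this document. It is not Mosaic's actual code: `PipelineStages`, `analyze_slide_sketch`, and every stage signature are hypothetical. Each stage is passed in as a callable so it can be replaced by a stub in tests. Following the arrow from Aeon to Paladin in the Data Flow diagram, it also assumes that Paladin is run for the top subtype predicted by Aeon.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical type aliases; the real Mosaic/Mussel data structures may differ.
Tiles = List[Any]
Features = List[Any]


@dataclass
class PipelineStages:
    """Injectable stage callables (hypothetical names, not Mosaic's API)."""

    segment_tissue: Callable[[str], Tiles]
    extract_ctranspath: Callable[[str, Tiles], Features]
    filter_tiles: Callable[[Tiles, Features], Tiles]
    extract_optimus: Callable[[str, Tiles], Features]
    run_aeon: Callable[[Features], Dict[str, float]]
    run_paladin: Callable[[Features, str], Dict[str, float]]


def analyze_slide_sketch(
    slide_path: str, stages: PipelineStages
) -> Tuple[str, Dict[str, float], Dict[str, float]]:
    """Run the stages in the order shown in the Data Flow diagram."""
    # 1. Tissue segmentation: locate tissue tiles in the WSI.
    tiles = stages.segment_tissue(slide_path)
    # 2. CTransPath features for every tissue tile.
    ctranspath_feats = stages.extract_ctranspath(slide_path, tiles)
    # 3. Marker classifier keeps only tumor-relevant tiles.
    kept_tiles = stages.filter_tiles(tiles, ctranspath_feats)
    # 4. Optimus features, computed only on the kept tiles.
    optimus_feats = stages.extract_optimus(slide_path, kept_tiles)
    # 5. Aeon: subtype probabilities; pick the most likely subtype.
    subtype_probs = stages.run_aeon(optimus_feats)
    top_subtype = max(subtype_probs, key=subtype_probs.get)
    # 6. Paladin: biomarker predictions for that subtype (assumed dependency).
    biomarkers = stages.run_paladin(optimus_feats, top_subtype)
    return top_subtype, subtype_probs, biomarkers


# Usage with fake stages and placeholder names (no models are loaded).
fake = PipelineStages(
    segment_tissue=lambda path: ["t0", "t1", "t2"],
    # Fake "features": one scalar per tile, treated as a tumor score below.
    extract_ctranspath=lambda path, tiles: [0.9, 0.1, 0.8],
    filter_tiles=lambda tiles, feats: [t for t, f in zip(tiles, feats) if f > 0.5],
    extract_optimus=lambda path, tiles: [len(t) for t in tiles],
    run_aeon=lambda feats: {"SUBTYPE_A": 0.7, "SUBTYPE_B": 0.3},
    run_paladin=lambda feats, subtype: {"BIOMARKER_X": 0.4} if subtype == "SUBTYPE_A" else {},
)
subtype, probs, biomarkers = analyze_slide_sketch("slide.svs", fake)
print(subtype, biomarkers)  # SUBTYPE_A {'BIOMARKER_X': 0.4}
```

Because the stages are injected, the fake stages exercise the orchestration logic without loading any models. This matches the testability goal listed under Code Organization Benefits. The subtype and biomarker names are placeholders, not real Mosaic outputs.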
## Key Dependencies

### External Libraries

- **Gradio**: Web interface framework for creating the UI
- **PyTorch**: Deep learning framework for model inference
- **Pandas**: Data manipulation and CSV handling
- **Mussel**: Pathology-specific utilities for:
  - Tissue segmentation
  - Feature extraction (CTransPath, Optimus)
  - Marker classification
- **Paladin**: Biomarker prediction models
- **Hugging Face Hub**: Model downloading and management
- **Loguru**: Logging with enhanced features

### Model Components

1. **CTransPath**: Pre-trained vision transformer for histopathology feature extraction
2. **Optimus**: Foundation model for pathology image features
3. **Marker Classifier**: Filters features to tumor-relevant regions
4. **Aeon**: Multi-task model for cancer subtype classification
5. **Paladin**: Suite of models for biomarker prediction across cancer subtypes

## Data Flow

```
WSI File (*.svs, *.tif)
    ↓
Tissue Segmentation (Mussel)
    ↓
CTransPath Feature Extraction
    ↓
Marker Classification (filter to tumor regions)
    ↓
Optimus Feature Extraction (on filtered tiles)
    ↓
    ├── Aeon Inference → Cancer Subtype Predictions
    │       ↓
    └── Paladin Inference → Biomarker Predictions
    ↓
Results (CSV, Visualizations)
```

## Design Principles

1. **Modularity**: Each component has a single, well-defined responsibility
2. **Testability**: Modules can be tested independently with mocking
3. **Reusability**: Core analysis functions can be used without the UI
4. **Maintainability**: Clear interfaces and documentation
5. **Extensibility**: New models or features can be added with minimal changes

## Future Enhancements

Potential areas for extension:

- Support for additional image formats
- Real-time analysis progress tracking
- Integration with PACS
- Support for additional biomarkers
- Batch processing optimization
- Cloud deployment configurations