# Mosaic Architecture

This document describes the internal architecture and module organization of the Mosaic application.

## Overview

Mosaic is a deep learning pipeline for analyzing H&E whole slide images (WSIs) to predict:

1. **Cancer Subtypes** using the Aeon model
2. **Biomarkers** using the Paladin model

The application is organized into several focused modules with clear separation of concerns.

## Module Structure

The Mosaic application has been refactored for better readability and maintainability. The codebase is now organized into the following modules:

### Core Modules

#### `mosaic.gradio_app` (Main Entry Point)

- **Location**: `src/mosaic/gradio_app.py`
- **Purpose**: CLI entry point that prepares the models and dispatches to a CLI mode or the web UI
- **Responsibilities**:
  - Command-line argument parsing
  - Model downloading and initialization
  - Single slide and batch processing CLI modes
  - Launching the Gradio web UI

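
The three modes listed above could be dispatched from a single parser roughly as follows. This is a minimal sketch, not the contents of `gradio_app.py`: the flag names (`--slide-path`, `--slide-csv`, `--output-dir`) and the `select_mode` helper are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three run modes (flag names are illustrative)."""
    parser = argparse.ArgumentParser(prog="mosaic")
    # Hypothetical flags: at most one of the two inputs may be given.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--slide-path", help="analyze a single slide")
    group.add_argument("--slide-csv", help="analyze every slide listed in a CSV")
    parser.add_argument("--output-dir", default="output")
    return parser


def select_mode(args: argparse.Namespace) -> str:
    """Return which mode to run: 'single', 'batch', or 'ui' (the default)."""
    if args.slide_path:
        return "single"
    if args.slide_csv:
        return "batch"
    return "ui"
```

With no input flag the sketch falls through to the web UI, which keeps the interactive mode as the default while letting scripted runs opt in to single-slide or batch processing.
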
#### `mosaic.analysis`

- **Location**: `src/mosaic/analysis.py`
- **Purpose**: Core slide analysis logic
- **Responsibilities**:
  - Tissue segmentation
  - Feature extraction (CTransPath and Optimus)
  - Feature filtering with marker classifier
  - Aeon inference (cancer subtype prediction)
  - Paladin inference (biomarker prediction)
- **Key Function**: `analyze_slide()`

#### `mosaic.ui` Package

- **Location**: `src/mosaic/ui/`
- **Purpose**: Gradio web interface components
- **Submodules**:
  - **`ui/__init__.py`**: Exports the main `launch_gradio` function
  - **`ui.app`**: Gradio interface definition
    - UI layout and component definitions
    - Event handlers for user interactions
    - Multi-slide analysis workflow
    - Key Functions: `launch_gradio()`, `analyze_slides()`, `set_cancer_subtype_maps()`
  - **`ui.utils`**: UI utility functions
    - Settings validation
    - CSV file handling
    - OncoTree API integration
    - User session directory management
    - Key Functions: `validate_settings()`, `load_settings()`, `get_oncotree_code_name()`, `create_user_directory()`

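
To give a feel for the kind of check that settings validation performs on an uploaded CSV, here is a small sketch. It is not the real `validate_settings()`: the function name `validate_settings_rows`, the required column names, and the error messages are all assumptions, and it uses the standard library `csv` module instead of Pandas so that it runs without extra dependencies.

```python
import csv
import io

# Hypothetical required columns; the real settings schema may differ.
REQUIRED_COLUMNS = ("slide", "cancer_subtype")


def validate_settings_rows(csv_text: str) -> list[str]:
    """Return a list of human-readable problems found in a settings CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems  # row checks are meaningless without the columns
    for line_no, row in enumerate(reader, start=2):  # the header is line 1
        if not (row["slide"] or "").strip():
            problems.append(f"line {line_no}: empty slide name")
    return problems
```

Returning a list of problems, instead of raising on the first one, lets a UI show every issue to the user in a single pass.
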
### Inference Modules

#### `mosaic.inference`

- **Location**: `src/mosaic/inference/`
- **Purpose**: ML model inference implementations
- **Submodules**:
  - `aeon.py`: Cancer subtype inference
  - `paladin.py`: Biomarker inference
  - `data.py`: Data structures and utilities

## Code Organization Benefits

1. **Separation of Concerns**: UI, analysis, and CLI logic are now clearly separated
2. **Improved Maintainability**: Each module has a single, well-defined responsibility
3. **Better Testability**: Individual modules can be tested independently
4. **Enhanced Readability**: Reduced file sizes and clear module boundaries
5. **Reusability**: Analysis functions can be imported and used without UI dependencies

## Import Flow

```
gradio_app.main()
├── download_and_process_models()
│   ├── set_cancer_subtype_maps() [from ui.app]
│   └── get_oncotree_code_name() [from ui.utils]
├── analyze_slide() [from analysis]
│   ├── segment_tissue() [from mussel]
│   ├── get_features() [from mussel]
│   ├── filter_features() [from mussel]
│   ├── run_aeon() [from inference]
│   └── run_paladin() [from inference]
└── launch_gradio() [from ui]
    ├── analyze_slides() [from ui.app]
    │   └── analyze_slide() [from analysis]
    └── validate_settings() [from ui.utils]
```

## File Size Comparison

| File | Original | Refactored | Change |
|------|----------|------------|--------|
| `gradio_app.py` | 843 lines | 230 lines | -73% |
| UI Components | - | 474 lines | +474 lines |
| Analysis Logic | - | 200 lines | +200 lines |

The refactoring split the original monolithic file into focused modules while preserving all functionality.

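
The Change column mixes a relative figure for `gradio_app.py` with absolute counts for the new modules. The arithmetic can be checked with a few lines that use only the numbers in the table; the helper name `percent_change` is just for illustration.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


original = 843
refactored = {"gradio_app.py": 230, "UI Components": 474, "Analysis Logic": 200}

# Change for the entry point alone, rounded to a whole percent.
entry_point_change = round(percent_change(original, refactored["gradio_app.py"]))
# Combined size of the three refactored pieces.
total_after = sum(refactored.values())
```

Here `entry_point_change` comes out at -73 and `total_after` at 904, so the three pieces together are slightly larger than the original 843-line file even though the entry point shrank.
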
## Key Dependencies

### External Libraries

- **Gradio**: Web interface framework for creating the UI
- **PyTorch**: Deep learning framework for model inference
- **Pandas**: Data manipulation and CSV handling
- **Mussel**: Pathology-specific utilities for:
  - Tissue segmentation
  - Feature extraction (CTransPath, Optimus)
  - Marker classification
- **Paladin**: Biomarker prediction models
- **HuggingFace Hub**: Model downloading and management
- **Loguru**: Logging with enhanced features

### Model Components

1. **CTransPath**: Pre-trained vision transformer for histopathology feature extraction
2. **Optimus**: Foundation model for pathology image features
3. **Marker Classifier**: Filters features to tumor-relevant regions
4. **Aeon**: Multi-task model for cancer subtype classification
5. **Paladin**: Suite of models for biomarker prediction across cancer subtypes

## Data Flow

```
WSI File (*.svs, *.tif)
    ↓
Tissue Segmentation (Mussel)
    ↓
CTransPath Feature Extraction
    ↓
Marker Classification (filter to tumor regions)
    ↓
Optimus Feature Extraction (on filtered tiles)
    ↓
    ├── Aeon Inference → Cancer Subtype Predictions
    │         ↓
    └── Paladin Inference → Biomarker Predictions
    ↓
Results (CSV, Visualizations)
```

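
As a rough illustration of the ordering in the diagram, the sketch below chains placeholder stages and records the order in which they run. None of the stage bodies, the `SlideResult` container, or the `SUBTYPE_A`, `SUBTYPE_B`, and `MARKER_X` labels come from Mosaic; they are stand-ins. The sketch also assumes, based on the arrow between the two inference steps, that Paladin receives Aeon's top prediction.

```python
from dataclasses import dataclass, field


@dataclass
class SlideResult:
    """Hypothetical container for the outputs of one slide."""
    subtypes: dict = field(default_factory=dict)
    biomarkers: dict = field(default_factory=dict)
    n_tumor_tiles: int = 0
    steps: list = field(default_factory=list)


def run_pipeline(slide_path: str) -> SlideResult:
    """Run placeholder stages in the order shown in the diagram."""
    result = SlideResult()

    def stage(name, value):
        result.steps.append(name)  # record execution order
        return value

    tiles = stage("segment_tissue", [f"{slide_path}#tile{i}" for i in range(4)])
    coarse = stage("ctranspath_features", {t: [0.0] for t in tiles})
    # Placeholder filter: keep every other tile as "tumor".
    tumor_tiles = stage("marker_filter", [t for i, t in enumerate(coarse) if i % 2 == 0])
    features = stage("optimus_features", {t: [1.0] for t in tumor_tiles})
    result.n_tumor_tiles = len(features)
    result.subtypes = stage("aeon", {"SUBTYPE_A": 0.9, "SUBTYPE_B": 0.1})
    # Assumption: Paladin is conditioned on Aeon's top-scoring subtype.
    top_subtype = max(result.subtypes, key=result.subtypes.get)
    result.biomarkers = stage("paladin", {(top_subtype, "MARKER_X"): 0.5})
    return result
```

The point of the sketch is the dependency chain: the second feature extractor only sees tiles that survived the marker filter, and biomarker prediction only starts once a subtype is available.
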
## Design Principles

1. **Modularity**: Each component has a single, well-defined responsibility
2. **Testability**: Modules can be tested independently with mocking
3. **Reusability**: Core analysis functions can be used without UI
4. **Maintainability**: Clear interfaces and documentation
5. **Extensibility**: New models or features can be added with minimal changes

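
The testability principle can be illustrated with `unittest.mock` from the standard library. In the sketch below, both functions are simplified stand-ins written for this example; the real `analyze_slide()` and `analyze_slides()` take different arguments, and the `analyze` injection parameter is an assumption.

```python
from unittest import mock


def analyze_slide(path):
    """Stand-in for the expensive analysis entry point."""
    raise RuntimeError("would load models and read the slide")


def analyze_slides(paths, analyze=None):
    """Stand-in for a UI-level loop that runs the analysis once per slide."""
    if analyze is None:
        analyze = analyze_slide
    return {p: analyze(p) for p in paths}


# Replace the heavy analysis step with a mock that returns a canned result.
fake = mock.Mock(return_value={"subtype": "SUBTYPE_A"})
results = analyze_slides(["a.svs", "b.svs"], analyze=fake)
```

Because the loop never touches models or image files when the mock is passed in, the multi-slide workflow can be exercised in a fast unit test.
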
## Future Enhancements

Potential areas for extension:

- Support for additional image formats
- Real-time analysis progress tracking
- Integration with PACS systems
- Support for additional biomarkers
- Batch processing optimization
- Cloud deployment configurations