# Mosaic Architecture
This document describes the internal architecture and module organization of the Mosaic application.
## Overview
Mosaic is a deep learning pipeline for analyzing H&E whole slide images (WSIs) to predict:
1. **Cancer Subtypes** using the Aeon model
2. **Biomarkers** using the Paladin model
The application is organized into several focused modules with clear separation of concerns.
## Module Structure
The Mosaic application has been refactored for better readability and maintainability. The codebase is now organized into the following modules:
### Core Modules
#### `mosaic.gradio_app` (Main Entry Point)
- **Location**: `src/mosaic/gradio_app.py`
- **Purpose**: CLI entry point that dispatches to the CLI processing modes or the web UI
- **Responsibilities**:
  - Command-line argument parsing
  - Model downloading and initialization
  - Single slide and batch processing CLI modes
  - Launching the Gradio web UI
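The dispatch between single-slide, batch, and UI modes can be sketched with `argparse` as follows. This is only an illustration: the flag names (`--slide-path`, `--slide-csv`, `--output-dir`) and the helper functions are hypothetical and are not taken from `gradio_app.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # All flag names here are hypothetical, chosen only for illustration.
    parser = argparse.ArgumentParser(prog="mosaic")
    source = parser.add_mutually_exclusive_group()
    source.add_argument("--slide-path", help="analyze a single slide")
    source.add_argument("--slide-csv", help="analyze every slide listed in a CSV file")
    parser.add_argument("--output-dir", default="output")
    return parser


def select_mode(args: argparse.Namespace) -> str:
    """Pick the run mode from the parsed arguments."""
    if args.slide_path:
        return "single"
    if args.slide_csv:
        return "batch"
    return "ui"  # no slide input given: fall back to the web UI


# Passing an explicit argument list keeps the example independent of sys.argv.
args = build_parser().parse_args(["--slide-path", "slide.svs"])
print(select_mode(args))  # single
```

Making the two slide inputs mutually exclusive means an ambiguous invocation fails at parse time rather than partway through model loading.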
#### `mosaic.analysis`
- **Location**: `src/mosaic/analysis.py`
- **Purpose**: Core slide analysis logic
- **Responsibilities**:
  - Tissue segmentation
  - Feature extraction (CTransPath and Optimus)
  - Feature filtering with marker classifier
  - Aeon inference (cancer subtype prediction)
  - Paladin inference (biomarker prediction)
- **Key Function**: `analyze_slide()`
#### `mosaic.ui` Package
- **Location**: `src/mosaic/ui/`
- **Purpose**: Gradio web interface components
- **Submodules**:
  - **`ui/__init__.py`**: Exports the main `launch_gradio` function
  - **`ui.app`**: Gradio interface definition
    - UI layout and component definitions
    - Event handlers for user interactions
    - Multi-slide analysis workflow
    - Key Functions: `launch_gradio()`, `analyze_slides()`, `set_cancer_subtype_maps()`
  - **`ui.utils`**: UI utility functions
    - Settings validation
    - CSV file handling
    - OncoTree API integration
    - User session directory management
    - Key Functions: `validate_settings()`, `load_settings()`, `get_oncotree_code_name()`, `create_user_directory()`
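As an illustration of what settings validation might involve, the sketch below checks per-slide rows for required fields and known cancer subtype codes. It is not the `validate_settings()` implementation: the function name `validate_rows`, the column names, and the example codes are all assumptions made for this example.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical column names; the real settings schema may differ.
REQUIRED_COLUMNS = ("slide", "cancer_subtype")


def validate_rows(rows: Iterable[dict], known_subtypes: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means every row passed."""
    problems = []
    for index, row in enumerate(rows, start=1):
        for column in REQUIRED_COLUMNS:
            if not row.get(column):
                problems.append(f"row {index}: missing '{column}'")
        subtype = row.get("cancer_subtype")
        if subtype and subtype not in known_subtypes:
            problems.append(f"row {index}: unknown cancer subtype '{subtype}'")
    return problems


rows = [
    {"slide": "a.svs", "cancer_subtype": "LUAD"},
    {"slide": "", "cancer_subtype": "XYZ"},
]
for problem in validate_rows(rows, known_subtypes={"LUAD", "BRCA"}):
    print(problem)
```

Collecting all problems instead of raising on the first one lets a UI report every invalid row in a single pass.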
### Inference Modules
#### `mosaic.inference`
- **Location**: `src/mosaic/inference/`
- **Purpose**: ML model inference implementations
- **Submodules**:
  - `aeon.py`: Cancer subtype inference
  - `paladin.py`: Biomarker inference
  - `data.py`: Data structures and utilities
## Code Organization Benefits
1. **Separation of Concerns**: UI, analysis, and CLI logic are now clearly separated
2. **Improved Maintainability**: Each module has a single, well-defined responsibility
3. **Better Testability**: Individual modules can be tested independently
4. **Enhanced Readability**: Reduced file sizes and clear module boundaries
5. **Reusability**: Analysis functions can be imported and used without UI dependencies
## Import Flow
```
gradio_app.main()
├── download_and_process_models()
│   ├── set_cancer_subtype_maps()     [from ui.app]
│   └── get_oncotree_code_name()      [from ui.utils]
├── analyze_slide()                   [from analysis]
│   ├── segment_tissue()              [from mussel]
│   ├── get_features()                [from mussel]
│   ├── filter_features()             [from mussel]
│   ├── run_aeon()                    [from inference]
│   └── run_paladin()                 [from inference]
└── launch_gradio()                   [from ui]
    ├── analyze_slides()              [from ui.app]
    │   └── analyze_slide()           [from analysis]
    └── validate_settings()           [from ui.utils]
```
## File Size Comparison
| File | Original | Refactored | Change |
|------|----------|------------|--------|
| `gradio_app.py` | 843 lines | 230 lines | -73% |
| UI Components | - | 474 lines | +474 lines |
| Analysis Logic | - | 200 lines | +200 lines |
The refactoring split the original monolithic file into focused modules while preserving all functionality.
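The percentages follow directly from the line counts. A quick check, counting only the rows listed in the table (other modules, such as the inference package, are not included):

```python
original = 843
refactored = {"gradio_app.py": 230, "UI Components": 474, "Analysis Logic": 200}

# Relative change of the entry-point file alone.
entry_point_change = (refactored["gradio_app.py"] - original) / original
# Combined size of the three listed pieces after the split.
total = sum(refactored.values())

print(f"gradio_app.py change: {entry_point_change:+.0%}")  # -73%
print(f"total lines after refactoring: {total}")           # 904
print(f"net change: {total - original:+d} lines")          # +61 lines
```

The entry point shrank by roughly three quarters, while the listed code grew by 61 lines overall, which is consistent with splitting one file into several (added imports and module boilerplate).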
## Key Dependencies
### External Libraries
- **Gradio**: Web interface framework for creating the UI
- **PyTorch**: Deep learning framework for model inference
- **Pandas**: Data manipulation and CSV handling
- **Mussel**: Pathology-specific utilities for:
  - Tissue segmentation
  - Feature extraction (CTransPath, Optimus)
  - Marker classification
- **Paladin**: Biomarker prediction models
- **HuggingFace Hub**: Model downloading and management
- **Loguru**: Logging with enhanced features
### Model Components
1. **CTransPath**: Pre-trained vision transformer for histopathology feature extraction
2. **Optimus**: Foundation model for pathology image features
3. **Marker Classifier**: Filters features to tumor-relevant regions
4. **Aeon**: Multi-task model for cancer subtype classification
5. **Paladin**: Suite of models for biomarker prediction across cancer subtypes
## Data Flow
```
WSI File (*.svs, *.tif)
    ↓
Tissue Segmentation (Mussel)
    ↓
CTransPath Feature Extraction
    ↓
Marker Classification (filter to tumor regions)
    ↓
Optimus Feature Extraction (on filtered tiles)
    ↓
    ├── Aeon Inference → Cancer Subtype Predictions
    │       ↓
    └── Paladin Inference → Biomarker Predictions
    ↓
Results (CSV, Visualizations)
```
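The stages above can be sketched as a single function that threads data from one step to the next. This is a minimal sketch, not the `analyze_slide()` implementation: every stage is passed in as a plain callable, the `SlideResult` type and all parameter names are hypothetical, and handing the subtype predictions to the biomarker stage is an assumption based on the arrow between the two inference steps in the diagram.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SlideResult:
    subtypes: dict
    biomarkers: dict


def run_pipeline(
    slide_path: str,
    segment: Callable,
    ctranspath: Callable,
    is_tumor: Callable,
    optimus: Callable,
    aeon: Callable,
    paladin: Callable,
) -> SlideResult:
    tiles = segment(slide_path)                    # tissue segmentation
    coarse = [ctranspath(tile) for tile in tiles]  # first-pass features
    # Marker classification: keep only tiles whose features look tumor-relevant.
    kept = [tile for tile, feat in zip(tiles, coarse) if is_tumor(feat)]
    features = [optimus(tile) for tile in kept]    # second-pass features on kept tiles
    subtypes = aeon(features)                      # cancer subtype predictions
    biomarkers = paladin(features, subtypes)       # biomarker predictions
    return SlideResult(subtypes=subtypes, biomarkers=biomarkers)


# Toy stand-ins so the sketch runs without model weights; the values are made up.
result = run_pipeline(
    "slide.svs",
    segment=lambda path: [0, 1, 2, 3],
    ctranspath=lambda tile: tile * 10,
    is_tumor=lambda feature: feature >= 20,
    optimus=lambda tile: float(tile),
    aeon=lambda feats: {"LUAD": 0.9},
    paladin=lambda feats, subtypes: {"EGFR": len(feats)},
)
print(result)
```

Because the second feature extractor only sees tiles that pass the marker classifier, the more expensive step runs on a subset of the slide.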
## Design Principles
1. **Modularity**: Each component has a single, well-defined responsibility
2. **Testability**: Modules can be tested independently with mocking
3. **Reusability**: Core analysis functions can be used without UI
4. **Maintainability**: Clear interfaces and documentation
5. **Extensibility**: New models or features can be added with minimal changes
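To illustrate the testability principle, the sketch below tests a batch loop with `unittest.mock.Mock` standing in for the model-backed single-slide analyzer. The function `analyze_batch` and its dependency-injection signature are hypothetical, chosen to keep the example self-contained; they are not the signature of `analyze_slides()` in `ui.app`.

```python
from unittest.mock import Mock


def analyze_batch(slide_paths, analyze_slide):
    """Run the injected single-slide analyzer on each path and key results by path."""
    return {path: analyze_slide(path) for path in slide_paths}


# The mock replaces the expensive analyzer, so no models or slides are needed.
fake_analyze = Mock(return_value={"subtype": "LUAD"})
results = analyze_batch(["a.svs", "b.svs"], fake_analyze)

assert fake_analyze.call_count == 2
fake_analyze.assert_any_call("a.svs")
assert results["b.svs"] == {"subtype": "LUAD"}
```

When the analyzer is imported at module level rather than injected, `unittest.mock.patch` can replace it for the duration of a test instead.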
## Future Enhancements
Potential areas for extension:
- Support for additional image formats
- Real-time analysis progress tracking
- Integration with PACS systems
- Support for additional biomarkers
- Batch processing optimization
- Cloud deployment configurations