# Mosaic Architecture

This document describes the internal architecture and module organization of the Mosaic application.

## Overview

Mosaic is a deep learning pipeline for analyzing H&E whole slide images (WSIs) to predict:
1. **Cancer Subtypes** using the Aeon model
2. **Biomarkers** using the Paladin model

The application is organized into several focused modules with clear separation of concerns.

## Module Structure

The Mosaic application has been refactored for better readability and maintainability. The codebase is now organized into the following modules:

### Core Modules

#### `mosaic.gradio_app` (Main Entry Point)
- **Location**: `src/mosaic/gradio_app.py`
- **Purpose**: CLI entry point that dispatches to the command-line modes or the web UI
- **Responsibilities**:
  - Command-line argument parsing
  - Model downloading and initialization
  - Single slide and batch processing CLI modes
  - Launching the Gradio web UI
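
The responsibilities above could be wired together roughly as follows. This is a minimal sketch, not Mosaic's actual CLI: the flag names (`--slide-path`, `--slide-csv`, `--output-dir`) and the `select_mode` helper are hypothetical, chosen only to show how one parser can serve the single-slide, batch, and web UI modes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering the three modes (all flag names are hypothetical)."""
    parser = argparse.ArgumentParser(prog="mosaic")
    # A run analyzes one slide or a CSV of slides, never both.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("--slide-path", help="analyze a single slide")
    source.add_argument("--slide-csv", help="analyze every slide listed in a CSV")
    parser.add_argument("--output-dir", default="output")
    return parser


def select_mode(args: argparse.Namespace) -> str:
    """Pick the run mode; with no slide input, fall back to the web UI."""
    if args.slide_path:
        return "single"
    if args.slide_csv:
        return "batch"
    return "ui"
```

Under these assumptions, `select_mode(build_parser().parse_args(["--slide-path", "a.svs"]))` selects the single-slide mode, and an empty argument list selects the UI.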

#### `mosaic.analysis`
- **Location**: `src/mosaic/analysis.py`
- **Purpose**: Core slide analysis logic
- **Responsibilities**:
  - Tissue segmentation
  - Feature extraction (CTransPath and Optimus)
  - Feature filtering with marker classifier
  - Aeon inference (cancer subtype prediction)
  - Paladin inference (biomarker prediction)
- **Key Function**: `analyze_slide()`

#### `mosaic.ui` Package
- **Location**: `src/mosaic/ui/`
- **Purpose**: Gradio web interface components
- **Submodules**:
  
  - **`ui.__init__.py`**: Exports the main `launch_gradio` function
  
  - **`ui.app`**: Gradio interface definition
    - UI layout and component definitions
    - Event handlers for user interactions
    - Multi-slide analysis workflow
    - Key Functions: `launch_gradio()`, `analyze_slides()`, `set_cancer_subtype_maps()`
  
  - **`ui.utils`**: UI utility functions
    - Settings validation
    - CSV file handling
    - OncoTree API integration
    - User session directory management
    - Key Functions: `validate_settings()`, `load_settings()`, `get_oncotree_code_name()`, `create_user_directory()`
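
As an illustration of the kind of check that settings validation performs, here is a minimal sketch using only the standard library. It is not the real `validate_settings()`: the function name `check_settings`, the column names, and the allowed values are all hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ("slide", "site_type")  # hypothetical column names
ALLOWED_SITE_TYPES = {"Primary", "Metastatic"}  # hypothetical allowed values


def check_settings(csv_text: str) -> list[str]:
    """Return a list of human-readable problems; empty means the CSV is usable."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        # Without the required columns, row-level checks are meaningless.
        return [f"missing column: {c}" for c in missing]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not row["slide"]:
            problems.append(f"line {line_no}: empty slide name")
        if row["site_type"] not in ALLOWED_SITE_TYPES:
            problems.append(f"line {line_no}: unknown site_type {row['site_type']!r}")
    return problems
```

Returning every problem at once, instead of raising on the first one, lets a UI show the user all rows that need fixing in a single pass.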

### Inference Modules

#### `mosaic.inference`
- **Location**: `src/mosaic/inference/`
- **Purpose**: ML model inference implementations
- **Submodules**:
  - `aeon.py`: Cancer subtype inference
  - `paladin.py`: Biomarker inference
  - `data.py`: Data structures and utilities

## Code Organization Benefits

1. **Separation of Concerns**: UI, analysis, and CLI logic are now clearly separated
2. **Improved Maintainability**: Each module has a single, well-defined responsibility
3. **Better Testability**: Individual modules can be tested independently
4. **Enhanced Readability**: Reduced file sizes and clear module boundaries
5. **Reusability**: Analysis functions can be imported and used without UI dependencies

## Import Flow

```
gradio_app.main()
├── download_and_process_models()
│   ├── set_cancer_subtype_maps() [from ui.app]
│   └── get_oncotree_code_name() [from ui.utils]
├── analyze_slide() [from analysis]
│   ├── segment_tissue() [from mussel]
│   ├── get_features() [from mussel]
│   ├── filter_features() [from mussel]
│   ├── run_aeon() [from inference]
│   └── run_paladin() [from inference]
└── launch_gradio() [from ui]
    ├── analyze_slides() [from ui.app]
    │   └── analyze_slide() [from analysis]
    └── validate_settings() [from ui.utils]
```
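
The top level of this call tree can be sketched with the three collaborators passed in as plain callables. This is a simplified illustration, not the real `gradio_app.main()`: the `run` function and its parameters are hypothetical, and it assumes that the tree's top-to-bottom order reflects call order (models first, then either CLI analysis or the UI).

```python
from typing import Callable, Optional


def run(
    slide_path: Optional[str],
    download_models: Callable[[], None],
    analyze_slide: Callable[[str], dict],
    launch_gradio: Callable[[], None],
) -> Optional[dict]:
    """Prepare models, then analyze one slide or hand over to the web UI."""
    download_models()
    if slide_path is not None:
        # CLI mode: the analysis module is used directly, with no UI involved.
        return analyze_slide(slide_path)
    launch_gradio()
    return None


# Usage with stand-in callables that only record what was called.
calls = []
result = run(
    "a.svs",
    download_models=lambda: calls.append("download"),
    analyze_slide=lambda path: {"slide": path},
    launch_gradio=lambda: calls.append("ui"),
)
```

Passing the collaborators in keeps the dependency direction visible: the entry point knows about both `analysis` and `ui`, while `analysis` knows about neither.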

## File Size Comparison

File | Original | Refactored | Change
-----|----------|------------|--------
`gradio_app.py` | 843 lines | 230 lines | -73%
UI Components | - | 474 lines | +474 lines
Analysis Logic | - | 200 lines | +200 lines

The refactoring split the original monolithic file into focused modules while preserving all functionality.
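
The percentage in the table can be reproduced from the line counts. The short calculation below uses only the numbers in the table; the helper name `percent_change` is introduced here for illustration.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# gradio_app.py shrank from 843 to 230 lines.
change = percent_change(843, 230)
print(f"{change:.0f}%")  # -73%

# Total size of the three refactored parts listed in the table.
total_after = 230 + 474 + 200
print(total_after)  # 904
```

The refactored parts total 904 lines, 61 more than the original 843, so the refactoring moved code between files and did not reduce the overall amount.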

## Key Dependencies

### External Libraries

- **Gradio**: Web interface framework for creating the UI
- **PyTorch**: Deep learning framework for model inference
- **Pandas**: Data manipulation and CSV handling
- **Mussel**: Pathology-specific utilities for:
  - Tissue segmentation
  - Feature extraction (CTransPath, Optimus)
  - Marker classification
- **Paladin**: Biomarker prediction models
- **HuggingFace Hub**: Model downloading and management
- **Loguru**: Logging with enhanced features

### Model Components

1. **CTransPath**: Pre-trained vision transformer for histopathology feature extraction
2. **Optimus**: Foundation model for pathology image features
3. **Marker Classifier**: Filters features to tumor-relevant regions
4. **Aeon**: Multi-task model for cancer subtype classification
5. **Paladin**: Suite of models for biomarker prediction across cancer subtypes

## Data Flow

```
WSI File (*.svs, *.tif)
    ↓
Tissue Segmentation (Mussel)
    ↓
CTransPath Feature Extraction
    ↓
Marker Classification (filter to tumor regions)
    ↓
Optimus Feature Extraction (on filtered tiles)
    ↓
├── Aeon Inference → Cancer Subtype Predictions
│       ↓
└── Paladin Inference → Biomarker Predictions
        ↓
    Results (CSV, Visualizations)
```
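
The stages of this flow can be sketched as a chain of small functions. Everything below is a toy stand-in: the `Tile` class, the function bodies, and the placeholder outputs (`"SUBTYPE_A"`, `"marker_x"`) are hypothetical and do not reflect the real Mussel, Aeon, or Paladin APIs. The sketch also assumes that Paladin runs only once Aeon has produced a subtype, which is one reading of the diagram.

```python
from dataclasses import dataclass


@dataclass
class Tile:
    index: int
    is_tissue: bool
    is_tumor: bool


def segment_tissue(tiles):
    """Stage 1: keep only tiles that contain tissue."""
    return [t for t in tiles if t.is_tissue]


def filter_tumor(tiles):
    """Stages 2-3: stand-in for CTransPath features plus the marker classifier."""
    return [t for t in tiles if t.is_tumor]


def extract_features(tiles):
    """Stage 4: stand-in for Optimus; one toy feature vector per tile."""
    return [[float(t.index)] for t in tiles]


def run_pipeline(tiles):
    """Run the stages in order and collect the results in one dict."""
    features = extract_features(filter_tumor(segment_tissue(tiles)))
    subtype = "SUBTYPE_A" if features else None  # stand-in for Aeon
    biomarkers = {"marker_x": 0.5} if subtype else {}  # stand-in for Paladin
    return {"n_tiles": len(features), "subtype": subtype, "biomarkers": biomarkers}
```

The expensive second feature extractor only sees tiles that survived both filters, which is the point of ordering the stages this way.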

## Design Principles

1. **Modularity**: Each component has a single, well-defined responsibility
2. **Testability**: Modules can be tested independently with mocking
3. **Reusability**: Core analysis functions can be used without UI
4. **Maintainability**: Clear interfaces and documentation
5. **Extensibility**: New models or features can be added with minimal changes
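
The testability principle can be illustrated with `unittest.mock` from the standard library. The `analyze_slides` function below is a simplified, hypothetical batch wrapper, not the one in `ui.app`; it takes the per-slide analysis as a parameter so that a test can substitute a mock for the real models.

```python
from unittest.mock import Mock


def analyze_slides(paths, analyze_slide):
    """Hypothetical batch wrapper: call the per-slide analysis once per path."""
    return {path: analyze_slide(path) for path in paths}


# The mock replaces the expensive model pipeline with a canned result.
fake_analyze = Mock(return_value={"subtype": "SUBTYPE_A"})
results = analyze_slides(["a.svs", "b.svs"], fake_analyze)

# Verify the wrapper's behavior without loading any model.
assert fake_analyze.call_count == 2
fake_analyze.assert_any_call("b.svs")
```

A test written this way needs neither model weights nor a GPU and still checks the batching logic.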

## Future Enhancements

Potential areas for extension:

- Support for additional image formats
- Real-time analysis progress tracking
- Integration with PACS systems
- Support for additional biomarkers
- Batch processing optimization
- Cloud deployment configurations