copilot-swe-agent[bot] and raylim committed
Commit · 315cd39 · 1 Parent(s): 71ae2f0
Enhance documentation with additional details
- Add docstrings to __init__.py files
- Add architecture overview section to ARCHITECTURE.md
- Add dependencies and data flow sections
- Add design principles and future enhancements
- Link to ARCHITECTURE.md from README
Co-authored-by: raylim <3074310+raylim@users.noreply.github.com>
- ARCHITECTURE.md +72 -0
- README.md +5 -0
- src/mosaic/__init__.py +5 -0
- src/mosaic/inference/__init__.py +7 -0
- src/mosaic/ui/__init__.py +6 -0
ARCHITECTURE.md
CHANGED
@@ -1,6 +1,16 @@
 
 # Mosaic Architecture
 
+This document describes the internal architecture and module organization of the Mosaic application.
+
+## Overview
+
+Mosaic is a deep learning pipeline for analyzing H&E whole slide images (WSIs) to predict:
+1. **Cancer Subtypes** using the Aeon model
+2. **Biomarkers** using the Paladin model
+
+The application is organized into several focused modules with clear separation of concerns.
+
 ## Module Structure
 
 The Mosaic application has been refactored for better readability and maintainability. The codebase is now organized into the following modules:
@@ -94,3 +104,65 @@ Analysis Logic | - | 200 lines | +200
 
 The refactoring distributed the original monolithic file into focused, maintainable modules while maintaining all functionality.
 
+## Key Dependencies
+
+### External Libraries
+
+- **Gradio**: Web interface framework for creating the UI
+- **PyTorch**: Deep learning framework for model inference
+- **Pandas**: Data manipulation and CSV handling
+- **Mussel**: Pathology-specific utilities for:
+  - Tissue segmentation
+  - Feature extraction (CTransPath, Optimus)
+  - Marker classification
+- **Paladin**: Biomarker prediction models
+- **HuggingFace Hub**: Model downloading and management
+- **Loguru**: Logging with enhanced features
+
+### Model Components
+
+1. **CTransPath**: Pre-trained vision transformer for histopathology feature extraction
+2. **Optimus**: Foundation model for pathology image features
+3. **Marker Classifier**: Filters features to tumor-relevant regions
+4. **Aeon**: Multi-task model for cancer subtype classification
+5. **Paladin**: Suite of models for biomarker prediction across cancer subtypes
+
+## Data Flow
+
+```
+WSI File (*.svs, *.tif)
+    ↓
+Tissue Segmentation (Mussel)
+    ↓
+CTransPath Feature Extraction
+    ↓
+Marker Classification (filter to tumor regions)
+    ↓
+Optimus Feature Extraction (on filtered tiles)
+    ↓
+    ├── Aeon Inference → Cancer Subtype Predictions
+    │       ↓
+    └── Paladin Inference → Biomarker Predictions
+    ↓
+Results (CSV, Visualizations)
+```
+
+## Design Principles
+
+1. **Modularity**: Each component has a single, well-defined responsibility
+2. **Testability**: Modules can be tested independently with mocking
+3. **Reusability**: Core analysis functions can be used without UI
+4. **Maintainability**: Clear interfaces and documentation
+5. **Extensibility**: New models or features can be added with minimal changes
+
+## Future Enhancements
+
+Potential areas for extension:
+
+- Support for additional image formats
+- Real-time analysis progress tracking
+- Integration with PACS systems
+- Support for additional biomarkers
+- Batch processing optimization
+- Cloud deployment configurations
+
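The Data Flow section added to ARCHITECTURE.md describes a linear chain of stages that ends in two predictors. The sketch below shows only that shape, under stated assumptions: every stage name and signature here (`segment_tissue`, `extract_ctranspath`, `classify_markers`, `extract_optimus`, `run_aeon`, `run_paladin` as parameters) is a hypothetical placeholder working on toy dictionaries, not the actual Mosaic, Mussel, Aeon, or Paladin API. The diagram appears to route Aeon's output toward Paladin, so the sketch passes the subtype predictions along; the real interface may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in: a "tile" is just a dict of toy values here.
Tile = Dict[str, float]
Stage = Callable[[List[Tile]], List[Tile]]


@dataclass
class PipelineResult:
    subtype_predictions: Dict[str, float]
    biomarker_predictions: Dict[str, float]


def run_pipeline(
    tiles: List[Tile],
    segment_tissue: Stage,
    extract_ctranspath: Stage,
    classify_markers: Stage,
    extract_optimus: Stage,
    run_aeon: Callable[[List[Tile]], Dict[str, float]],
    run_paladin: Callable[[List[Tile], Dict[str, float]], Dict[str, float]],
) -> PipelineResult:
    """Chain the stages in the order listed in the Data Flow section."""
    tissue = segment_tissue(tiles)
    ctranspath_features = extract_ctranspath(tissue)
    tumor_tiles = classify_markers(ctranspath_features)
    optimus_features = extract_optimus(tumor_tiles)
    subtypes = run_aeon(optimus_features)
    # Assumption: Paladin receives the subtype predictions as a second input.
    biomarkers = run_paladin(optimus_features, subtypes)
    return PipelineResult(subtypes, biomarkers)


# Toy stages so the sketch runs end to end without any model weights.
def keep_tissue(tiles: List[Tile]) -> List[Tile]:
    return [t for t in tiles if t["tissue"] > 0.5]


def add_feature(name: str) -> Stage:
    def stage(tiles: List[Tile]) -> List[Tile]:
        return [{**t, name: t["tissue"] * 2} for t in tiles]
    return stage


def keep_tumor(tiles: List[Tile]) -> List[Tile]:
    return [t for t in tiles if t["tumor"] > 0.5]


def toy_aeon(tiles: List[Tile]) -> Dict[str, float]:
    return {"subtype_a": len(tiles) / 10}


def toy_paladin(tiles: List[Tile], subtypes: Dict[str, float]) -> Dict[str, float]:
    return {"marker_x": subtypes["subtype_a"] / 2}


slide = [
    {"tissue": 0.9, "tumor": 0.8},
    {"tissue": 0.2, "tumor": 0.9},
    {"tissue": 0.7, "tumor": 0.1},
]
result = run_pipeline(
    slide,
    keep_tissue,
    add_feature("ctranspath"),
    keep_tumor,
    add_feature("optimus"),
    toy_aeon,
    toy_paladin,
)
```

Because every stage is passed in as a plain callable, each one can be swapped for a stub in tests, which is one way to read the Testability principle listed in the same file.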
README.md
CHANGED
@@ -19,6 +19,7 @@ Mosaic is a deep learning model designed for predicting cancer subtypes and biom
 - [Cancer Subtypes](#cancer-subtypes)
 - [Troubleshooting](#troubleshooting)
 - [Contributing](#contributing)
+- [Architecture](#architecture)
 - [License](#license)
 
 ### System requirements
@@ -293,6 +294,10 @@ If the default port 7860 is already in use:
 
 We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on how to contribute to this project.
 
+## Architecture
+
+For detailed information about the code structure and module organization, see [ARCHITECTURE.md](ARCHITECTURE.md).
+
 ## License
 
 This project is licensed under the terms specified in the LICENSE file.
src/mosaic/__init__.py
CHANGED
@@ -0,0 +1,5 @@
+"""Mosaic: H&E Whole Slide Image Cancer Subtype and Biomarker Inference.
+
+Mosaic is a deep learning pipeline for analyzing H&E-stained whole slide images
+to predict cancer subtypes (via Aeon) and biomarkers (via Paladin).
+"""
src/mosaic/inference/__init__.py
CHANGED
@@ -1,2 +1,9 @@
+"""Inference module for Mosaic.
+
+This module provides the inference interfaces for:
+- Aeon: Cancer subtype prediction from WSI features
+- Paladin: Biomarker prediction from WSI features
+"""
+
 from .aeon import run as run_aeon
 from .paladin import run as run_paladin
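The two import lines kept as context re-export each submodule's `run` function under a distinct name, so callers can reach both from the package root without a name clash. A self-contained sketch of that pattern follows. It writes a throwaway package called `demo_inference` to a temporary directory; the package name, the docstring text, and the function bodies are hypothetical stand-ins, not Mosaic's code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that mirrors the layout of the diff above.
# "demo_inference" and the function bodies are illustrative assumptions.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_inference"
pkg.mkdir()
(pkg / "aeon.py").write_text("def run(features):\n    return 'aeon:' + features\n")
(pkg / "paladin.py").write_text("def run(features):\n    return 'paladin:' + features\n")
(pkg / "__init__.py").write_text(
    '"""Inference package docstring."""\n'
    "\n"
    "from .aeon import run as run_aeon\n"
    "from .paladin import run as run_paladin\n"
)

# Make the temporary directory importable, then load the package.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_inference = importlib.import_module("demo_inference")

# Both entry points are reachable from the package root under distinct names.
aeon_output = demo_inference.run_aeon("features")
paladin_output = demo_inference.run_paladin("features")

# The string literal is the first statement in __init__.py, so Python
# stores it as the package docstring.
package_doc = demo_inference.__doc__
```

Placing the docstring before the imports matters: a string literal only becomes `__doc__` when it is the first statement in the module.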
src/mosaic/ui/__init__.py
CHANGED
@@ -1,3 +1,9 @@
+"""UI module for Mosaic Gradio web interface.
+
+This module provides the web-based user interface for Mosaic,
+allowing interactive analysis of whole slide images through a browser.
+"""
+
 from .app import launch_gradio
 
 __all__ = ["launch_gradio"]
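The unchanged `__all__ = ["launch_gradio"]` line limits what a star-import of the package exposes. As a rough illustration, the sketch below builds a throwaway package named `demo_ui` with the same layout; the package name and the body of `launch_gradio` are hypothetical stand-ins and do not start any Gradio server.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway package mirroring the ui/__init__.py layout in the diff.
# "demo_ui" and the function body are hypothetical stand-ins.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_ui"
pkg.mkdir()
(pkg / "app.py").write_text("def launch_gradio():\n    return 'launched'\n")
(pkg / "__init__.py").write_text(
    '"""UI package docstring."""\n'
    "\n"
    "from .app import launch_gradio\n"
    "\n"
    '__all__ = ["launch_gradio"]\n'
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Run a star-import in an isolated namespace and record the public names.
namespace = {}
exec("from demo_ui import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
```

Importing `.app` also binds the `app` submodule as an attribute of the package, but because `__all__` names only `launch_gradio`, the star-import does not copy `app` into the caller's namespace.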