	# Mosaic Architecture

	This document describes the internal architecture and module organization of the Mosaic application.

	## Overview

	Mosaic is a deep learning pipeline for analyzing H&E whole slide images (WSIs) to predict:
	1. Cancer Subtypes using the Aeon model
	2. Biomarkers using the Paladin model

	The application is organized into several focused modules with clear separation of concerns.

	## Module Structure

	The Mosaic application has been refactored for better readability and maintainability. The codebase is now organized into the following modules:

	### Core Modules

	#### `mosaic.gradio_app` (Main Entry Point)
	- Location: `src/mosaic/gradio_app.py`
	- Purpose: Application entry point for both the CLI modes and the Gradio web UI
	- Responsibilities:
	  - Command-line argument parsing
	  - Model downloading and initialization
	  - Single slide and batch processing CLI modes
	  - Launching the Gradio web UI
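The three modes listed above (single slide, batch, web UI) can be pictured as a small dispatch step after argument parsing. The following is only a sketch: the flag names `--slide-path` and `--slide-csv` and the helper `select_mode()` are hypothetical and are not taken from the Mosaic CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with hypothetical flags (not Mosaic's real CLI)."""
    parser = argparse.ArgumentParser(prog="mosaic")
    # Hypothetical flags: at most one input source may be given.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("--slide-path", help="analyze a single slide")
    source.add_argument("--slide-csv", help="analyze every slide listed in a CSV")
    return parser


def select_mode(argv: list[str]) -> str:
    """Map parsed arguments to one of the three modes described above."""
    args = build_parser().parse_args(argv)
    if args.slide_path:
        return "single"
    if args.slide_csv:
        return "batch"
    # No slide input: fall back to launching the web UI.
    return "ui"


print(select_mode(["--slide-path", "slide.svs"]))  # single
print(select_mode([]))  # ui
```

Keeping the mode decision in a function that takes `argv` explicitly makes it easy to test without touching `sys.argv`.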

	#### `mosaic.analysis`
	- Location: `src/mosaic/analysis.py`
	- Purpose: Core slide analysis logic
	- Responsibilities:
	  - Tissue segmentation
	  - Feature extraction (CTransPath and Optimus)
	  - Feature filtering with marker classifier
	  - Aeon inference (cancer subtype prediction)
	  - Paladin inference (biomarker prediction)
	- Key Function: `analyze_slide()`

	#### `mosaic.ui` Package
	- Location: `src/mosaic/ui/`
	- Purpose: Gradio web interface components
	- Submodules:
	  - `ui.__init__.py`: Exports the main `launch_gradio` function
	  - `ui.app`: Gradio interface definition
	    - UI layout and component definitions
	    - Event handlers for user interactions
	    - Multi-slide analysis workflow
	    - Key Functions: `launch_gradio()`, `analyze_slides()`, `set_cancer_subtype_maps()`
	  - `ui.utils`: UI utility functions
	    - Settings validation
	    - CSV file handling
	    - OncoTree API integration
	    - User session directory management
	    - Key Functions: `validate_settings()`, `load_settings()`, `get_oncotree_code_name()`, `create_user_directory()`

	### Inference Modules

	#### `mosaic.inference`
	- Location: `src/mosaic/inference/`
	- Purpose: ML model inference implementations
	- Submodules:
	  - `aeon.py`: Cancer subtype inference
	  - `paladin.py`: Biomarker inference
	  - `data.py`: Data structures and utilities

	## Code Organization Benefits

	1. Separation of Concerns: UI, analysis, and CLI logic are now clearly separated
	2. Improved Maintainability: Each module has a single, well-defined responsibility
	3. Better Testability: Individual modules can be tested independently
	4. Enhanced Readability: Reduced file sizes and clear module boundaries
	5. Reusability: Analysis functions can be imported and used without UI dependencies

	## Import Flow

	```
	gradio_app.main()
	├── download_and_process_models()
	│   ├── set_cancer_subtype_maps() [from ui.app]
	│   └── get_oncotree_code_name() [from ui.utils]
	├── analyze_slide() [from analysis]
	│   ├── segment_tissue() [from mussel]
	│   ├── get_features() [from mussel]
	│   ├── filter_features() [from mussel]
	│   ├── run_aeon() [from inference]
	│   └── run_paladin() [from inference]
	└── launch_gradio() [from ui]
	    ├── analyze_slides() [from ui.app]
	    │   └── analyze_slide() [from analysis]
	    └── validate_settings() [from ui.utils]
	```

	## File Size Comparison

	| File | Original | Refactored | Change |
	|------|----------|------------|--------|
	| `gradio_app.py` | 843 lines | 230 lines | -73% |
	| UI Components | - | 474 lines | +474 lines |
	| Analysis Logic | - | 200 lines | +200 lines |

	The refactoring split the original monolithic file into focused modules while preserving all functionality.
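As a quick check of the figures in the table, the snippet below recomputes the percentage change for `gradio_app.py` and sums the three refactored rows. It uses only the line counts listed above; whether those three rows cover every file touched by the refactoring is an assumption.

```python
# Line counts copied from the table above.
original = 843
refactored = {"gradio_app.py": 230, "UI Components": 474, "Analysis Logic": 200}


def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


entry_point_change = percent_change(original, refactored["gradio_app.py"])
total_after = sum(refactored.values())
total_change = percent_change(original, total_after)

print(f"gradio_app.py: {entry_point_change:.0f}%")
print(f"all three rows: {total_after} lines ({total_change:+.1f}%)")
```

Under that assumption the entry point shrinks by roughly 73% while the combined line count grows slightly, which is typical when one file is split into modules with their own imports and docstrings.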

	## Key Dependencies

	### External Libraries

	- Gradio: Web interface framework for creating the UI
	- PyTorch: Deep learning framework for model inference
	- Pandas: Data manipulation and CSV handling
	- Mussel: Pathology-specific utilities for:
	  - Tissue segmentation
	  - Feature extraction (CTransPath, Optimus)
	  - Marker classification
	- Paladin: Biomarker prediction models
	- HuggingFace Hub: Model downloading and management
	- Loguru: Logging with enhanced features

	### Model Components

	1. CTransPath: Pre-trained vision transformer for histopathology feature extraction
	2. Optimus: Foundation model for pathology image features
	3. Marker Classifier: Filters features to tumor-relevant regions
	4. Aeon: Multi-task model for cancer subtype classification
	5. Paladin: Suite of models for biomarker prediction across cancer subtypes

	## Data Flow

	```
	WSI File (.svs, .tif)
	    ↓
	Tissue Segmentation (Mussel)
	    ↓
	CTransPath Feature Extraction
	    ↓
	Marker Classification (filter to tumor regions)
	    ↓
	Optimus Feature Extraction (on filtered tiles)
	    ↓
	    ├── Aeon Inference → Cancer Subtype Predictions
	    │       ↓
	    └── Paladin Inference → Biomarker Predictions
	    ↓
	Results (CSV, Visualizations)
	```
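The stage ordering in the diagram can be sketched as a plain function that chains callables. This is an illustration only: `run_pipeline()` and its parameter names are hypothetical, the real stages live in Mussel and `mosaic.inference`, and passing the Aeon output into the Paladin stage is an assumption read off the arrow in the diagram.

```python
from typing import Any, Callable

Stage = Callable[[Any], Any]


def run_pipeline(
    slide_path: str,
    segment: Stage,
    extract_ctranspath: Stage,
    filter_tiles: Stage,
    extract_optimus: Stage,
    predict_subtype: Stage,
    predict_biomarkers: Callable[[Any, Any], Any],
) -> dict[str, Any]:
    """Chain the stages in the order shown in the diagram."""
    tissue = segment(slide_path)
    coarse_features = extract_ctranspath(tissue)
    tumor_tiles = filter_tiles(coarse_features)
    features = extract_optimus(tumor_tiles)
    subtype = predict_subtype(features)
    # Assumption: Paladin receives the Aeon output along with the features.
    biomarkers = predict_biomarkers(features, subtype)
    return {"subtype": subtype, "biomarkers": biomarkers}


def tag(name: str) -> Stage:
    """Toy stage that wraps its input in a label so the order is visible."""
    return lambda x: f"{name}({x})"


result = run_pipeline(
    "slide.svs",
    segment=tag("tissue"),
    extract_ctranspath=tag("ctranspath"),
    filter_tiles=tag("tumor"),
    extract_optimus=tag("optimus"),
    predict_subtype=tag("aeon"),
    predict_biomarkers=lambda feats, subtype: f"paladin({feats}, {subtype})",
)
print(result["subtype"])
```

With the toy stages, the printed string nests the labels in execution order, which makes the sequence from the diagram easy to verify by eye.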

	## Design Principles

	1. Modularity: Each component has a single, well-defined responsibility
	2. Testability: Modules can be tested independently with mocking
	3. Reusability: Core analysis functions can be used without UI
	4. Maintainability: Clear interfaces and documentation
	5. Extensibility: New models or features can be added with minimal changes
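The testability principle can be illustrated with `unittest.mock` from the standard library. The sketch below assumes an analysis function that receives its model runners as arguments; `analyze()` is a hypothetical stand-in, not Mosaic's `analyze_slide()`, and the subtype code and biomarker score are made-up example values.

```python
from unittest.mock import Mock


def analyze(features, run_aeon, run_paladin):
    """Hypothetical stand-in for an analysis function with injected models."""
    subtype = run_aeon(features)
    return subtype, run_paladin(features, subtype)


def test_paladin_receives_aeon_subtype():
    # Replace both models with mocks so no weights or GPU are needed.
    run_aeon = Mock(return_value="LUAD")
    run_paladin = Mock(return_value={"EGFR": 0.9})

    subtype, biomarkers = analyze([0.1, 0.2], run_aeon, run_paladin)

    # The biomarker stage should be called with the predicted subtype.
    run_aeon.assert_called_once_with([0.1, 0.2])
    run_paladin.assert_called_once_with([0.1, 0.2], "LUAD")
    assert subtype == "LUAD"
    assert biomarkers == {"EGFR": 0.9}


test_paladin_receives_aeon_subtype()
```

A test of this shape checks only the wiring between stages, so it runs in milliseconds and does not depend on model downloads.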

	## Future Enhancements

	Potential areas for extension:

	- Support for additional image formats
	- Real-time analysis progress tracking
	- Integration with PACS systems
	- Support for additional biomarkers
	- Batch processing optimization
	- Cloud deployment configurations