Space: raylim/mosaic-zero
copilot-swe-agent[bot] raylim committed on Oct 21, 2025

Commit 71ae2f0 (1 parent: e6c73c0)

Add comprehensive documentation improvements

- Fix installation instructions in README (correct repo URL)
- Fix command name inconsistency (mosaic_app -> mosaic)
- Add detailed examples section to README
- Add CSV file format documentation
- Add cancer subtypes reference
- Add troubleshooting section
- Add advanced usage examples
- Create CONTRIBUTING.md with development guidelines
- Add comprehensive docstrings to all modules
- Add module-level docstrings to core modules

Co-authored-by: raylim <3074310+raylim@users.noreply.github.com>

Files changed (9)

CONTRIBUTING.md +267 -0
README.md +214 -7
src/mosaic/analysis.py +37 -0
src/mosaic/gradio_app.py +30 -0
src/mosaic/inference/aeon.py +22 -0
src/mosaic/inference/data.py +8 -0
src/mosaic/inference/paladin.py +83 -17
src/mosaic/ui/app.py +10 -0
src/mosaic/ui/utils.py +72 -3

CONTRIBUTING.md ADDED

	@@ -0,0 +1,267 @@

+# Contributing to Mosaic
+Thank you for your interest in contributing to Mosaic! This document provides guidelines and instructions for contributing to the project.
+## Table of Contents
+- [Getting Started](#getting-started)
+- [Development Setup](#development-setup)
+- [Code Style](#code-style)
+- [Testing](#testing)
+- [Submitting Changes](#submitting-changes)
+- [Reporting Issues](#reporting-issues)
+## Getting Started
+1. Fork the repository on GitHub
+2. Clone your fork locally
+3. Set up the development environment
+4. Create a new branch for your changes
+5. Make your changes
+6. Test your changes
+7. Submit a pull request
+## Development Setup
+### Prerequisites
+- Python 3.10 or higher
+- [uv](https://docs.astral.sh/uv/) package manager
+- NVIDIA GPU with CUDA support (for model inference)
+### Installation
+1. Clone the repository:
+```bash
+git clone https://github.com/pathology-data-mining/mosaic.git
+cd mosaic
+```
+2. Install dependencies including development tools:
+```bash
+uv sync
+```
+This will install all dependencies, including development tools like pytest, pylint, and black.
+### Running Tests
+Run all tests:
+```bash
+pytest tests/
+```
+Run tests with coverage report:
+```bash
+pytest tests/ --cov=src/mosaic --cov-report=term-missing
+```
+Run a specific test file:
+```bash
+pytest tests/inference/test_data.py -v
+```
+### Code Quality
+#### Linting
+We use pylint for code linting. Run it with:
+```bash
+pylint src/mosaic
+```
+#### Code Formatting
+We use black for code formatting. Format your code with:
+```bash
+black src/mosaic tests/
+```
+## Code Style
+### Python Style Guide
+- Follow [PEP 8](https://pep8.org/) style guidelines
+- Use meaningful variable and function names
+- Add docstrings to all public functions, classes, and modules
+- Keep functions focused and concise
+- Use type hints where appropriate
+### Docstring Format
+Use Google-style docstrings:
+```python
+def function_name(param1: str, param2: int) -> bool:
+    """Brief description of the function.
+    More detailed description if needed.
+    Args:
+        param1: Description of param1
+        param2: Description of param2
+    Returns:
+        Description of return value
+    Raises:
+        ValueError: Description of when this error is raised
+    """
+    pass
+```
+### Commit Messages
+- Use clear and descriptive commit messages
+- Start with a verb in the imperative mood (e.g., "Add", "Fix", "Update")
+- Keep the first line under 72 characters
+- Provide additional context in the commit body if needed
+Example:
+```
+Add docstrings to inference module functions
+- Added comprehensive docstrings to all public functions
+- Included type hints for better code clarity
+- Updated existing docstrings to follow Google style
+```
+## Testing
+### Writing Tests
+- Write tests for all new features and bug fixes
+- Place tests in the appropriate directory under `tests/`
+- Use pytest fixtures for common setup code
+- Mock external dependencies (e.g., model loading, network requests)
+- Ensure tests can run without GPU access or large model downloads
+### Test Structure
+```python
+import pytest
+from mosaic.module import function_to_test
+def test_function_basic_case():
+    """Test basic functionality of the function."""
+    result = function_to_test(input_data)
+    assert result == expected_output
+def test_function_edge_case():
+    """Test edge cases."""
+    with pytest.raises(ValueError):
+        function_to_test(invalid_input)
+```
+## Submitting Changes
+### Pull Request Process
+1. **Create a feature branch**:
+   ```bash
+   git checkout -b feature/your-feature-name
+   ```
+2. **Make your changes**:
+   - Write clear, focused commits
+   - Add tests for new functionality
+   - Update documentation as needed
+3. **Ensure code quality**:
+   ```bash
+   black src/mosaic tests/
+   pylint src/mosaic
+   pytest tests/
+   ```
+4. **Push to your fork**:
+   ```bash
+   git push origin feature/your-feature-name
+   ```
+5. **Create a Pull Request**:
+   - Go to the GitHub repository
+   - Click "New Pull Request"
+   - Select your branch
+   - Provide a clear description of your changes
+   - Reference any related issues
+### Pull Request Guidelines
+- Keep pull requests focused on a single feature or fix
+- Update documentation for any changed functionality
+- Add or update tests as appropriate
+- Ensure all tests pass before submitting
+- Respond to review feedback promptly
+## Reporting Issues
+### Bug Reports
+When reporting a bug, please include:
+- A clear and descriptive title
+- Steps to reproduce the issue
+- Expected behavior
+- Actual behavior
+- System information (OS, Python version, GPU model)
+- Relevant log output or error messages
+- Minimal code example to reproduce the issue
+### Feature Requests
+When suggesting a feature, please include:
+- A clear description of the feature
+- The use case and benefits
+- Any alternative solutions you've considered
+- Examples of how the feature would be used
+### Issue Templates
+Please use the appropriate issue template when creating a new issue.
+## Development Guidelines
+### Module Organization
+- Keep modules focused on a single responsibility
+- Place UI-related code in `src/mosaic/ui/`
+- Place inference code in `src/mosaic/inference/`
+- Place analysis logic in `src/mosaic/analysis.py`
+- Avoid circular dependencies
+### Adding New Features
+When adding new features:
+1. Discuss the feature in an issue first
+2. Follow the existing code structure
+3. Add comprehensive tests
+4. Update relevant documentation
+5. Consider backward compatibility
+### Dependencies
+- Avoid adding new dependencies unless necessary
+- Discuss new dependencies in an issue or pull request
+- Ensure dependencies are compatible with the project's license
+- Pin dependency versions in `pyproject.toml`
+## Questions?
+If you have questions about contributing, please:
+- Check existing issues and pull requests
+- Open a new issue with your question
+- Join our community discussions (if available)
+Thank you for contributing to Mosaic!
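
The testing guidelines in this file ask contributors to mock model loading so that tests run without a GPU or model downloads. A minimal sketch of that pattern follows. `load_model` and `predict_score` are hypothetical stand-ins and are not part of the Mosaic codebase; the loader is passed in as a parameter so the test can replace it with a `unittest.mock.Mock`.

```python
from unittest import mock


def load_model(path):
    """Hypothetical loader standing in for an expensive model download."""
    raise RuntimeError("real model loading is not available in tests")


def predict_score(path, features, loader=load_model):
    """Hypothetical function under test: load a model and average its outputs."""
    model = loader(path)
    outputs = [model(x) for x in features]
    return sum(outputs) / len(outputs)


def test_predict_score_with_mocked_model():
    """The loader is replaced by a mock, so no GPU or model file is needed."""
    fake_model = mock.Mock(side_effect=lambda x: x * 2)  # doubles each input
    fake_loader = mock.Mock(return_value=fake_model)

    result = predict_score("unused.pkl", [1.0, 2.0, 3.0], loader=fake_loader)

    fake_loader.assert_called_once_with("unused.pkl")
    assert fake_model.call_count == 3
    assert result == 4.0  # (2.0 + 4.0 + 6.0) / 3
```

Injecting the loader is one design choice among several; patching the loader's import path with `mock.patch` would work as well when the function under test cannot take an extra parameter.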

README.md CHANGED

@@ -4,8 +4,22 @@ Mosaic is a deep learning model designed for predicting cancer subtypes and biom
 ## Table of Contents
 - [Installation](#installation)
 - [Usage](#usage)
 ### System requirements
@@ -25,7 +39,15 @@ Supported systems:
 ## Installation
 ```bash
-uv pip install git+ssh://git@github.com/pathology-data-mining/paladin_webapp.git@dev
 ```
 ## Usage
@@ -49,23 +71,23 @@ export HF_HOME="PATH-TO-HUGGINGFACE-HOME"
 Run the web application with:
 ```bash
-mosaic_app
 ```
 It will start a web server on port 7860 by default. You can access the web interface by navigating to `http://localhost:7860` in your web browser.
 ### Command Line Interface
-To process a WSI, use the following command:
 ```bash
-mosaic_app --slide-path /path/to/your/wsi.svs --output-dir /path/to/output/directory
 ```
 To process a batch of WSIs, use:
 ```bash
-mosaic_app --slide-csv /path/to/your/wsi_list.csv --output-dir /path/to/output/directory
 ```
 The CSV file should contain at least the columns `Slide` and `Site Type`.
@@ -80,7 +102,7 @@ Optionally, it can also contain `Cancer Subtype`, `Segmentation Config`, and `IH
 See additional options with the help command. This command may take a few seconds to run:
 ```bash
-mosaic_app --help
 ```
 If setting port to run in server mode, you may check for available ports using `ss -tuln | grep :PORT` where PORT is the port number you want to check. No output indicates the port may be available. If port is available, set environment variable `export GRADIO_SERVER_PORT="PORT"`
@@ -88,4 +110,189 @@ If setting port to run in server mode, you may check for available ports using `
 ### Notes
 - The first time you run the application, it will download the necessary models from HuggingFace. This may take some time depending on your internet connection.
-- The models are downloaded to a directory relative to where you run the application. (A subdirectory named `data`).

 ## Table of Contents
+- [System Requirements](#system-requirements)
+- [Pre-requisites](#pre-requisites)
 - [Installation](#installation)
 - [Usage](#usage)
+  - [Initial Setup](#initial-setup)
+  - [Web Application](#web-application)
+  - [Command Line Interface](#command-line-interface)
+  - [Notes](#notes)
+- [Output Files](#output-files)
+- [Examples](#examples)
+- [Advanced Usage](#advanced-usage)
+- [CSV File Format](#csv-file-format)
+- [Cancer Subtypes](#cancer-subtypes)
+- [Troubleshooting](#troubleshooting)
+- [Contributing](#contributing)
+- [License](#license)
 ### System requirements
 ## Installation
 ```bash
+git clone https://github.com/pathology-data-mining/mosaic.git
+cd mosaic
+uv sync
+```
+Alternatively, install directly from the repository:
+```bash
+uv pip install git+https://github.com/pathology-data-mining/mosaic.git
 ```
 ## Usage
 Run the web application with:
 ```bash
+mosaic
 ```
 It will start a web server on port 7860 by default. You can access the web interface by navigating to `http://localhost:7860` in your web browser.
 ### Command Line Interface
+To process a single WSI, use the following command:
 ```bash
+mosaic --slide-path /path/to/your/wsi.svs --output-dir /path/to/output/directory
 ```
 To process a batch of WSIs, use:
 ```bash
+mosaic --slide-csv /path/to/your/wsi_list.csv --output-dir /path/to/output/directory
 ```
 The CSV file should contain at least the columns `Slide` and `Site Type`.
 See additional options with the help command. This command may take a few seconds to run:
 ```bash
+mosaic --help
 ```
 If setting port to run in server mode, you may check for available ports using `ss -tuln | grep :PORT` where PORT is the port number you want to check. No output indicates the port may be available. If port is available, set environment variable `export GRADIO_SERVER_PORT="PORT"`
 ### Notes
 - The first time you run the application, it will download the necessary models from HuggingFace. This may take some time depending on your internet connection.
+- The models are downloaded to a directory named `data` relative to where you run the application.
+## Output Files
+### Single Slide Processing
+When processing a single slide, the following files are generated in the output directory:
+- `{slide_name}_mask.png`: Visualization of the tissue segmentation
+- `{slide_name}_aeon_results.csv`: Cancer subtype predictions with confidence scores (if cancer subtype was set to "Unknown")
+- `{slide_name}_paladin_results.csv`: Biomarker predictions for the slide
+### Batch Processing
+When processing multiple slides, in addition to individual slide outputs, combined results are generated:
+- `combined_aeon_results.csv`: Cancer subtype predictions for all slides in a single file
+- `combined_paladin_results.csv`: Biomarker predictions for all slides in a single file
+## Examples
+### Example 1: Process a single slide with unknown cancer type
+```bash
+mosaic --slide-path /data/slides/sample.svs \
+       --output-dir /data/results \
+       --site-type Primary \
+       --cancer-subtype Unknown \
+       --segmentation-config Resection
+```
+### Example 2: Process a single breast cancer slide with known IHC subtype
+```bash
+mosaic --slide-path /data/slides/breast_sample.svs \
+       --output-dir /data/results \
+       --site-type Primary \
+       --cancer-subtype BRCA \
+       --ihc-subtype "HR+/HER2-" \
+       --segmentation-config Biopsy
+```
+### Example 3: Process multiple slides from CSV
+Create a CSV file `slides.csv` with the following format:
+```csv
+Slide,Site Type,Cancer Subtype,Segmentation Config,IHC Subtype
+/data/slides/sample1.svs,Primary,Unknown,Resection,
+/data/slides/sample2.svs,Metastatic,LUAD,Biopsy,
+/data/slides/sample3.svs,Primary,BRCA,TCGA,HR+/HER2-
+```
+Then run:
+```bash
+mosaic --slide-csv slides.csv --output-dir /data/results
+```
+## Advanced Usage
+### Adjusting Performance
+You can control the number of workers for feature extraction to balance between speed and memory usage:
+```bash
+mosaic --slide-path /path/to/slide.svs \
+       --output-dir /path/to/output \
+       --num-workers 8
+```
+### Running in Server Mode
+To run Mosaic as a web server accessible from other machines:
+```bash
+export GRADIO_SERVER_PORT=7860
+mosaic --server-name 0.0.0.0 --server-port 7860
+```
+Check for available ports using:
+```bash
+ss -tuln | grep :7860
+```
+To share the application publicly (use with caution):
+```bash
+mosaic --share
+```
+### Debug Mode
+Enable debug logging for troubleshooting:
+```bash
+mosaic --debug
+```
+This will create a `debug.log` file with detailed information about the processing steps.
+## CSV File Format
+When processing multiple slides using the `--slide-csv` option, the CSV file uses the columns below. The required columns must be present; the optional columns may be omitted.
+### Required Columns
+- **Slide**: Full path to the WSI file (e.g., `/path/to/slide.svs`)
+- **Site Type**: Either `Primary` or `Metastatic`
+### Optional Columns
+- **Cancer Subtype**: OncoTree code for the cancer subtype (e.g., `LUAD`, `BRCA`, `COAD`). Use `Unknown` to let Aeon infer the cancer type.
+- **Segmentation Config**: One of `Biopsy`, `Resection`, or `TCGA`. Defaults to `Biopsy` if not specified.
+- **IHC Subtype**: For breast cancer (BRCA) only. One of:
+  - `HR+/HER2+`
+  - `HR+/HER2-`
+  - `HR-/HER2+`
+  - `HR-/HER2-`
+### CSV Example
+```csv
+Slide,Site Type,Cancer Subtype,Segmentation Config,IHC Subtype
+/data/slides/lung1.svs,Primary,LUAD,Resection,
+/data/slides/breast1.svs,Primary,BRCA,Biopsy,HR+/HER2-
+/data/slides/unknown1.svs,Metastatic,Unknown,TCGA,
+```
+## Cancer Subtypes
+Mosaic uses OncoTree codes to identify cancer subtypes. Common examples include:
+- **LUAD**: Lung Adenocarcinoma
+- **LUSC**: Lung Squamous Cell Carcinoma
+- **BRCA**: Breast Invasive Carcinoma
+- **COAD**: Colon Adenocarcinoma
+- **READ**: Rectal Adenocarcinoma
+- **PRAD**: Prostate Adenocarcinoma
+- **SKCM**: Skin Cutaneous Melanoma
+For a complete list of supported cancer subtypes, see the [OncoTree website](http://oncotree.mskcc.org/).
+When the cancer subtype is set to `Unknown`, Mosaic will use the Aeon model to predict the most likely cancer subtype based on the H&E image features.
+## Troubleshooting
+### HuggingFace Authentication Errors
+If you encounter authentication errors when downloading models:
+1. Ensure you have access to the PDM-Group on HuggingFace
+2. Create a HuggingFace access token with appropriate permissions
+3. Set the `HF_TOKEN` environment variable correctly
+### Out of Memory Errors
+If you encounter GPU out-of-memory errors:
+1. Reduce the number of workers: `--num-workers 2`
+2. Process slides sequentially instead of in batch
+3. Consider using a GPU with more memory
+### Tissue Segmentation Issues
+If tissue is not being detected correctly:
+1. Try a different segmentation configuration (`Biopsy`, `Resection`, or `TCGA`)
+2. Check that the slide file is not corrupted
+3. Verify the slide format is supported (e.g., `.svs`, `.tif`)
+### Port Already in Use
+If the default port 7860 is already in use:
+1. Check for running processes: `ss -tuln | grep :7860`
+2. Use a different port: `export GRADIO_SERVER_PORT=7861`
+3. Or specify the port directly: `mosaic --server-port 7861`
+## Contributing
+We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on how to contribute to this project.
+## License
+This project is licensed under the terms specified in the LICENSE file.
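
The CSV format documented in the README above can be checked before a batch run. The sketch below is an illustration under stated assumptions, not Mosaic's own validation: `check_slide_csv` and the wording of its messages are hypothetical, while the column names, allowed values, and the `Biopsy` default come from the README text.

```python
import csv
import io

REQUIRED_COLUMNS = ("Slide", "Site Type")
SITE_TYPES = {"Primary", "Metastatic"}
SEG_CONFIGS = {"Biopsy", "Resection", "TCGA"}
IHC_SUBTYPES = {"HR+/HER2+", "HR+/HER2-", "HR-/HER2+", "HR-/HER2-"}


def check_slide_csv(text):
    """Return (rows, problems) for CSV text in the documented format."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [], [f"missing required column: {c}" for c in missing]

    rows, problems = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Segmentation Config defaults to Biopsy when absent or empty.
        row["Segmentation Config"] = row.get("Segmentation Config") or "Biopsy"
        ihc = row.get("IHC Subtype") or ""
        if row["Site Type"] not in SITE_TYPES:
            problems.append(f"line {line_no}: invalid Site Type")
        if row["Segmentation Config"] not in SEG_CONFIGS:
            problems.append(f"line {line_no}: invalid Segmentation Config")
        if ihc and row.get("Cancer Subtype") != "BRCA":
            problems.append(f"line {line_no}: IHC Subtype applies to BRCA only")
        elif ihc and ihc not in IHC_SUBTYPES:
            problems.append(f"line {line_no}: invalid IHC Subtype")
        rows.append(row)
    return rows, problems
```

Calling `check_slide_csv(open("slides.csv").read())` returns the parsed rows together with a list of problems; an empty list means the file matches the documented format.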

src/mosaic/analysis.py CHANGED

@@ -1,3 +1,9 @@
 import pickle
 import torch
 import pandas as pd
@@ -22,6 +28,37 @@ def analyze_slide(
     num_workers=4,
     progress=gr.Progress(track_tqdm=True),
 ):
     if slide_path is None:
         raise gr.Error("Please upload a slide.")
     # Step 1: Segment tissue

+"""Core slide analysis module for Mosaic.
+This module provides the main slide analysis pipeline that integrates tissue segmentation,
+feature extraction, and model inference for cancer subtype and biomarker prediction.
+"""
 import pickle
 import torch
 import pandas as pd
     num_workers=4,
     progress=gr.Progress(track_tqdm=True),
 ):
+    """Analyze a whole slide image for cancer subtype and biomarker prediction.
+    This function performs a complete analysis pipeline including:
+    1. Tissue segmentation
+    2. CTransPath feature extraction
+    3. Feature filtering with marker classifier
+    4. Optimus feature extraction on filtered tiles
+    5. Aeon inference for cancer subtype (if not provided)
+    6. Paladin inference for biomarker prediction
+    Args:
+        slide_path: Path to the whole slide image file
+        seg_config: Segmentation configuration, one of "Biopsy", "Resection", or "TCGA"
+        site_type: Site type, either "Primary" or "Metastatic"
+        cancer_subtype: Cancer subtype (OncoTree code or "Unknown" for inference)
+        cancer_subtype_name_map: Dictionary mapping cancer subtype names to codes
+        ihc_subtype: IHC subtype for breast cancer (optional)
+        num_workers: Number of worker processes for feature extraction
+        progress: Gradio progress tracker for UI updates
+    Returns:
+        tuple: (slide_mask, aeon_results, paladin_results)
+            - slide_mask: PIL Image of tissue segmentation visualization
+            - aeon_results: DataFrame with cancer subtype predictions and confidence scores
+            - paladin_results: DataFrame with biomarker predictions
+    Raises:
+        gr.Error: If no slide is provided
+        gr.Warning: If no tissue is detected in the slide
+        ValueError: If an unknown segmentation configuration is provided
+    """
     if slide_path is None:
         raise gr.Error("Please upload a slide.")
     # Step 1: Segment tissue

src/mosaic/gradio_app.py CHANGED

@@ -1,3 +1,12 @@
 from argparse import ArgumentParser
 import pandas as pd
 from pathlib import Path
@@ -17,6 +26,17 @@ from mosaic.analysis import analyze_slide
 def download_and_process_models():
     snapshot_download(repo_id="PDM-Group/paladin-aeon-models", local_dir="data")
     model_map = pd.read_csv(
@@ -41,6 +61,16 @@ def download_and_process_models():
 def main():
     parser = ArgumentParser()
     parser.add_argument("--debug", action="store_true", help="Enable debug logging")
     parser.add_argument(

+"""Mosaic command-line interface and entry point.
+This module provides the main CLI for the Mosaic application, handling:
+- Model downloading and initialization
+- Single slide processing
+- Batch slide processing from CSV
+- Launching the Gradio web interface
+"""
 from argparse import ArgumentParser
 import pandas as pd
 from pathlib import Path
 def download_and_process_models():
+    """Download models from HuggingFace and initialize cancer subtype mappings.
+    Downloads the Paladin and Aeon models from the PDM-Group HuggingFace repository
+    and creates mappings between cancer subtype names and OncoTree codes.
+    Returns:
+        tuple: (cancer_subtype_name_map, reversed_cancer_subtype_name_map, cancer_subtypes)
+            - cancer_subtype_name_map: Dict mapping display names to OncoTree codes
+            - reversed_cancer_subtype_name_map: Dict mapping OncoTree codes to display names
+            - cancer_subtypes: List of all supported cancer subtype codes
+    """
     snapshot_download(repo_id="PDM-Group/paladin-aeon-models", local_dir="data")
     model_map = pd.read_csv(
 def main():
+    """Main entry point for the Mosaic application.
+    Parses command-line arguments and routes to the appropriate mode:
+    - Single slide processing (--slide-path)
+    - Batch processing (--slide-csv)
+    - Web interface (default, no slide arguments)
+    Command-line arguments control analysis parameters like site type,
+    cancer subtype, segmentation configuration, and output directory.
+    """
     parser = ArgumentParser()
     parser.add_argument("--debug", action="store_true", help="Enable debug logging")
     parser.add_argument(
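
The mode routing that the `main` docstring describes can be sketched with `argparse`. This is a minimal illustration, not the project's implementation. Only the `--debug`, `--slide-path`, `--slide-csv`, and `--output-dir` flags come from this diff and the README; `build_parser`, `select_mode`, the mode strings, and the precedence of `--slide-path` over `--slide-csv` are assumptions.

```python
from argparse import ArgumentParser


def build_parser():
    """Build a parser with the flags shown in the README examples."""
    parser = ArgumentParser(prog="mosaic")
    parser.add_argument("--debug", action="store_true", help="Enable debug logging")
    parser.add_argument("--slide-path", help="Path to a single WSI")
    parser.add_argument("--slide-csv", help="CSV file listing several WSIs")
    parser.add_argument("--output-dir", help="Directory for result files")
    return parser


def select_mode(args):
    """Pick the run mode; checking --slide-path first is an assumption."""
    if args.slide_path:
        return "single"
    if args.slide_csv:
        return "batch"
    return "web"  # default when no slide arguments are given


if __name__ == "__main__":
    print(select_mode(build_parser().parse_args()))
```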

src/mosaic/inference/aeon.py CHANGED

@@ -1,3 +1,9 @@
 import pickle  # nosec
 import sys
 from argparse import ArgumentParser
@@ -16,6 +22,7 @@ from mosaic.inference.data import (
 from loguru import logger
 cancer_types_to_drop = [
     "UDMN",
     "ADNOS",
@@ -48,6 +55,21 @@ NUM_WORKERS = 8
 def run(
     features, model_path, metastatic=False, batch_size=8, num_workers=8, use_cpu=False
 ):
     device = torch.device(
         "cuda" if not use_cpu and torch.cuda.is_available() else "cpu"
     )

+"""Aeon model inference module for cancer subtype prediction.
+This module provides functionality to run the Aeon deep learning model
+for predicting cancer subtypes from H&E whole slide image features.
+"""
 import pickle  # nosec
 import sys
 from argparse import ArgumentParser
 from loguru import logger
+# Cancer types excluded from prediction (too broad or ambiguous)
 cancer_types_to_drop = [
     "UDMN",
     "ADNOS",
 def run(
     features, model_path, metastatic=False, batch_size=8, num_workers=8, use_cpu=False
 ):
+    """Run Aeon model inference for cancer subtype prediction.
+    Args:
+        features: NumPy array of tile features extracted from the WSI
+        model_path: Path to the pickled Aeon model file
+        metastatic: Whether the slide is from a metastatic site
+        batch_size: Batch size for inference
+        num_workers: Number of workers for data loading
+        use_cpu: Force CPU usage instead of GPU
+    Returns:
+        tuple: (results_df, part_embedding)
+            - results_df: DataFrame with cancer subtypes and confidence scores
+            - part_embedding: Torch tensor of the learned part representation
+    """
     device = torch.device(
         "cuda" if not use_cpu and torch.cuda.is_available() else "cpu"
     )
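
The `results_df` returned by `run` pairs each cancer subtype with a confidence score, and `select_cancer_subtypes` in `paladin.py` is documented as keeping the top `k` of them. A minimal sketch of that selection follows. `top_k_subtypes` is a hypothetical name and the scores are made-up illustrative values, not model output.

```python
def top_k_subtypes(scores, k=1):
    """Return the k subtype codes with the highest confidence, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [code for code, _ in ranked[:k]]


# Illustrative confidence values only.
example_scores = {"LUSC": 0.20, "LUAD": 0.71, "BRCA": 0.09}
print(top_k_subtypes(example_scores, k=2))  # ['LUAD', 'LUSC']
```

Sorting on the score alone leaves tied subtypes in their original order, which may differ from the repository's tuple-based sort.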

src/mosaic/inference/data.py CHANGED

@@ -1,3 +1,11 @@
 from enum import Enum
 from typing import List

+"""Data structures and utilities for inference modules.
+This module provides:
+- Cancer type to integer mappings for model inputs/outputs
+- SiteType enum for primary vs metastatic classification
+- TileFeatureTensorDataset for feeding features to PyTorch models
+"""
 from enum import Enum
 from typing import List

src/mosaic/inference/paladin.py CHANGED

@@ -1,3 +1,10 @@
 import csv
 import pickle  # nosec
 import sys
@@ -27,11 +34,16 @@ class UsageError(Exception):
 def load_model_map(model_map_path: str) -> dict[Any, Any]:
-    """Load the table mapping cancer_subtypes and targets to the paladin
-    model (a pickle file) that predicts that target for that cancer subtype.
     A dict is returned, mapping each cancer_subtype to a table mapping a
     target to the pathname for the model that predicts it.
     """
     models = defaultdict(dict)
     with Path(model_map_path).open() as fp:
@@ -45,10 +57,13 @@ def load_model_map(model_map_path: str) -> dict[Any, Any]:
 def load_aeon_scores(df: pd.DataFrame) -> dict[str, float]:
-    """Load the output table from a single-slide Aeon run, listing Oncotree
-    cancer subtypes and their confidence values.
-    A dict is returned, mapping each cancersubtype to its confidence score.
     """
     score = {}
     for _, row in df.iterrows():
@@ -59,7 +74,15 @@ def load_aeon_scores(df: pd.DataFrame) -> dict[str, float]:
 def select_cancer_subtypes(aeon_scores: dict[str, float], k=1) -> list[str]:
-    """Return the three top-scoring cancer_subtypes, based on the given Aeon scores."""
     sorted_cancer_subtypes = list(
         sorted([(v, k) for k, v in aeon_scores.items()], reverse=True)
     )
@@ -67,7 +90,15 @@ def select_cancer_subtypes(aeon_scores: dict[str, float], k=1) -> list[str]:
 def select_models(cancer_subtypes: list[str], model_map: dict[Any, Any]) -> list[Any]:
-    """ """
     models = []
     for cancer_subtype, target, model in model_map.items():
         if cancer_subtype in cancer_subtypes:
@@ -76,8 +107,17 @@ def select_models(cancer_subtypes: list[str], model_map: dict[Any, Any]) -> list
 def run_model(device, dataset, model_path: str, num_workers, batch_size) -> float:
-    """Run inference for the given embeddings and model.
-    The point estimate is returned.
     """
     logger.debug(f"[loading model {model_path}]")
@@ -108,6 +148,17 @@ def run_model(device, dataset, model_path: str, num_workers, batch_size) -> floa
 def logits_to_point_estimates(logits):
     # logits is a tensor of shape (batch_size, 2 * (n_clf_tasks + n_reg_tasks))
     # need to convert it to a tensor of shape (batch_size, n_clf_tasks + n_reg_tasks)
     return logits[:, ::2] / (logits[:, ::2] + logits[:, 1::2])
@@ -124,13 +175,28 @@ def run(
     num_workers: int = NUM_WORKERS,
     use_cpu: bool = False,
 ):
-    """Run Paladin inference on a single slide, using the given embeddings
-    and either a single model or a table mapping cancer_subtypes and targets to models.
-    If cancer_subtype_codes is given, it is a list of OncoTree codes for the slide.
-    If aeon_predictions_path is given, it is the pathname to a CSV file
-    with the output of an Aeon run on the slide.
-    If both are given, an error is raised.
-    The output is written to the given output_path (a CSV file).
     """
     if aeon_results is not None:

+"""Paladin model inference module for biomarker prediction.
+This module provides functionality to run the Paladin deep learning models
+for predicting various biomarkers from H&E whole slide image features, based
+on the predicted or known cancer subtype.
+"""
 import csv
 import pickle  # nosec
 import sys
 def load_model_map(model_map_path: str) -> dict[Any, Any]:
+    """Load the table mapping cancer subtypes and targets to Paladin models.
     A dict is returned, mapping each cancer_subtype to a table mapping a
     target to the pathname for the model that predicts it.
+    Args:
+        model_map_path: Path to the CSV file containing the model map
+    Returns:
+        Dictionary mapping cancer subtypes to their target-specific models
     """
     models = defaultdict(dict)
     with Path(model_map_path).open() as fp:
 def load_aeon_scores(df: pd.DataFrame) -> dict[str, float]:
+    """Load Aeon output table with cancer subtypes and confidence values.
+    Args:
+        df: DataFrame with columns 'Cancer Subtype' and 'Confidence'
+    Returns:
+        Dictionary mapping cancer subtypes to their confidence scores
     """
     score = {}
     for _, row in df.iterrows():
 def select_cancer_subtypes(aeon_scores: dict[str, float], k=1) -> list[str]:
+    """Select the top k cancer subtypes based on Aeon confidence scores.
+    Args:
+        aeon_scores: Dictionary mapping cancer subtypes to confidence scores
+        k: Number of top subtypes to select (default: 1)
+    Returns:
+        List of cancer subtype codes sorted by confidence (highest first)
+    """
     sorted_cancer_subtypes = list(
         sorted([(v, k) for k, v in aeon_scores.items()], reverse=True)
     )
 def select_models(cancer_subtypes: list[str], model_map: dict[Any, Any]) -> list[Any]:
+    """Select Paladin models for the given cancer subtypes.
+    Args:
+        cancer_subtypes: List of cancer subtype codes
+        model_map: Dictionary mapping cancer subtypes to their models
+    Returns:
+        List of tuples (cancer_subtype, target, model_path)
+    """
     models = []
     for cancer_subtype, target, model in model_map.items():
         if cancer_subtype in cancer_subtypes:
 def run_model(device, dataset, model_path: str, num_workers, batch_size) -> float:
+    """Run inference for the given dataset and Paladin model.
+    Args:
+        device: Torch device (CPU or CUDA)
+        dataset: TileFeatureTensorDataset containing the features
+        model_path: Path to the pickled Paladin model
+        num_workers: Number of workers for data loading
+        batch_size: Batch size for inference
+    Returns:
+        Point estimate (predicted value) from the model
     """
     logger.debug(f"[loading model {model_path}]")
 def logits_to_point_estimates(logits):
+    """Convert model logits to point estimates for beta-binomial distribution.
+    The logits tensor contains alpha and beta parameters interleaved.
+    This function computes alpha/(alpha+beta), the mean of the underlying beta distribution
+    (the expected success probability of the beta-binomial).
+    Args:
+        logits: Tensor of shape (batch_size, 2*(n_tasks)) with alpha/beta parameters
+    Returns:
+        Tensor of shape (batch_size, n_tasks) with point estimates
+    """
     # logits is a tensor of shape (batch_size, 2 * (n_clf_tasks + n_reg_tasks))
     # need to convert it to a tensor of shape (batch_size, n_clf_tasks + n_reg_tasks)
     return logits[:, ::2] / (logits[:, ::2] + logits[:, 1::2])
     num_workers: int = NUM_WORKERS,
     use_cpu: bool = False,
 ):
+    """Run Paladin inference for biomarker prediction on a single slide.
+    Uses either Aeon predictions or user-provided cancer subtype codes to select
+    the appropriate Paladin models for biomarker prediction.
+    Args:
+        features: NumPy array of tile features extracted from the WSI
+        aeon_results: DataFrame with Aeon predictions (Cancer Subtype, Confidence)
+        cancer_subtype_codes: List of OncoTree codes if cancer subtype is known
+        model_map_path: Path to CSV file mapping subtypes/targets to model paths
+        model_path: Path to a single Paladin model (alternative to model_map_path)
+        metastatic: Whether the slide is from a metastatic site
+        batch_size: Batch size for inference
+        num_workers: Number of workers for data loading
+        use_cpu: Force CPU usage instead of GPU
+    Returns:
+        DataFrame with columns: Cancer Subtype, Target, Score
+    Note:
+        Either aeon_results or cancer_subtype_codes must be provided, but not both.
+        Either model_map_path or model_path must be provided, but not both.
     """
     if aeon_results is not None:
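
The interleaved layout that `logits_to_point_estimates` assumes can be illustrated without PyTorch. The sketch below applies the same even/odd slicing to a single row held in a plain Python list; `point_estimates` and the input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def point_estimates(row):
    """Map [a0, b0, a1, b1, ...] to [a0/(a0+b0), a1/(a1+b1), ...]."""
    alphas = row[0::2]  # even positions hold the alpha parameters
    betas = row[1::2]   # odd positions hold the beta parameters
    return [a / (a + b) for a, b in zip(alphas, betas)]


# Two tasks: (alpha=2, beta=6) and (alpha=3, beta=1).
print(point_estimates([2.0, 6.0, 3.0, 1.0]))  # [0.25, 0.75]
```

The tensor version in the diff does the same per row with `logits[:, ::2]` and `logits[:, 1::2]`, so a row of width `2 * n_tasks` yields `n_tasks` estimates.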

src/mosaic/ui/app.py CHANGED

@@ -1,3 +1,13 @@
 import gradio as gr
 import pandas as pd
 from pathlib import Path

+"""Gradio web interface for Mosaic.
+This module provides the web-based user interface for analyzing whole slide images.
+It includes functionality for:
+- Multi-slide upload and analysis
+- Settings configuration (site type, cancer subtype, IHC subtype, segmentation)
+- Results visualization and export
+- CSV-based batch processing
+"""
 import gradio as gr
 import pandas as pd
 from pathlib import Path

src/mosaic/ui/utils.py CHANGED

@@ -1,3 +1,12 @@
 import tempfile
 from pathlib import Path
 import pandas as pd
@@ -21,6 +30,17 @@ oncotree_code_map = {}
 def get_oncotree_code_name(code):
     global oncotree_code_map
     if code in oncotree_code_map.keys():
         return oncotree_code_map[code]
@@ -38,7 +58,15 @@ def get_oncotree_code_name(code):
 def create_user_directory(state, request: gr.Request):
-    """Create a unique directory for each user session."""
     session_hash = request.session_hash
     if session_hash is None:
         return None, None
@@ -49,7 +77,20 @@ def create_user_directory(state, request: gr.Request):
 def load_settings(slide_csv_path):
-    """Load settings from CSV file and validate columns."""
     settings_df = pd.read_csv(slide_csv_path, na_filter=False)
     if "Segmentation Config" not in settings_df.columns:
         settings_df["Segmentation Config"] = "Biopsy"
@@ -64,7 +105,24 @@ def load_settings(slide_csv_path):
 def validate_settings(settings_df, cancer_subtype_name_map, cancer_subtypes, reversed_cancer_subtype_name_map):
-    """Validate settings DataFrame and provide warnings for invalid entries."""
     settings_df.columns = SETTINGS_COLUMNS
     warnings = []
     for idx, row in settings_df.iterrows():
@@ -110,6 +168,17 @@ def validate_settings(settings_df, cancer_subtype_name_map, cancer_subtypes, rev
 def export_to_csv(df):
     if df is None or df.empty:
         raise gr.Error("No data to export.")
     csv_path = "paladin_results.csv"

+"""UI utility functions for the Mosaic Gradio interface.
+This module provides helper functions for:
+- OncoTree code lookup and caching
+- User session directory management
+- Settings CSV loading and validation
+- Data export functionality
+"""
 import tempfile
 from pathlib import Path
 import pandas as pd
 def get_oncotree_code_name(code):
+    """Retrieve the human-readable name for an OncoTree code.
+    Queries the OncoTree API to get the cancer subtype name corresponding
+    to the given code. Results are cached to avoid repeated API calls.
+    Args:
+        code: OncoTree code (e.g., "LUAD", "BRCA")
+    Returns:
+        Human-readable cancer subtype name, or "Unknown" if not found
+    """
     global oncotree_code_map
     if code in oncotree_code_map.keys():
         return oncotree_code_map[code]
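
The caching behavior described in this docstring can be sketched without any network access. This is an illustration under stated assumptions rather than the commit's implementation: `cached_code_name`, `_name_cache`, and the injected `fetch` callable are hypothetical names, and the real function queries the OncoTree API in a way this view does not show.

```python
from typing import Callable, Dict, Optional

# Module-level cache, playing the role of oncotree_code_map in the diff.
_name_cache: Dict[str, str] = {}


def cached_code_name(code: str, fetch: Callable[[str], Optional[str]]) -> str:
    """Look up a code's display name, calling fetch only on a cache miss.

    fetch is a stand-in for the API call (assumption): it returns the
    name, or None when the code is not found.
    """
    if code in _name_cache:
        return _name_cache[code]
    name = fetch(code)
    if name is None:
        # Misses are not cached here, so a later call can retry the lookup.
        return "Unknown"
    _name_cache[code] = name
    return name
```

Whether failed lookups should also be cached is a design choice. This sketch leaves them uncached so that a transient API failure does not permanently label a valid code as "Unknown".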
 def create_user_directory(state, request: gr.Request):
+    """Create a unique directory for each user session.
+    Args:
+        state: Gradio state object (unused)
+        request: Gradio request object containing session hash
+    Returns:
+        Two-element tuple that includes the path to the user's session directory, or (None, None) if no session hash is available
+    """
     session_hash = request.session_hash
     if session_hash is None:
         return None, None
 def load_settings(slide_csv_path):
+    """Load slide analysis settings from CSV file.
+    Loads the CSV and ensures all required columns are present, adding defaults
+    for optional columns if they are missing.
+    Args:
+        slide_csv_path: Path to the CSV file containing slide settings
+    Returns:
+        DataFrame with columns: Slide, Site Type, Cancer Subtype, IHC Subtype, Segmentation Config
+    Raises:
+        ValueError: If required columns are missing from the CSV
+    """
     settings_df = pd.read_csv(slide_csv_path, na_filter=False)
     if "Segmentation Config" not in settings_df.columns:
         settings_df["Segmentation Config"] = "Biopsy"
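
The required-versus-optional column handling described in this docstring can be illustrated with the standard library alone. This sketch deliberately uses `csv.DictReader` instead of pandas so that it has no dependencies. Treating `Slide` as the only required column is an assumption made for illustration; the `"Biopsy"` default for `Segmentation Config` comes from the code shown in the diff.

```python
import csv
import io

REQUIRED_COLUMNS = ["Slide"]  # assumption: the diff does not list required columns
OPTIONAL_DEFAULTS = {"Segmentation Config": "Biopsy"}  # default shown in the diff


def load_rows_with_defaults(csv_text):
    """Parse CSV text, fail on missing required columns, fill optional ones."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"Missing required column(s): {', '.join(missing)}")
    rows = []
    for row in reader:
        # Only add a default when the column is absent from the file.
        for column, default in OPTIONAL_DEFAULTS.items():
            row.setdefault(column, default)
        rows.append(row)
    return rows


rows = load_rows_with_defaults("Slide,Site Type\nslide1.svs,Primary\n")
```

In this sketch a column that is present but empty keeps its empty value; only a column missing from the header receives the default.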
 def validate_settings(settings_df, cancer_subtype_name_map, cancer_subtypes, reversed_cancer_subtype_name_map):
+    """Validate and normalize slide analysis settings.
+    Checks each row for valid values and normalizes cancer subtype names.
+    Generates warnings for invalid entries and replaces them with defaults.
+    Args:
+        settings_df: DataFrame with slide settings to validate
+        cancer_subtype_name_map: Dict mapping subtype display names to codes
+        cancer_subtypes: List of valid cancer subtype codes
+        reversed_cancer_subtype_name_map: Dict mapping codes to display names
+    Returns:
+        Validated DataFrame with normalized values
+    Note:
+        Invalid entries are replaced with defaults, and each replacement is
+        reported to the user as a Gradio warning.
+    """
     settings_df.columns = SETTINGS_COLUMNS
     warnings = []
     for idx, row in settings_df.iterrows():
 def export_to_csv(df):
+    """Export a DataFrame to CSV file for download.
+    Args:
+        df: DataFrame to export
+    Returns:
+        Path to the exported CSV file
+    Raises:
+        gr.Error: If the DataFrame is None or empty
+    """
     if df is None or df.empty:
         raise gr.Error("No data to export.")
     csv_path = "paladin_results.csv"
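
The visible code assigns a fixed file name, and the rest of the function is not shown in this view. If that path is written relative to the working directory, concurrent sessions could overwrite each other's export. One alternative, sketched here with the standard library only, writes each export to a unique temporary file. `export_rows_to_csv` is a hypothetical name, and rows are plain dictionaries rather than a DataFrame to keep the example dependency-free.

```python
import csv
import tempfile
from pathlib import Path


def export_rows_to_csv(rows, prefix="paladin_results_"):
    """Write a list of dict rows to a uniquely named CSV file and return its path."""
    if not rows:
        raise ValueError("No data to export.")
    fieldnames = list(rows[0].keys())
    # delete=False keeps the file on disk after the handle is closed,
    # so the returned path stays valid for download.
    with tempfile.NamedTemporaryFile(
        mode="w", newline="", suffix=".csv", prefix=prefix, delete=False
    ) as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return Path(handle.name)


# Placeholder values only; the column names follow the Paladin docstring.
path = export_rows_to_csv(
    [{"Cancer Subtype": "LUAD", "Target": "TARGET_A", "Score": 0.5}]
)
```

The caller is responsible for deleting the temporary file once the download has completed.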