# Mosaic: Docker deployment
The Mosaic app has been packaged as a Docker image, which may be easier to use than
installing the app in a Python environment.
## Table of Contents
- [Installation](#installation)
- [Usage](#usage)
### System requirements
Supported systems:
- Linux (x86) with GPU (NVIDIA CUDA)
### Pre-requisites
You will need to have Docker or Podman installed on your system, and at least 8 GB of
storage space for the Docker image.
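As a quick check of this pre-requisite, the sketch below reports which container engine is on your `PATH`. The helper name `first_available` is hypothetical and is not part of this repo.

```bash
# Hypothetical helper: print the first of the given commands found on PATH.
# Returns 1 (and prints nothing) if none of them is installed.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}
```

For example, `first_available docker podman` prints `docker` or `podman` if either is installed. Free disk space can be checked separately with `df -h`.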
You will need to have the NVIDIA Container Toolkit installed on the machine where
you want to run the Mosaic app. For instructions, see
[Installing the NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
### Installation
1. Pull the image into your local Docker repository
```bash
docker pull docker.io/tomp/mosaic-gradio
```
2. Set the HF_TOKEN to your HuggingFace access token
The models for Mosaic are not yet public. To access the models, you need to be
a member of the [Pathology Data Mining Group](https://huggingface.co/PDM-Group)
organization on HuggingFace.
To download the models, set the `HF_TOKEN` environment variable to your
HuggingFace access token.
If you do not already have one, create an access token by logging in to your
HuggingFace account, clicking the user icon at the top right corner of the site, and
selecting "Access Tokens". When creating the token, select all read options for your
private space and the PDM-Group space.
```bash
export HF_TOKEN="TOKEN-FROM-HUGGINGFACE"
```
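Before starting the container, it can help to confirm that the variable is actually set in the current shell. The following is a minimal sketch; the function name `check_hf_token` is hypothetical, and it only checks that the variable is non-empty, not that the token is valid.

```bash
# Hypothetical helper: verify that HF_TOKEN is set without printing its value.
check_hf_token() {
    if [ -z "${HF_TOKEN:-}" ]; then
        echo "HF_TOKEN is not set; export it before running Mosaic." >&2
        return 1
    fi
    # Report only the length so the secret never appears in logs.
    echo "HF_TOKEN is set (${#HF_TOKEN} characters)"
}
```

Run `check_hf_token` in the same shell where you ran `export HF_TOKEN=...`; a non-zero exit status means the variable is missing.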
## Usage
### Web Application
1. Start up the web app using the command
```bash
docker run -it \
--gpus=all --runtime=nvidia \
--env HF_TOKEN=${HF_TOKEN} \
--shm-size=500m \
-p 7860:7860 \
tomp/mosaic-gradio
```
2. Access the webapp at the URL [http://localhost:7860/](http://localhost:7860)
Alternatively, you can start the Docker container using the `run_mosaic_docker.sh` script
in this repo. It executes the `docker run` command (shown above) for you, and lets
you specify the port you want to use to access the app (if 7860 is not available).
To run it, execute
```bash
# Use the default port (7860)
./run_mosaic_docker.sh
# Or choose a different port
./run_mosaic_docker.sh --port 7863
```
### Command Line Interface (CLI)
For seamless CLI usage via Docker, use the provided `mosaic` wrapper script. This script
automatically handles volume mounting and passes all arguments to the containerized
Mosaic CLI.
#### Basic Usage
```bash
# Show help
./mosaic --help
# Process a single slide
./mosaic --slide-path /path/to/slide.svs \
--output-dir /path/to/output \
--site-type Primary \
--cancer-subtype Unknown \
--segmentation-config Resection
# Process multiple slides from a CSV file
./mosaic --slide-csv /path/to/slides.csv \
--output-dir /path/to/output
# Process a breast cancer slide with IHC subtype
./mosaic --slide-path /path/to/breast_slide.svs \
--output-dir /path/to/output \
--site-type Primary \
--cancer-subtype BRCA \
--ihc-subtype "HR+/HER2-"
```
#### How it works
The `mosaic` wrapper script:
- Automatically mounts input slide directories and output directories into the container
- Passes through all Mosaic CLI arguments
- Handles the HF_TOKEN environment variable
- Detects and uses GPU support if available (falls back to CPU if not)
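The GPU fallback described in the last bullet could be implemented along the following lines. This is only an illustrative sketch and not necessarily how the `mosaic` script does it: the function name `gpu_flags` and the `GPU_PROBE` override are assumptions made for this example.

```bash
# Hypothetical helper: print extra `docker run` flags when a GPU is usable.
# GPU_PROBE (an assumption for this sketch) names the command used to probe
# for a GPU; it defaults to `nvidia-smi`.
gpu_flags() {
    probe="${GPU_PROBE:-nvidia-smi}"
    # The probe must exist on PATH and exit successfully.
    if command -v "$probe" >/dev/null 2>&1 && "$probe" >/dev/null 2>&1; then
        echo "--gpus=all"
    fi
    # No output means: run on CPU.
}
```

A wrapper could then splice the result into its command, for example `docker run $(gpu_flags) ...`, so that the GPU flag is only passed when the probe succeeds.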
**Note**: When using `--slide-csv`, the script mounts the directory containing the CSV file.
Slides referenced in the CSV should therefore be in the same directory as the CSV file or in
subdirectories of it. If slides are in other locations, you may need to modify
the CSV to use relative paths or run the docker command directly with additional volume mounts.
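To check whether a given slide path falls inside the directory that would be mounted, a small helper like the one below can be used. It is a sketch under assumptions: `is_under_dir` is a hypothetical name, and it relies on GNU `realpath -m`, which is available on typical Linux systems.

```bash
# Hypothetical helper: succeed if path $1 resolves to a location inside
# directory $2. Uses GNU `realpath -m`, which also accepts paths that do
# not exist yet.
is_under_dir() {
    path=$(realpath -m -- "$1") || return 2
    dir=$(realpath -m -- "$2") || return 2
    case "$path" in
        "$dir"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_under_dir /data/cohort/slides/a.svs /data/cohort` succeeds, while a slide under `/data/other` fails the check. Relative paths are resolved against the current working directory, so run it from the directory containing the CSV.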
#### Requirements for CLI usage
- Docker installed and running
- HF_TOKEN environment variable set (same as web app)
- NVIDIA Docker runtime for GPU support (optional, will run on CPU if not available)
### Notes
- After you start up the application, it will download the necessary models from
HuggingFace. This may take some time (up to a few minutes) depending on your
internet connection.
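If you launch the web app from a script, you may want to wait until it responds before opening the browser. The following is a minimal retry sketch; `wait_until` is a hypothetical helper, and it assumes the app only starts answering HTTP requests once it is ready, which this README does not confirm.

```bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Discard the command's output; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `wait_until 60 5 curl -sf http://localhost:7860/` polls the default port every 5 seconds for up to 5 minutes.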