estenhl committed on
Commit 4f9da36 · 1 Parent(s): b42b662

Working on preprocess and predict container
README.md CHANGED
@@ -93,6 +93,7 @@ All the approaches described below rely on having the IXI dataset downloaded. If
 python tutorials/download_ixi.py
 ```
 ## Generate predictions
+
 <details>
 <summary> Preprocess and predict manually </summary>
 
@@ -101,11 +102,7 @@ Preprocessing and predicting manually relies on using the scripts provided in th
 ### Preprocessing
 The images must be preprocessed using FastSurfer. First, FastSurfer must be downloaded. If any of the subsequent steps fail, a comprehensive installation guide can be found in the [FastSurfer GitHub repository](https://github.com/Deep-MI/FastSurfer/blob/dev/doc/overview/INSTALL.md#native-ubuntu-2004-or-ubuntu-2204). The following steps download and install FastSurfer into the folder `~/repos/fastsurfer`. First, some system packages must be installed:
 ```
-sudo apt-get update && apt-get install -y --no-install-recommends \
-wget \
-git \
-ca-certificates \
-file
+sudo apt-get update && apt-get install -y --no-install-recommends wget git ca-certificates file
 ```
 Next, we can clone FastSurfer, and change to the correct branch:
 ```
@@ -141,3 +138,46 @@ mkdir ~/data/ixi/outputs
 python scripts/predict_from_fastsurfer_folder.py ~/data/ixi/preprocessed -d ~/data/ixi/outputs/predictions.csv
 ```
 </details>
+
+<details>
+<summary> Preprocess and predict in two steps via docker </summary>
+Preprocessing and predicting in two steps via docker relies on the two prebuilt docker containers, one for each step.
+
+### Preprocessing
+Running the container for preprocessing requires mounting three volumes:
+- Inputs: A folder containing input data. All NIfTI files detected in this folder or one of its subfolders will be processed
+- Outputs: A folder where the preprocessed images will be written. This must be created prior to running the container
+- Licenses: A folder containing the FreeSurfer license. The file must be named `freesurfer.txt`
+```
+mkdir -p ~/data/ixi/outputs
+docker run --rm \
+--user $(id -u):$(id -g) \
+--volume $HOME/data/ixi/images:/input \
+--volume $HOME/data/ixi/outputs:/output \
+--volume <path_to_licenses>:/licenses \
+--gpus all \
+estenhl/pyment-preprocessing:1.0.0
+```
+
+### Generate predictions
+Running the container for predictions requires two volumes:
+- Fastsurfer: The folder containing FastSurfer-processed images
+- Outputs: The folder where the predictions are written
+```
+docker run --rm -it \
+--user $(id -u):$(id -g) \
+--volume $HOME/data/ixi/outputs/fastsurfer:/fastsurfer \
+--volume $HOME/data/ixi/outputs:/output \
+--gpus all \
+estenhl/pyment-predict:1.0.0
+```
+
+</details>
+
+## Evaluate predictions
+
+Evaluate the IXI predictions with
+```
+python tutorials/evaluate_ixi_predictions.py
+```
+If everything is set up correctly, this should yield an MAE of 3.12. Note that the paths to both the labels and the predictions can be given as keyword arguments to the script if they don't reside in the standard locations.
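The MAE reported above is the mean absolute error between the true ages and the predicted ages. As a minimal sketch of the metric itself (the age values below are made up for illustration, not IXI data):

```python
# Hypothetical true and predicted ages, only to illustrate the metric
true_age = [30.0, 45.0, 60.0]
pred_age = [32.0, 44.0, 57.0]

# Mean absolute error: average of |true - predicted| over all subjects
mae = sum(abs(t - p) for t, p in zip(true_age, pred_age)) / len(true_age)
print(f'MAE: {mae:.2f}')  # MAE: 2.00
```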
docker/README.md CHANGED
@@ -1,24 +1,20 @@
-## Build docker container for preprocessing
-Note that for now, building the container requires a folder called <checkpoints> that contains the FastSurfer segmentation checkpoints
-
+# Building docker containers
+
+## Building docker container for preprocessing
+Note that for now, building the container requires a folder called `checkpoints` that contains the FastSurfer segmentation checkpoints in a subfolder called `fastsurfer`. This folder should contain the files `aparc_vinn_axial_v2.0.0.pkl`, `aparc_vinn_coronal_v2.0.0.pkl`, and `aparc_vinn_sagittal_v2.0.0.pkl`. The command should be run from the root of the repository:
+
 ```
 docker build \
 -f docker/preprocess.Dockerfile \
--t pyment/preprocessing:1.0.0 \
+-t estenhl/pyment-preprocessing:1.0.0 \
 .
 ```
 
-## Run docker container for preprocessing
-Running the container for preprocessing requires three volumes:
-- Inputs: A folder containing input data. All NIfTI files detected in this folder or one of its subfolders will be processed
-- Outputs: A folder where the preprocessed images will be written.
-- Licenses: A folder containing the FreeSurfer license
+## Building docker container for predictions
+Note that for now, building the container requires a folder called `checkpoints` that contains the multi-task model checkpoints in a subfolder called `pyment`. This folder should contain the files `sfcn-multi.data-00000-of-00001` and `sfcn-multi.index`. The command should be run from the root of the repository:
 ```
-docker run --rm \
---user $(id -u):$(id -g) \
---volume <path_to_input>:/input \
---volume <path_to_output>:/output \
---volume <path_to_licenses>:/licenses \
---gpus all \
-pyment/preprocessing:1.0.0
+docker build \
+-f docker/predict.Dockerfile \
+-t estenhl/pyment-predict:1.0.0 \
+.
 ```
docker/predict.Dockerfile CHANGED
@@ -1,11 +1,21 @@
-FROM estenhl/pyment-preprocessing:1.0.0
-
-RUN python -m venv /envs/pyment
-
-RUN mkdir /repos/pyment
-
-COPY . /repos/pyment
-
-RUN cd /repos/pyment && \
-    /envs/pyment/bin/pip install --upgrade pip && \
-    /envs/pyment/bin/pip install .
+FROM python:3.10.4-slim
+
+RUN mkdir -p /repos/pyment
+
+COPY scripts /repos/pyment/scripts
+COPY pyment /repos/pyment/pyment
+COPY pyproject.toml /repos/pyment/
+COPY README.md /repos/pyment/
+COPY LICENSE.md /repos/pyment/
+
+RUN pip install --upgrade pip poetry-core build && \
+    cd /repos/pyment && \
+    pip install --no-cache-dir .
+
+RUN mkdir -p /.pyment/weights && \
+    chmod -R 1777 /.pyment
+COPY checkpoints/pyment /.pyment/weights
+
+CMD ["python", "/repos/pyment/scripts/predict_from_fastsurfer_folder.py", \
+    "/fastsurfer", \
+    "-d", "/output/predictions.csv"]
docker/preprocess.Dockerfile CHANGED
@@ -1,7 +1,5 @@
 FROM python:3.10.2-slim
 
-#ARG CHECKPOINTS_FOLDER
-
 RUN apt-get update && apt-get install -y \
     apt-utils git \
     && rm -rf /var/lib/apt/lists/*
@@ -23,7 +21,7 @@ RUN /envs/fastsurfer/bin/pip install --upgrade pip && \
     /envs/fastsurfer/bin/pip install -r ${FASTSURFER_HOME}/requirements.txt
 
 #COPY ${CHECKPOINTS_FOLDER} ${FASTSURFER_HOME}/FastSurferCNN/checkpoints
-COPY checkpoints ${FASTSURFER_HOME}/FastSurferCNN/checkpoints
+COPY checkpoints/fastsurfer ${FASTSURFER_HOME}/FastSurferCNN/checkpoints
 
 RUN mkdir /scripts
 COPY scripts/preprocess.sh /scripts/preprocess.sh
docker/preprocess_and_predict.Dockerfile ADDED
@@ -0,0 +1,41 @@
+FROM estenhl/pyment-preprocessing:1.0.0
+
+RUN apt-get update && apt-get install -y \
+    make build-essential libssl-dev zlib1g-dev \
+    libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
+    libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev \
+    libffi-dev liblzma-dev git \
+    && rm -rf /var/lib/apt/lists/*
+
+ENV PYENV_ROOT=/root/.pyenv
+ENV PATH="$PYENV_ROOT/bin:$PATH"
+RUN curl https://pyenv.run | bash && \
+    echo 'eval "$(pyenv init -)"' >> ~/.bashrc
+
+RUN eval "$(pyenv init -)" && \
+    pyenv install 3.10.4
+
+RUN mkdir -p /envs && \
+    $PYENV_ROOT/versions/3.10.4/bin/python -m venv /envs/pyment
+
+RUN mkdir -p /repos/pyment
+
+COPY scripts /repos/pyment/scripts
+COPY pyment /repos/pyment/pyment
+COPY pyproject.toml /repos/pyment/
+COPY README.md /repos/pyment/
+COPY LICENSE.md /repos/pyment/
+
+RUN /envs/pyment/bin/pip install --upgrade pip poetry-core build && \
+    cd /repos/pyment && \
+    /envs/pyment/bin/pip install --no-cache-dir .
+
+CMD ["/bin/sh", "-c", \
+    "/scripts/preprocess.sh \
+    --license /licenses/freesurfer.txt \
+    --python /envs/fastsurfer/bin/python \
+    /inputs \
+    /outputs/fastsurfer \
+    && /envs/pyment/bin/python /repos/pyment/scripts/predict_from_fastsurfer_folder.py \
+    /outputs/fastsurfer \
+    -d /outputs/predictions.csv"]
pyment/__init__.py CHANGED
@@ -1,15 +1,20 @@
-import os
-import tomli
-
 def _get_version():
-    """Get version from pyproject.toml"""
-    pyproject_path = os.path.join(
-        os.path.dirname(__file__), os.pardir, 'pyproject.toml'
-    )
+    """Get version from package metadata (generated from pyproject.toml during installation)"""
+    try:
+        from importlib.metadata import version, PackageNotFoundError
+        return version('pyment')
+    except PackageNotFoundError:
+        import os
+        import tomli
 
-    with open(pyproject_path, 'rb') as f:
-        data = tomli.load(f)
+        pyproject_path = os.path.join(
+            os.path.dirname(__file__), os.pardir, 'pyproject.toml'
+        )
+        if os.path.exists(pyproject_path):
+            with open(pyproject_path, 'rb') as f:
+                data = tomli.load(f)
 
-    return data['project']['version']
+            return data['project']['version']
 
 __version__ = _get_version()
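The new `_get_version` follows a common packaging pattern: ask the installed package metadata first, and only fall back to reading `pyproject.toml` when running from an uninstalled source checkout. A minimal self-contained sketch of that pattern (the package name below is deliberately bogus so the fallback triggers, and the fallback string is an assumption, not part of pyment):

```python
from importlib.metadata import version, PackageNotFoundError

def get_version(package: str, fallback: str = '0.0.0+unknown') -> str:
    """Return the installed version of `package`, or `fallback` when the
    package metadata is not available (e.g. an uninstalled source tree)."""
    try:
        return version(package)
    except PackageNotFoundError:
        return fallback

print(get_version('surely-not-an-installed-package'))  # 0.0.0+unknown
```

Resolving the version from installed metadata also removes the runtime dependency on `tomli` and on `pyproject.toml` being shipped next to the package, which is exactly what broke inside the docker image.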
pyproject.toml CHANGED
@@ -14,6 +14,9 @@ requires-python = "==3.10.4"
 
 [tool.poetry]
 packages = [{include = "pyment"}]
+include = [
+    {path = "pyproject.toml", format = "sdist"}
+]
 
 [tool.poetry.dependencies]
 python = "3.10.4"
scripts/predict_from_fastsurfer_folder.py CHANGED
@@ -12,7 +12,10 @@ import nibabel as nib
 from pyment.models.sfcn import sfcn_factory
 from pyment.preprocessing.conform import conform
 
-
+logging.basicConfig(
+    format='%(asctime)s - %(levelname)s - %(name)s: %(message)s',
+    level=logging.DEBUG
+)
 logger = logging.getLogger(__name__)
 
 def _parse_folder_name(name: str) -> Tuple[str, str, str]:
@@ -25,7 +28,8 @@ def _parse_folder_name(name: str) -> Tuple[str, str, str]:
 
 def predict_from_fastsurfer_folder(
     source: str,
-    weights: str,
+    folders: List[str] = None,
+    weights: str = None,
     model_name: str = 'sfcn-multi',
     targets: List[str] = [
         'age', 'sex', 'handedness', 'bmi', 'fluid_intelligence', 'neuroticism'
@@ -42,13 +46,26 @@
 
     results = []
 
-    for folder in tqdm(os.listdir(source)):
+    logger.info('Reading fastsurfer folders from %s', source)
+
+    folders = (
+        folders if folders is not None
+        else [
+            folder for folder in os.listdir(source)
+            if os.path.isdir(os.path.join(source, folder))
+        ]
+    )
+
+    for folder in tqdm(folders):
         orig = os.path.join(source, folder, 'mri', 'orig.mgz')
 
         subject, session, run = _parse_folder_name(folder)
 
         if not os.path.isfile(orig):
-            logger.warning('No orig.mgz file for folder %s', folder)
+            logger.warning(
+                'No orig.mgz file for folder %s',
+                os.path.join(source, folder)
+            )
             continue
 
         orig = nib.load(orig)
@@ -56,10 +73,18 @@
         brainmask = os.path.join(source, folder, 'mri', 'mask.mgz')
 
         if not os.path.isfile(brainmask):
-            logger.warning('No mask.mgz file for folder %s', folder)
+            logger.warning(
+                'No mask.mgz file for folder %s',
+                os.path.join(source, folder)
+            )
+            continue
+
+        try:
+            brainmask = nib.load(brainmask)
+        except Exception as e:
+            logger.error('Error loading brainmask for folder %s: %s', folder, e)
             continue
 
-        brainmask = nib.load(brainmask)
         brainmask = brainmask.get_fdata()
 
         image = orig * brainmask
@@ -109,8 +134,9 @@ if __name__ == '__main__':
         default='multi-2025',
         help=(
             'Weights to use. Should either point to a local file path, or a '
-            'known identifier. If a local file path <path> is used, there should '
-            'exist files named <path>.index and <path>.data-00000-of-00001'
+            'known identifier. If a local file path <path> is used, there '
+            'should exist files named <path>.index and '
+            '<path>.data-00000-of-00001'
         )
     )
     parser.add_argument(
@@ -131,6 +157,15 @@
         ],
         help='Name to use for each of the prediction heads in the output CSV'
     )
+    parser.add_argument(
+        '-f', '--folders',
+        default=None,
+        nargs='+',
+        help=(
+            'List of folders to process. If not provided, all folders in '
+            'the source folder will be processed.'
+        )
+    )
     parser.add_argument(
         '-d', '--destination',
         required=False,
@@ -142,9 +177,10 @@
 
     predict_from_fastsurfer_folder(
         source=args.root,
+        folders=args.folders,
         model_name=args.model,
         weights=args.weights,
         targets=args.targets,
-        destination=args.destination
+        destination=args.destination,
     )
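The new `folders` argument to `predict_from_fastsurfer_folder` defaults to every directory directly under `source`. That selection logic can be exercised in isolation; the sketch below mirrors it against a temporary directory with hypothetical folder names (sorting is added here only to make the result deterministic, since `os.listdir` order is not guaranteed):

```python
import os
import tempfile

def select_folders(source, folders=None):
    # Mirrors the default in predict_from_fastsurfer_folder: an explicit
    # list wins, otherwise take every directory directly under `source`
    if folders is not None:
        return folders
    return sorted(
        f for f in os.listdir(source)
        if os.path.isdir(os.path.join(source, f))
    )

with tempfile.TemporaryDirectory() as source:
    os.mkdir(os.path.join(source, 'subject-01'))
    os.mkdir(os.path.join(source, 'subject-02'))
    open(os.path.join(source, 'notes.txt'), 'w').close()  # plain files are skipped

    print(select_folders(source))                   # ['subject-01', 'subject-02']
    print(select_folders(source, ['subject-01']))   # ['subject-01']
```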
scripts/preprocess.sh CHANGED
@@ -52,7 +52,6 @@ if [ -z "$INPUT" ] || [ -z "$OUTPUT" ]; then
     exit 1
 fi
 
-# Validate that license is provided
 if [ -z "$LICENSE" ]; then
     echo "Error: License is required"
     usage
@@ -86,14 +85,17 @@ fi
 
 # Loop through each NIFTI file path
 echo "$NIFTI_FILES" | while IFS= read -r filepath; do
-    # Extract filename without .nii.gz suffix
     IMAGE=$(basename "$filepath" .nii.gz)
-    $FASTSURFER_HOME/run_fastsurfer.sh \
-        --sd $OUTPUT \
-        --sid $IMAGE \
-        --t1 $filepath \
-        --fs_license $LICENSE \
-        --py $PYTHON \
-        --seg_only
+    if [ ! -f "$OUTPUT/$IMAGE/mri/mask.mgz" ]; then
+        $FASTSURFER_HOME/run_fastsurfer.sh \
+            --sd $OUTPUT \
+            --sid $IMAGE \
+            --t1 $filepath \
+            --fs_license $LICENSE \
+            --py $PYTHON \
+            --seg_only
+    else
+        echo "$IMAGE already processed"
+    fi
 done
tutorials/evaluate_ixi_predictions.py CHANGED
@@ -13,9 +13,8 @@ def evaluate_ixi_predictions(
     predictions: str
 ) -> None:
     labels = pd.read_excel(labels)
-    print(labels.head())
     predictions = pd.read_csv(predictions)
-    print(predictions.head())
+
     predictions['IXI_ID'] = predictions['source'].apply(
         lambda path: int(path.split('/')[-1][3:6])
     )
@@ -28,7 +27,7 @@ def evaluate_ixi_predictions(
     )
 
     mae = np.mean(np.abs(predictions['AGE'] - predictions['age_prediction']))
-    print(f'MAE: {mae}')
+    print(f'MAE: {mae:.2f}')
 
     plt.scatter(predictions['AGE'], predictions['age_prediction'])
     plt.xlabel('True age')
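The `IXI_ID` lambda in the script reads the three digits following the `IXI` prefix of the filename. The standalone sketch below shows the same parsing on a hypothetical path (the directory part is made up; only the `IXI<id>-...` filename pattern matters):

```python
# Hypothetical path following the IXI naming scheme (IXI<id>-<site>-...)
path = '/home/user/data/ixi/images/IXI002-Guys-0828-T1.nii.gz'

def ixi_id(path: str) -> int:
    # Same parsing as the script's lambda: take the filename and read
    # the three digits after the 'IXI' prefix
    filename = path.split('/')[-1]
    return int(filename[3:6])

print(ixi_id(path))  # 2
```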