Clinical Context
Preoperative liver volumetry is a critical step in hepatic surgery planning. Accurate estimation of tumor burden and residual liver volume directly influences surgical feasibility and helps prevent postoperative liver failure.
Traditional segmentation methods are often manual or semi-automatic, leading to:
- High time consumption
- Strong dependency on clinical expertise
- Limited reproducibility
This project proposes a fully automated pipeline that integrates:
- Liver and tumor segmentation from CT scans
- Quantitative volumetric computation
- Multimodal reasoning using MedGemma
- Automated structured clinical report generation
Pipeline Overview
The system transforms a raw CT image into a structured clinical interpretation through four main stages:

Model Description
This pipeline integrates convolutional neural networks for medical image segmentation with a quantized large language model for structured medical report generation.
The system combines:
- A U-Net model for liver segmentation
- A ResU-Net model for tumor segmentation
- A quantized MedGemma 1.5-4B model for automated medical reasoning and report generation
After segmentation, binary prediction masks are used to compute:
- Total liver volume
- Tumor volume
- Tumor-to-liver volume ratio
These quantitative results, along with segmentation summaries, are provided as structured input to MedGemma, which generates an automated clinical-style report.
The original base model used for quantization is:
MedGemma 1.5-4B (Google)
https://huggingface.co/google/medgemma-1.5-4b
Quantization
The MedGemma 1.5-4B model was quantized to 4-bit precision using the bitsandbytes library in order to:
- Reduce GPU memory usage
- Enable deployment on hardware with limited computational resources
- Maintain acceptable performance while optimizing inference speed
Training Details
- Dataset: Images from the public 3Dircadb dataset (3D Image Reconstruction for Comparison of Algorithm Database). The original CT volumes were converted into 2D slices and saved in JPEG format for training.
- Input size: 256x256
- Framework: TensorFlow (segmentation), Transformers (MedGemma)
- Hardware: NVIDIA GPU
Multimodal Prompt Construction
The input prompt to MedGemma includes:
- The image with segmentation overlay
- Structured volumetric values:
- Liver volume
- Tumor volume
- Tumor ratio This multimodal design allows MedGemma to contextualize quantitative metrics using visual evidence, simulating radiological reasoning.
Automated Clinical Report Generation
From the multimodal prompt, MedGemma generates:
- Interpretation of tumor burden
- Estimation of relative tumor size
- Clinical severity insights
- Decision-support suggestions (monitoring, surgery consideration)
⚠ This system is not intended to replace medical expertise but to assist rapid and standardized interpretation.