Clinical Context

Preoperative liver volumetry is a critical step in hepatic surgery planning. Accurate estimation of tumor burden and residual liver volume directly influences surgical feasibility and helps prevent postoperative liver failure.

Traditional segmentation methods are often manual or semi-automatic, leading to:

  • High time consumption
  • Strong dependency on clinical expertise
  • Limited reproducibility

This project proposes a fully automated pipeline that integrates:

  1. Liver and tumor segmentation from CT scans
  2. Quantitative volumetric computation
  3. Multimodal reasoning using MedGemma
  4. Automated structured clinical report generation

Pipeline Overview

The system transforms a raw CT image into a structured clinical interpretation through four main stages: Pipeline Overview

Model Description

This pipeline integrates convolutional neural networks for medical image segmentation with a quantized large language model for structured medical report generation.

The system combines:

  • A U-Net model for liver segmentation
  • A ResU-Net model for tumor segmentation
  • A quantized MedGemma 1.5-4B model for automated medical reasoning and report generation

After segmentation, binary prediction masks are used to compute:

  • Total liver volume
  • Tumor volume
  • Tumor-to-liver volume ratio

These quantitative results, along with segmentation summaries, are provided as structured input to MedGemma, which generates an automated clinical-style report. The original base model used for quantization is: MedGemma 1.5-4B (Google)
https://huggingface.co/google/medgemma-1.5-4b

Quantization

The MedGemma 1.5-4B model was quantized to 4-bit precision using the bitsandbytes library in order to:

  • Reduce GPU memory usage
  • Enable deployment on hardware with limited computational resources
  • Maintain acceptable performance while optimizing inference speed

Training Details

  • Dataset: Images from the public 3Dircadb dataset (3D Image Reconstruction for Comparison of Algorithm Database). The original CT volumes were converted into 2D slices and saved in JPEG format for training.
  • Input size: 256x256
  • Framework: TensorFlow (segmentation), Transformers (MedGemma)
  • Hardware: NVIDIA GPU

Multimodal Prompt Construction

The input prompt to MedGemma includes:

  • The image with segmentation overlay
  • Structured volumetric values:
    • Liver volume
    • Tumor volume
    • Tumor ratio This multimodal design allows MedGemma to contextualize quantitative metrics using visual evidence, simulating radiological reasoning.

Automated Clinical Report Generation

From the multimodal prompt, MedGemma generates:

  • Interpretation of tumor burden
  • Estimation of relative tumor size
  • Clinical severity insights
  • Decision-support suggestions (monitoring, surgery consideration)

⚠ This system is not intended to replace medical expertise but to assist rapid and standardized interpretation.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support