ius / README.md
pgatoula's picture
Sync from GitHub via hub-sync
dccce10 verified
---
license: mit
library_name: pytorch
tags:
- medical-imaging
- synthetic-data
- image-utility
- interpretability
- pytorch
- computer-vision
- research-code
pipeline_tag: image-classification
---
# Interpretable Similarity of Synthetic Image Utility (IUS)
[![GitHub stars](https://img.shields.io/github/stars/innoisys/ius.svg?style=flat&label=Star)](https://github.com/innoisys/ius/)
[![Readme](https://img.shields.io/badge/README-green.svg)](README.md)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
This repository currently contains the official **PyTorch** implementation of the *IUS* measure, introduced in [**"Interpretable Similarity of Synthetic Image Utility"**](https://ieeexplore.ieee.org/document/11458792) , published in **IEEE Transactions on Medical Imaging (TMI)**.
> **Status:** This repository currently contains the official PyTorch code for IUS.
>
> Pretrained checkpoints are not included yet and will be released separately.
## Overview
**IUS** (*Interpretable Utility Similarity*) is an interpretable measure for assessing the utility of synthetic medical image datasets for downstream clinical decision support (CDS) tasks, built upon the **EPU-CNN** framework introduced in [*E pluribus unum interpretable convolutional neural networks*](https://www.nature.com/articles/s41598-023-38459-1) and available in our [previous code repository](https://github.com/innoisys/epu-cnn-torch).
## Table of Contents
- [Features](#features)
- [Installation](#installation)
- [Project Structure](#project-structure)
- [Usage](#usage)
- [1. Prepare Configuration](#1-prepare-configuration)
- [2. Supported Dataset Structures for EPU-CNN Training](#2-supported-dataset-structures-for-epu-cnn-training)
- [2.1 Filename-based Structure](#21-filename-based-structure)
- [2.2 Folder-based Structure](#22-folder-based-structure)
- [2.3 MedMNIST Benchmark Structure](#23-medmnist-benchmark-structure)
- [3. EPU-CNN Training](#3-epu-cnn-training)
- [4. EPU-CNN Evaluation](#4-epu-cnn-evaluation)
- [5. Feature contribution profile estimation](#5-feature-contribution-profile-estimation)
- [6. Synthetic image utility evaluation with IUS](#6-synthetic-image-utility-evaluation-with-ius)
- [Paper Abstract](#-paper-abstract)
- [Citation](#citation)
- [Contact](#contact)
- [License](#license)
## Features
- YAML-based configuration for reproducible training, inference, and evaluation
- Flexible support for multiple training dataset structures
- Integrated **TensorBoard** logging for experiment monitoring
- Complete implementation of the **IUS** pipeline:
1. **EPU-CNN training** and classification performance evaluation
2. **Baseline feature contribution profile** estimation
3. **Synthetic dataset or singleton evaluation using IUS**
- Support for both **color** and **grayscale** medical imaging modalities
- Reproducible experiment management through structured saving of checkpoints, logs, timestamped results, and evaluation outputs
## Installation
Clone the repository:
```bash
git clone https://github.com/innoisys/ius.git
cd ius
```
Install dependencies:
```bash
pip install -r requirements.txt
```
## Project Structure
```
ius/
β”œβ”€β”€ configs/ # YAML configuration files
β”œβ”€β”€ data/ # Data loading and preprocessing
β”‚ β”œβ”€β”€ data_utils.py # Common utilities for image/data handling
β”‚ β”œβ”€β”€ dataloader.py # Dataloader definitions
β”‚ β”œβ”€β”€ dataset.py # Dataset implementations
β”‚ β”œβ”€β”€ loading.py # Dataset setup from YAML configuration
β”‚ β”œβ”€β”€ parsers.py # Data parser implementations
β”‚ └── perceptual_transforms.py # PFM (Perceptual Feature Map) generation
β”œβ”€β”€ datasets/ # Real datasets for EPU-CNN training & baseline feature contribution profile estimation
β”œβ”€β”€ datasets_synthetic/ # Synthetic data for IUS evaluation
β”œβ”€β”€ ius/ # IUS implementation
β”‚ β”œβ”€β”€ ius.py # IUS measure class
β”‚ └── ius_eval_parser.py # Suggested synthetic data parser (not requiring label information)
β”œβ”€β”€ model/ # EPU-CNN model implementation
β”‚ β”œβ”€β”€ epu.py # Main EPU-CNN model definition
β”‚ β”œβ”€β”€ module_mapping.py # Mappings from YAML config names to torch.nn layers/activations
β”‚ β”œβ”€β”€ register_modules.py # Registry for configurable model components
β”‚ β”œβ”€β”€ subnetworks.py # Subnetwork implementation
β”‚ └── subnetwork_utilities.py # Subnetwork helper modules
β”œβ”€β”€ results/ # Training and inference outputs
β”‚ β”œβ”€β”€ cb_vectors.py # Saved baseline feature contribution profiles (from infer_cb_vector.py)
β”‚ β”œβ”€β”€ checkpoints.py # Saved EPU-CNN checkpoints and training configurations (from train_epu.py)
β”‚ β”œβ”€β”€ classification_performance.py # Classification performance results (from infer_epu.py)
β”‚ β”œβ”€β”€ ius_eval.py # IUS evaluation results (from eval_ius.py)
β”‚ └── logs.py # TensorBoard logs (from train_epu.py )
β”œβ”€β”€ scripts/ # Training and inference scripts
β”‚ β”œβ”€β”€ eval_ius.py # Runs synthetic data evaluation with IUS
β”‚ β”œβ”€β”€ infer_cb_vector.py # Estimates baseline feature contribution profiles
β”‚ β”œβ”€β”€ infer_epu.py # Runs EPU-CNN inference/evaluation
β”‚ └── train_epu.py # Trains EPU-CNN models
└── utils/ # Utility functions
β”œβ”€β”€ callbacks.py # Training callbacks
β”œβ”€β”€ config_utils.py # YAML/configuration utilities
β”œβ”€β”€ early_stopping.py # Early stopping logic
β”œβ”€β”€ eval_utils.py # Utilities for EPU-CNN evaluation scripts
β”œβ”€β”€ metrics.py # Classification performance metrics
β”œβ”€β”€ omega_parser.py # OmegaConf-based configuration parser
β”œβ”€β”€ sanity_utils.py # Configuration validation and sanity checks
β”œβ”€β”€ tensorboard.py # Tensorboard utilities
β”œβ”€β”€ train_utils.py # Training setup and helping utilities
└── trainer.py # Main training loop implementation
```
## Usage
### 1. Prepare Configuration
Create a YAML configuration file in `configs/` with the following structure:
```yaml
model:
num_subnetworks: 4 # set to 4, corresponds to number of perceptual feature maps,
num_classes: 1
epu_activation: "sigmoid"
subnetwork_config:
architecture: "base_one" # default ius backbone
input_channels: 1 # number of channels in perceptual feature decomposition, set to 1
base_channels: 32
fc_hidden_units: 64
pred_activation: "tanh"
data_params:
dataset_path: "../datasets/dataset_name"
images_extension: "jpg"
data_loading:
batch_size: 64
shuffle: true
num_workers: 0
pin_memory: false
persistent_workers: false
data_preprocessing:
data_mode: "rgb" # "rgb" or "grayscale"
data_parser: "filename" # "filename" or "folder" or "medmnist"
resize_dims: [128, 128]
medmnist_csv_file: None
label_mapping:
abnormal: 1
normal: 0
train_params:
mode: "binary"
loss: "binary_cross_entropy"
epochs: 200
optimizer: "sgd"
learning_rate: 0.001
momentum: 0.9
weight_decay: 0.001
early_stopping_patience: 30
early_stopping_monitor: "val_loss" # "val_loss" or "val_metrics.auc"
early_stopping_mode: "min" # "min" or "max"
log_dir: "../results/logs" # default parent path
checkpoint_dir: "../results/checkpoints" # default parent path
experiment_name: "ius_dataset_name" # desired experiment path
```
**Key Points:**
- For binary classification: Use `n_classes: 1`, `epu_activation: "sigmoid"`
- Adjust `input_size` and `batch_size` based on your GPU memory
- Set `label_mapping` according to your dataset classes
### 2. Supported Dataset Structures for EPU-CNN Training
EPU-CNN training supports multiple dataset organization patterns. Complete examples are provided below:
#### 2.1 Filename-based Structure
```
datasets/
β”œβ”€β”€ dataset_name/
β”œβ”€β”€ train/
β”‚ β”œβ”€β”€ abnormal_001.jpg
β”‚ β”œβ”€β”€ abnormal_002.jpg
β”‚ β”œβ”€β”€ normal_001.jpg
β”‚ β”œβ”€β”€ normal_002.jpg
β”œβ”€β”€ validation/
β”‚ β”œβ”€β”€ abnormal_003.jpg
β”‚ β”œβ”€β”€ normal_003.jpg
└── test/
β”œβ”€β”€ abnormal_004.jpg
└── normal_004.jpg
```
**Configuration Example:**
```yaml
data_params:
dataset_path: "../datasets/dataset_name"
images_extension: "jpg"
data_preprocessing:
data_mode: "rgb" # "rgb" or "grayscale"
data_parser: "filename"
resize_dims: [128, 128]
medmnist_csv_file: None
label_mapping:
abnormal: 1
normal: 0
```
**Key Points:**
- Requires a consistent organization across train/validation/test splits
- The validation folder name must be exactly "validation" (not "val" or other variants)
- Supported image formats include jpg, jpeg, png, etc, supported by the parser
- Class names defined in label_mapping must appear in filenames
#### 2.2 Folder-based Structure
```
datasets/
β”œβ”€β”€ dataset_name/
β”œβ”€β”€ train/
β”‚ β”œβ”€β”€ abnormal
β”‚ β”‚ β”œβ”€β”€ image_001.jpg
β”‚ β”‚ └── image_002.jpg
β”‚ └── normal
β”‚ β”œβ”€β”€ image_001.jpg
β”‚ └── image_002.jpg
β”œβ”€β”€ validation/
β”‚ β”œβ”€β”€ abnormal
β”‚ β”‚ └── image_003.jpg
β”‚ └── normal
β”‚ └── image_003.jpg
└── test/
β”œβ”€β”€ abnormal
β”‚ └── image_004.jpg
└── normal
└── image_004.jpg
```
**Configuration Example:**
```yaml
data_params:
dataset_path: "../datasets/dataset_name"
images_extension: "jpg"
data_preprocessing:
data_mode: "rgb" # "rgb" or "grayscale"
data_parser: "folder"
resize_dims: [128, 128]
medmnist_csv_file: None
label_mapping:
abnormal: 1
normal: 0
```
**Key Points:**
- Requires a consistent organization across train/validation/test splits
- The validation folder name must be exactly "validation" (not "val" or other variants)
- Supported image formats include jpg, jpeg, png, etc. supported by the parser
- Class names defined in label_mapping must match the class folder names
#### 2.3 MedMNIST Benchmark Structure
```
datasets/
β”œβ”€β”€ pneumoniamnist/
β”‚ β”œβ”€β”€ pneumoniamnist.csv
β”‚ β”œβ”€β”€ test_0_0.png
β”‚ β”œβ”€β”€ test_1_1.png
β”‚ β”œβ”€β”€ train_0_1.png
β”‚ β”œβ”€β”€ train_1_0.png
β”‚ β”œβ”€β”€ train_2_0.png
β”‚ β”œβ”€β”€ train_3_1.png
β”‚ β”œβ”€β”€ val_0_1.png
β”‚ └── val_1_0.png
```
**Configuration Example:**
```yaml
data_params:
dataset_path: "../datasets/pneumoniamnist"
images_extension: "png"
data_preprocessing:
data_mode: "rgb" # "rgb" or "grayscale"
data_parser: "folder"
resize_dims: [128, 128]
medmnist_csv_file: "../datasets/pneumoniamnist/pneumoniamnist.csv"
label_mapping:
pneumonia: 1
normal: 0
```
**Key Points:**
- Supports any 2D dataset included in the MedMNIST Benchmark collection.
- Requires the data in the original format provided by the [MedMNIST benchmark](https://github.com/MedMNIST/MedMNIST/tree/main)
- The CSV file should be used as downloaded, without modification.
- Class names defined in label_mapping must match the official class names for the corresponding dataset provided in [info](https://github.com/MedMNIST/MedMNIST/blob/main/medmnist/info.py)
### 3. EPU-CNN Training
To train EPU-CNN, run one of the following commands after setting up a `config.yaml` file:
```bash
# Basic training
python scripts/train_epu.py --config_filepath configs/train_config.yaml
# Training with TensorBoard monitoring
python scripts/train_epu.py --config_filepath configs/train_config.yaml --tensorboard
```
The script saves trained model checkpoints and YAML training config in results/checkpoints under automatically generated name as:
{experiment_name}_{subnetwork_backbone}_{run_id}_{timestamp}
When using the `--tensorboard` flag, the script automatically:
- launches TensorBoard as a subprocess
- sets up monitoring for the `logs` directory
- makes TensorBoard available at `http://localhost:6006`
- enables real-time monitoring of training metrics and model graphs.
### 4. EPU-CNN Evaluation
To assess the classification performance of a trained EPU-CNN model run:
```bash
python scripts/infer_epu.py --experiment_folder_name ius_dataset_name_base_one_0000_timestamp
```
- Argument --experiment_folder_name contains the saved checkpoint and saved YAML config used during the EPU-CNN training.
- The classification performance report is saved in JSON format under results/classification_performance/{experiment_folder_name}
### 5. Feature contribution profile estimation
For each trained EPU-CNN model instance, the baseline feature contribution profile must be estimated only once using ether:
```bash
# Estimates all baseline feature contribution profiles
python scripts/infer_cb_vector.py --experiment_folder_name ius_dataset_name_base_one_0000_timestamp
# Estimates the baseline feature contribution profile of a single class
python scripts/infer_cb_vector.py --experiment_folder_name ius_dataset_name_base_one_0000_timestamp --cb_data normal
```
- The --experiment_folder_name contains the saved checkpoint and saved YAML config used during the EPU-CNN training.
- If argument --cb_data is not provided, the script estimates the baseline feature contribution profile for each class defined in saved config YAML (automatically retrieved from results/checkpoints based on --experiment_folder_name)
### 6. Synthetic image utility evaluation with IUS
To evaluate the utility of synthetic images using IUS, run:
```bash
# For synthetic dataset IUS evaluation
python scripts/eval_ius.py --experiment_folder_name ius_dataset_name_base_one_0000_timestamp --cb_vector_tag normal --synthetic_images datasets_synthetic/dataset_name/normal --synthetic_img_extension png
# For a single image IUS evaluation
python scripts/eval_ius.py --experiment_folder_name ius_dataset_name_base_one_0000_timestamp --cb_vector_tag normal --synthetic_images datasets_synthetic/dataset_name/normal/seed_000.png --synthetic_img_extension png
```
Both commands produce two outputs saved under results/ius_eval/{experiment_folder_name}
1. A JSON report containing information about IUS evaluation.
2. A CSV file containing the estimated IUS score for each synthetic image.
**Key Points:**
- Synthetic data should be stored under datasets_synthetic
- No specific internal folder structure is required; only a valid path to a folder of images is needed.
- A suggested organization example is provided below
- Supported image formats include jpg, jpeg, png, etc, supported by the parser
- Class labels are neither required nor used during IUS evaluation
Suggested synthetic data structure
```
datasets_synthetic/
β”œβ”€β”€ dataset_name/
β”œβ”€β”€ normal/
β”œβ”€β”€ seed_000.png
└── seed_001.png
```
## πŸ“„ Paper Abstract
<p align="justify">
Synthetic medical image data can unlock the potential of deep learning (DL)-based clinical decision support (CDS) systems through the creation of large scale, privacy-preserving, training sets. Despite the significant progress in this field, there is still a largely unanswered research question: β€œHow can we quantitatively assess the similarity of a synthetically generated set of images with a set of real images in a given application domain?”. Today, answers to this question are mainly provided via user evaluation studies, inception-based measures, and the classification performance achieved on synthetic images. This paper proposes a novel measure to assess the similarity between synthetically generated and real sets of images, in terms of their utility for the development of DL based CDS systems. Inspired by generalized neural additive models, and unlike inception-based measures, the proposed measure is interpretable (Interpretable Utility Similarity, IUS), explaining why a synthetic dataset could be more useful than another one in the context of a CDS system based on clinically relevant image features. The experimental results on publicly available benchmark datasets from various color medical imaging modalities including endoscopic, dermoscopic and fundus imaging, indicate that selecting synthetic images of high utility similarity using IUS can result in relative improvements of up to 54.6% in terms of classification performance. The generality of IUS for synthetic data assessment is demonstrated also for grayscale X-ray and ultrasound imaging modalities. IUS implementation is available at https://github.com/innoisys/ius.
</p>
## Citation
If you use this code or find our work useful in your research, please cite:
**APA**
P. Gatoula, G. Dimas and D. K. Iakovidis, "Interpretable Similarity of Synthetic Image Utility," in *IEEE Transactions on Medical Imaging*, doi: 10.1109/TMI.2026.3679527.
**BibTeX**
```bibtex
@article{
author = {Panagiota Gatoula and George Dimas and Dimitris K. Iakovidis},
title = {Interpretable Similarity of Synthetic Image Utility},
journal = {IEEE Transactions on Medical Imaging},
year = {2026},
publisher = {IEEE},
doi = {10.1109/TMI.2026.3679527}
}
```
## Contact
Prof. Dimitris Iakovidis
Director of Biomedical Imaging Lab
University of Thessaly
diakovidis@uth.gr
## License
This project is licensed under the MIT License.