	# IVUS Segmentation and Bifurcation Detection
	## Comprehensive Multi-Task Fine-Tuning Report

	Date: February 20, 2026

	## 1) Purpose and Scope

	This report documents the full methodology used to adapt a pretrained IVUS segmentation model into a multi-task model that performs:

	- Lumen segmentation (pixel-level)
	- Bifurcation detection (frame-level)

	The goal is to provide a self-contained technical description of model design, training behavior, threshold calibration, results, and limitations.

	## 2) Problem Setup

	Given an IVUS frame `x`, we optimize two tasks:

	1. Segmentation output `M_hat`: lumen mask over pixels
	2. Classification output `y_hat`: bifurcation probability in `(0,1)`

	The model is trained at frame level. There is no temporal model (no recurrence, no sequence transformer, no optical flow objective).

	## 3) Data and Labels

	### 3.1 Data organization

	The dataset is built from a frame-bank of manually labeled IVUS frames with train/validation/test partitions.

	Split counts:

	- Train: 420
	- Validation: 90
	- Test: 90

	### 3.2 Label distributions

	Bifurcation positive rate by split:

	- Train: 65.2%
	- Validation: 65.6%
	- Test: 65.6%

	Lumen annotation coverage by split:

	- Train: 47.4%
	- Validation: 51.1%
	- Test: 53.3%

	Roughly half of the frames in each split carry a lumen annotation, so classification supervision is denser than segmentation supervision in the multi-task setting.
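
As a quick arithmetic check, the reported rates can be converted into approximate frame counts. The rounded counts below are inferred from the percentages and split sizes above; they are not reported figures.

```python
# Split sizes and rates as reported in sections 3.1 and 3.2.
splits = {
    "train": {"n": 420, "bifurcation_rate": 0.652, "lumen_coverage": 0.474},
    "validation": {"n": 90, "bifurcation_rate": 0.656, "lumen_coverage": 0.511},
    "test": {"n": 90, "bifurcation_rate": 0.656, "lumen_coverage": 0.533},
}


def approximate_counts(split):
    """Round rate * n to the nearest whole frame (an inference, not a reported count)."""
    n = split["n"]
    return {
        "bifurcation_positive": round(n * split["bifurcation_rate"]),
        "with_lumen_mask": round(n * split["lumen_coverage"]),
    }


for name, split in splits.items():
    print(name, approximate_counts(split))
```

Under this reading, fewer than half of the training frames would contribute to the segmentation loss.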

	### 3.3 Balance visualizations

	![Split class balance](./memo_assets/split_class_balance_stacked.png)
	![Positive rate by split](./memo_assets/positive_rate_by_split.png)
	![Lumen coverage by split](./memo_assets/lumen_coverage_by_split.png)

	## 4) Model Design

	### 4.1 Backbone + multi-task head

	A pretrained segmentation backbone is reused as initialization.

	A lightweight multi-task classification head is attached on top of segmentation logits:

	- Global average pooling over spatial dimensions
	- Dense layer (ReLU)
	- Dropout
	- Final sigmoid output for bifurcation probability

	This is a multi-task head, not an attention module.
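
The head described above can be sketched as a plain-Python forward pass. This is an illustration only: the layer sizes, the parameter layout (`w1`, `b1`, `w2`, `b2`), and the dropout rate are hypothetical, and the actual model presumably uses framework layers instead.

```python
import math
import random


def global_average_pool(logits):
    """Average an H x W x C nested list over its spatial dimensions -> C values."""
    height, width, channels = len(logits), len(logits[0]), len(logits[0][0])
    sums = [0.0] * channels
    for row in logits:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    return [s / (height * width) for s in sums]


def dense(inputs, weights, biases):
    """Fully connected layer: weights[j] holds the input weights of output unit j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bifurcation_head(logits, params, dropout_rate=0.0, training=False, rng=random):
    pooled = global_average_pool(logits)
    hidden = [max(0.0, h) for h in dense(pooled, params["w1"], params["b1"])]  # ReLU
    if training and dropout_rate > 0.0:
        keep = 1.0 - dropout_rate
        # Inverted dropout: rescale kept units so the expected activation is unchanged.
        hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    (score,) = dense(hidden, params["w2"], params["b2"])
    return sigmoid(score)  # bifurcation probability


# Toy example: a 2 x 2 single-channel logit map and a 2-unit hidden layer.
toy_logits = [[[1.0], [3.0]], [[-1.0], [1.0]]]
toy_params = {"w1": [[1.0], [-1.0]], "b1": [0.0, 0.0], "w2": [[2.0, 5.0]], "b2": [-2.0]}
print(bifurcation_head(toy_logits, toy_params))
```

Because pooling collapses the spatial dimensions before the dense layer, the head sees only per-channel averages of the segmentation logits, which is what keeps it lightweight.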

	### 4.2 Task coupling strategy

	The segmentation branch and classification branch share upstream representation. This encourages feature reuse while keeping task-specific outputs separate.

	### 4.3 Conceptual architecture

	![Multi-task training and inference diagram](./memo_assets/multitask_pipeline_diagram.png)

	## 5) Preprocessing and Input Construction

	For each frame:

	1. Apply central black-circle preprocessing (to suppress catheter/artifacts near center).
	2. Convert grayscale to network input representation.
	3. Align labels to frame indices.

	For segmentation labels, only frames with valid lumen polygons are supervised.
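
A minimal sketch of step 1 and of the supervision rule, assuming a frame stored as a nested list of gray values. The radius, the centering convention, and the "at least three vertices" validity test are assumptions for illustration; the memo does not specify them.

```python
def mask_central_circle(frame, radius):
    """Return a copy of a 2D grayscale frame with pixels inside a centered circle set to 0."""
    height, width = len(frame), len(frame[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    return [
        [0 if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2 else frame[y][x] for x in range(width)]
        for y in range(height)
    ]


def has_mask_indicator(lumen_polygon):
    """1 if the frame carries a usable lumen polygon (hypothetical rule: >= 3 vertices), else 0."""
    return 1 if lumen_polygon is not None and len(lumen_polygon) >= 3 else 0


toy_frame = [[9] * 5 for _ in range(5)]
for row in mask_central_circle(toy_frame, radius=1):
    print(row)
```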

	## 6) Loss Functions and Optimization

	Let `i` index samples in a minibatch.

	- `m_i in {0,1}^{H x W}`: ground-truth lumen mask
	- `m_hat_i`: predicted lumen probability map
	- `y_i in {0,1}`: bifurcation label
	- `y_hat_i in (0,1)`: bifurcation probability
	- `h_i in {0,1}`: has-mask indicator (1 if segmentation label exists)

	### 6.1 Segmentation loss

	Weighted BCE + Dice:

	```text
	L_seg,i = L_wbce(m_i, m_hat_i; w_pos) + lambda_dice * L_dice(m_i, m_hat_i)
	```

	Masked batch aggregation (only labeled masks contribute):

	```text
	L_seg = (sum_i h_i * L_seg,i) / (sum_i h_i + eps)
	```
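
The two formulas above can be sketched on flattened masks as follows. The exact weighted-BCE and Dice variants used in training are not spelled out in this memo, so the soft-Dice form, the clipping constant, and the default values of `w_pos` and `lambda_dice` here are assumptions.

```python
import math

EPS = 1e-7


def weighted_bce(mask, prob, w_pos):
    """Mean pixel-wise BCE with positive (lumen) pixels weighted by w_pos."""
    total = 0.0
    for m, p in zip(mask, prob):
        p = min(max(p, EPS), 1.0 - EPS)  # clip to avoid log(0)
        total += -(w_pos * m * math.log(p) + (1 - m) * math.log(1.0 - p))
    return total / len(mask)


def dice_loss(mask, prob):
    """Soft Dice loss: 1 - 2 * sum(m * p) / (sum(m) + sum(p))."""
    intersection = sum(m * p for m, p in zip(mask, prob))
    return 1.0 - (2.0 * intersection + EPS) / (sum(mask) + sum(prob) + EPS)


def segmentation_loss(masks, probs, has_mask, w_pos=1.0, lambda_dice=1.0):
    """Masked batch aggregation: only samples with h_i = 1 contribute."""
    numerator = 0.0
    for m, p, h in zip(masks, probs, has_mask):
        if h:
            numerator += weighted_bce(m, p, w_pos) + lambda_dice * dice_loss(m, p)
    return numerator / (sum(has_mask) + EPS)


# One labeled sample (perfect prediction) and one unlabeled sample.
toy_masks = [[1, 0, 1, 0], None]
toy_probs = [[1.0, 0.0, 1.0, 0.0], [0.3, 0.3, 0.3, 0.3]]
print(segmentation_loss(toy_masks, toy_probs, has_mask=[1, 0]))
```

The `eps` term in the denominator means a batch with no labeled masks yields a loss of zero rather than a division error, so such a batch simply contributes no segmentation gradient.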

	### 6.2 Classification loss

	Binary cross entropy:

	```text
	L_cls = (1/B) * sum_i L_bce(y_i, y_hat_i)
	```

	### 6.3 Total objective

	```text
	L_total = w_seg * L_seg + w_cls * L_cls
	```
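
A small sketch of the classification term and the weighted sum, under the assumption that `L_bce` is the standard binary cross entropy with probabilities clipped away from 0 and 1. The weights shown are placeholders, not the values used in training.

```python
import math


def bce(y, y_hat, eps=1e-7):
    """Binary cross entropy for one label/probability pair."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))


def classification_loss(labels, probs):
    """L_cls: mean BCE over the whole minibatch (every sample contributes)."""
    return sum(bce(y, p) for y, p in zip(labels, probs)) / len(labels)


def total_loss(l_seg, l_cls, w_seg=1.0, w_cls=1.0):
    """L_total = w_seg * L_seg + w_cls * L_cls."""
    return w_seg * l_seg + w_cls * l_cls


l_cls = classification_loss([1, 0], [0.5, 0.5])
print(l_cls, total_loss(l_seg=0.2, l_cls=l_cls, w_seg=1.0, w_cls=0.5))
```

Note the asymmetry with the segmentation term: `L_cls` averages over all `B` samples, whereas `L_seg` averages only over samples with `h_i = 1`.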

	### 6.4 Optimization behavior

	- GradientTape-style explicit optimization loop for multi-task fine-tuning
	- Gradient clipping by global norm for stability
	- Early stopping using validation objective
	- Best-checkpoint restore before final export
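
Two of the behaviors listed above, global-norm clipping and early stopping with best-checkpoint selection, can be illustrated without any framework. The function names, the patience rule, and the clip value are hypothetical; in a TensorFlow training loop the clipping role is typically played by `tf.clip_by_global_norm`, though this memo does not state which call is used.

```python
import math


def clip_by_global_norm(gradients, clip_norm):
    """Scale all gradient vectors jointly so their combined L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    scale = min(1.0, clip_norm / (global_norm + 1e-12))
    return [[g * scale for g in grad] for grad in gradients], global_norm


def best_epoch_with_early_stopping(val_losses, patience):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch  # stop here, restore the checkpoint from best_epoch
    return best_epoch, len(val_losses) - 1


clipped, norm = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
print(norm, clipped)
print(best_epoch_with_early_stopping([1.0, 0.8, 0.9, 0.85, 0.95, 0.7], patience=2))
```

In the toy loss sequence, training stops at epoch 3 and restores epoch 1, so the later improvement at epoch 5 is never seen. That is the usual tradeoff when choosing a patience value.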

	## 7) Threshold Selection and Operating Point

	After training, the bifurcation threshold `t` is selected on the validation split by grid search over candidate thresholds.

	For each `t`:

	```text
	y_hat_i^(t) = 1[y_hat_i >= t]
	```

	Compute precision, recall, F1, accuracy, etc., then choose:

	```text
	t* = argmax_t F1_val(t)
	```

	The selected threshold is persisted and reused during runtime inference.
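
The selection rule can be sketched as a small grid search. The candidate grid and the tie-breaking rule (first maximizer wins) are assumptions; the memo does not state either.

```python
def f1_at_threshold(labels, probs, t):
    """F1 of the hard predictions 1[p >= t] against binary labels."""
    preds = [1 if p >= t else 0 for p in probs]
    tp = sum(1 for y, yh in zip(labels, preds) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(labels, preds) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(labels, preds) if y == 1 and yh == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_threshold(labels, probs, candidates):
    """Return (t*, F1(t*)); ties go to the first candidate in the grid."""
    scored = ((t, f1_at_threshold(labels, probs, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])


grid = [i / 100 for i in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95 (hypothetical grid)
val_labels = [0, 0, 1, 1, 1]                # toy validation labels
val_probs = [0.12, 0.42, 0.37, 0.80, 0.90]  # toy validation probabilities
t_star, best_f1 = select_threshold(val_labels, val_probs, grid)
print(t_star, best_f1)
```

With only 90 validation frames, neighboring thresholds often tie or differ by a single frame, which is one reason validation F1 can fluctuate more than AUC.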

	## 8) Training Dynamics

	### 8.1 Multi-task fine-tuning dynamics

	![Multi-task training dynamics](./memo_assets/multitask_training_dynamics.png)

	Observed behavior:

	- Validation classification AUC stabilizes high relatively early.
	- Validation F1 is more threshold-sensitive and fluctuates more.
	- Segmentation metrics remain strong but vary with sparse segmentation supervision.

	### 8.2 Lumen-only fine-tuning dynamics

	![Lumen fine-tune dynamics](./memo_assets/lumen_finetune_dynamics.png)

	## 9) Test Performance Summary

	### 9.1 Multi-task test metrics

	Segmentation (subset with lumen labels):

	- IoU: 0.856
	- Dice: 0.923

	Bifurcation classification:

	- Accuracy: 0.900
	- Precision: 0.891
	- Recall: 0.966
	- F1: 0.927
	- AUC: 0.961
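
As a consistency check, the reported classification metrics can be reproduced from a single set of confusion counts. The counts below are inferred from the 90-frame test split and its 65.6% positive rate, not read from the confusion-matrix figure, which remains the authoritative source. The same sketch relates IoU to Dice using the identity `Dice = 2 * IoU / (1 + IoU)`, which is exact only for a single mask pair.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def dice_from_iou(iou):
    """Exact for a single mask pair; only approximate for averaged metrics."""
    return 2 * iou / (1 + iou)


# Inferred counts (assumption): 90 test frames, about 59 positives.
inferred = metrics_from_counts(tp=57, fp=7, fn=2, tn=24)
reported = {"accuracy": 0.900, "precision": 0.891, "recall": 0.966, "f1": 0.927}
for name, value in reported.items():
    print(name, "reported", value, "from inferred counts", round(inferred[name], 3))
print("Dice implied by IoU 0.856:", round(dice_from_iou(0.856), 3))
```

Under these inferred counts, errors would be dominated by false positives rather than missed bifurcations, which matches recall being higher than precision.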

	Confusion matrix:

	![Multitask confusion matrix](./memo_assets/multitask_test_confusion_matrix.png)

	Metric snapshot:

	![Multitask metric snapshot](./memo_assets/multitask_test_metric_snapshot.png)

	### 9.2 Segmentation regime comparison

	![Segmentation comparison](./memo_assets/segmentation_regime_comparison.png)

	Note: the compared evaluations do not use identical sample sets, so the comparison is directional rather than a controlled head-to-head result.

	## 10) Threshold and Calibration Diagnostics

	Standalone classifier diagnostics (supporting analysis):

	![Threshold sweep](./memo_assets/standalone_threshold_sweep.png)
	![Probability histogram](./memo_assets/standalone_probability_hist.png)
	![Reliability diagram](./memo_assets/standalone_reliability_diagram.png)
	![Precision-recall curve with operating point](./memo_assets/precision_recall_curve_with_operating_point.png)

	These plots illustrate threshold sensitivity, score separation, and calibration quality.
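
For readers unfamiliar with reliability diagrams, the sketch below shows how the underlying bins might be computed. Equal-width binning and the bin count are assumptions, and the expected calibration error (ECE) summary is a common companion statistic that this memo does not itself report.

```python
def reliability_bins(labels, probs, n_bins=10):
    """Per non-empty bin: (mean predicted probability, observed positive rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((y, p))
    rows = []
    for members in bins:
        if members:
            count = len(members)
            mean_prob = sum(p for _, p in members) / count
            positive_rate = sum(y for y, _ in members) / count
            rows.append((mean_prob, positive_rate, count))
    return rows


def expected_calibration_error(labels, probs, n_bins=10):
    """Count-weighted mean gap between predicted probability and observed rate."""
    total = len(labels)
    rows = reliability_bins(labels, probs, n_bins)
    return sum(count / total * abs(mean_prob - rate) for mean_prob, rate, count in rows)


toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.1, 0.9, 0.9]
print(reliability_bins(toy_labels, toy_scores))
print(expected_calibration_error(toy_labels, toy_scores))
```

A perfectly calibrated classifier would place every bin on the diagonal, where mean predicted probability equals the observed positive rate.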

	## 11) Limitations

	### 11.1 Split caveat: source overlap

	Train/validation/test share source pullback files (frame-level partitioning rather than source-level partitioning).

	Because the model is frame-independent, this is not temporal leakage. However, repeated source style/statistics across splits can make in-domain metrics optimistic.

	![Split source overlap](./memo_assets/split_source_overlap_heatmap.png)
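
For contrast with the frame-level partitioning described above, a source-level split assigns whole pullbacks to a single partition. The sketch below is hypothetical: the frame-to-source mapping, the fractions, and the seed are illustrative and do not describe the existing pipeline.

```python
import random


def source_level_split(frame_sources, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole sources (pullbacks) to train/validation/test so no source spans two splits."""
    sources = sorted(set(frame_sources.values()))
    random.Random(seed).shuffle(sources)
    n_train = round(fractions[0] * len(sources))
    n_val = round(fractions[1] * len(sources))
    assignment = {}
    for position, source in enumerate(sources):
        if position < n_train:
            assignment[source] = "train"
        elif position < n_train + n_val:
            assignment[source] = "validation"
        else:
            assignment[source] = "test"
    return {frame: assignment[source] for frame, source in frame_sources.items()}


# Toy frame bank: 10 hypothetical pullbacks with 5 frames each.
toy_frame_sources = {f"pb{s}_f{i}": f"pb{s}" for s in range(10) for i in range(5)}
toy_split = source_level_split(toy_frame_sources)
print(sorted(set(toy_split.values())))
```

Because sources differ in frame count, the resulting frame-level proportions only approximate the requested fractions, and class balance per split would need to be checked separately.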

	### 11.2 Uneven supervision density

	Only about half of the samples carry segmentation labels (47.4% to 53.3% across splits). This creates an imbalance between classification and segmentation supervision in multi-task training.

	### 11.3 Domain shift across source groups

	Performance can vary substantially by source group.

	![Group-wise standalone metrics](./memo_assets/standalone_group_metrics.png)

	This indicates a need for stronger cross-source robustness analysis.

	### 11.4 Head capacity tradeoff

	The current multi-task head is intentionally lightweight. This helps stability and runtime cost, but may under-capture fine spatial context around bifurcation patterns.

	## 12) Practical Conclusions

	1. The current multi-task approach is effective on the in-domain test split (Dice 0.923, bifurcation F1 0.927) and operationally coherent.
	2. Validation-driven thresholding is critical and should remain part of deployment.
	3. The largest methodological caveat is source-overlap evaluation, not temporal modeling leakage.
	4. The next major quality gain will likely come from stricter source-level split protocols and robustness-focused evaluation.

	## 13) Reproducibility Note

	This report is intended to be self-contained. Supporting figures are stored under `docs/memo_assets/`.

	PDF export command:

	```bash
	scripts/analysis/export_memo_pdf.sh
	```