# IVUS Segmentation and Bifurcation Detection
## Comprehensive Multi-Task Fine-Tuning Report

Date: February 20, 2026

## 1) Purpose and Scope

This report documents the full methodology used to adapt a pretrained IVUS segmentation model into a multi-task model that performs:

- Lumen segmentation (pixel-level)
- Bifurcation detection (frame-level)

The goal is to provide a self-contained technical description of model design, training behavior, threshold calibration, results, and limitations.

## 2) Problem Setup

Given an IVUS frame `x`, we optimize two tasks:

1. Segmentation output `M_hat`: lumen mask over pixels
2. Classification output `y_hat`: bifurcation probability in `(0,1)`

The model is trained at frame level. There is no temporal model (no recurrence, no sequence transformer, no optical flow objective).

## 3) Data and Labels

### 3.1 Data organization

The dataset is built from a frame-bank of manually labeled IVUS frames with train/validation/test partitions.

Split counts:

- Train: 420
- Validation: 90
- Test: 90

### 3.2 Label distributions

Bifurcation positive rate by split:

- Train: 65.2%
- Validation: 65.6%
- Test: 65.6%

Lumen annotation coverage by split:

- Train: 47.4%
- Validation: 51.1%
- Test: 53.3%

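Every frame contributes to the classification loss, but only about half of the frames carry a lumen mask. Classification supervision is therefore roughly twice as dense as segmentation supervision in the multi-task setting.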
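
For reference, the rounded percentages above imply approximate frame counts per split. The snippet below is a small sketch that recovers them. The counts are estimates reconstructed from the rounded rates, not values read from the dataset itself, and the dictionary names are illustrative.

```python
# Split sizes and rates are copied from Sections 3.1 and 3.2.
split_sizes = {"train": 420, "validation": 90, "test": 90}
positive_rate = {"train": 0.652, "validation": 0.656, "test": 0.656}
lumen_coverage = {"train": 0.474, "validation": 0.511, "test": 0.533}


def approx_count(n_frames, rate):
    """Nearest whole number of frames implied by a rounded percentage."""
    return int(round(n_frames * rate))


# Estimated counts per split (reconstructed, not taken from the dataset).
approx_counts = {
    split: {
        "bifurcation_positive": approx_count(n, positive_rate[split]),
        "with_lumen_mask": approx_count(n, lumen_coverage[split]),
    }
    for split, n in split_sizes.items()
}

for split, counts in approx_counts.items():
    print(split, counts)
```

Under this reconstruction, roughly 199 of the 420 training frames contribute to the segmentation loss, while all 420 contribute to the classification loss.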

### 3.3 Balance visualizations

![Split class balance](./memo_assets/split_class_balance_stacked.png)
![Positive rate by split](./memo_assets/positive_rate_by_split.png)
![Lumen coverage by split](./memo_assets/lumen_coverage_by_split.png)

## 4) Model Design

### 4.1 Backbone + multi-task head

A pretrained segmentation backbone is reused as initialization.

A lightweight **multi-task classification head** is attached on top of segmentation logits:

- Global average pooling over spatial dimensions
- Dense layer (ReLU)
- Dropout
- Final sigmoid output for bifurcation probability

This is a multi-task head, not an attention module.
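
The head's forward pass can be sketched in plain Python as follows. This is an inference-time illustration only: the tensor layout (channels x height x width), the hidden width, and every weight value are hypothetical placeholders, since the report does not specify them. Dropout acts only during training, so it is omitted here.

```python
import math


def classification_head(seg_logits, w_hidden, b_hidden, w_out, b_out):
    """Map segmentation logits (C x H x W nested lists) to a bifurcation probability."""
    # Global average pooling: one scalar per channel.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in seg_logits
    ]
    # Dense layer with ReLU activation (one weight row per hidden unit).
    hidden = [
        max(0.0, sum(w * p for w, p in zip(weights, pooled)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    # Dropout is the identity at inference time, so nothing happens here.
    # Final linear unit followed by a sigmoid.
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


# Toy example: one 2x2 logit channel and two hidden units (all values made up).
toy_logits = [[[1.0, 3.0], [2.0, 2.0]]]
toy_prob = classification_head(
    toy_logits,
    w_hidden=[[0.5], [-1.0]],
    b_hidden=[0.0, 0.0],
    w_out=[2.0, 1.0],
    b_out=-2.0,
)
print(toy_prob)
```

Because pooling collapses the spatial dimensions before the dense layer, the head adds few parameters, which is the tradeoff discussed in Section 11.4.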

### 4.2 Task coupling strategy

The segmentation branch and the classification branch share the upstream representation. This encourages feature reuse while keeping the task-specific outputs separate.

### 4.3 Conceptual architecture

![Multi-task training and inference diagram](./memo_assets/multitask_pipeline_diagram.png)

## 5) Preprocessing and Input Construction

For each frame:

1. Apply central black-circle preprocessing to suppress the catheter and artifacts near the image center.
2. Convert the grayscale frame to the network's input representation.
3. Align labels to frame indices.

For segmentation labels, only frames with valid lumen polygons are supervised.
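
A minimal sketch of the central masking step and the label-validity check, assuming a 2D grayscale frame stored as nested lists. The radius, the fill value of 0, the choice of the geometric image center, and the "at least three vertices" validity rule are all assumptions made for illustration; the report does not state them.

```python
def mask_central_circle(frame, radius):
    """Return a copy of a 2D grayscale frame with a centered disk set to 0."""
    height, width = len(frame), len(frame[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0  # geometric center
    r2 = radius * radius
    return [
        [
            0 if (y - cy) ** 2 + (x - cx) ** 2 <= r2 else frame[y][x]
            for x in range(width)
        ]
        for y in range(height)
    ]


def has_valid_polygon(polygon):
    """Assumed rule: a lumen polygon needs at least three vertices."""
    return polygon is not None and len(polygon) >= 3


# Toy 5x5 frame: the center pixel and its four direct neighbors are zeroed.
toy_frame = [[9] * 5 for _ in range(5)]
toy_masked = mask_central_circle(toy_frame, radius=1)
for row in toy_masked:
    print(row)
```

The result of `has_valid_polygon` plays the role of the has-mask indicator `h_i` used in Section 6.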

## 6) Loss Functions and Optimization

Let `i` index samples in a minibatch.

- `m_i in {0,1}^{H x W}`: ground-truth lumen mask
- `m_hat_i`: predicted lumen probability map
- `y_i in {0,1}`: bifurcation label
- `y_hat_i in (0,1)`: bifurcation probability
- `h_i in {0,1}`: has-mask indicator (1 if segmentation label exists)

### 6.1 Segmentation loss

Weighted BCE + Dice:

```text
L_seg,i = L_wbce(m_i, m_hat_i; w_pos) + lambda_dice * L_dice(m_i, m_hat_i)
```

Masked batch aggregation (only labeled masks contribute):

```text
L_seg = (sum_i h_i * L_seg,i) / (sum_i h_i + eps)
```
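
The two formulas above can be sketched in plain Python on flattened masks. The default values of `w_pos`, `lambda_dice`, `eps`, and the Dice smoothing term are placeholders chosen for illustration, and the smoothed soft-Dice form is an assumption, since the report does not give the exact Dice variant.

```python
import math


def weighted_bce(mask, prob, w_pos, eps=1e-7):
    """Mean pixel-wise BCE with foreground pixels up-weighted by w_pos."""
    total = 0.0
    for m, p in zip(mask, prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(w_pos * m * math.log(p) + (1 - m) * math.log(1.0 - p))
    return total / len(mask)


def dice_loss(mask, prob, smooth=1.0):
    """Soft Dice loss: 1 - (2 * sum(m * p) + s) / (sum(m) + sum(p) + s)."""
    inter = sum(m * p for m, p in zip(mask, prob))
    return 1.0 - (2.0 * inter + smooth) / (sum(mask) + sum(prob) + smooth)


def masked_seg_loss(masks, probs, has_mask, w_pos=2.0, lambda_dice=1.0, eps=1e-7):
    """Average L_seg,i over the samples with h_i = 1; unlabeled samples are skipped."""
    numerator = 0.0
    for m, p, h in zip(masks, probs, has_mask):
        if h:  # only labeled masks contribute
            numerator += weighted_bce(m, p, w_pos) + lambda_dice * dice_loss(m, p)
    return numerator / (sum(has_mask) + eps)


# Toy batch: the second sample has no mask (None) and is ignored.
toy_mask = [1, 0, 1, 0]
toy_pred = [0.9, 0.1, 0.8, 0.2]
print(masked_seg_loss([toy_mask, None], [toy_pred, toy_pred], [1, 0]))
```

The `eps` term in the denominator keeps the loss finite (and equal to zero) for a minibatch that happens to contain no labeled masks.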

### 6.2 Classification loss

Binary cross entropy:

```text
L_cls = (1/B) * sum_i L_bce(y_i, y_hat_i)
```

### 6.3 Total objective

```text
L_total = w_seg * L_seg + w_cls * L_cls
```
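
As a sketch, the classification loss and the total objective can be written as below. The task weights `w_seg` and `w_cls` default to 1.0 purely for illustration; the values used in training are not stated in this report.

```python
import math


def bce(y, y_hat, eps=1e-7):
    """Binary cross entropy for one label and one predicted probability."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))


def classification_loss(labels, probs):
    """L_cls: mean BCE over all B samples in the minibatch."""
    return sum(bce(y, p) for y, p in zip(labels, probs)) / len(labels)


def total_loss(l_seg, l_cls, w_seg=1.0, w_cls=1.0):
    """L_total = w_seg * L_seg + w_cls * L_cls."""
    return w_seg * l_seg + w_cls * l_cls


# Toy values: an uninformative classifier gives L_cls = ln(2).
toy_l_cls = classification_loss([1, 0], [0.5, 0.5])
print(toy_l_cls, total_loss(l_seg=0.2, l_cls=toy_l_cls))
```

Unlike `L_seg`, the classification term averages over every sample in the batch, so it is never masked out.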

### 6.4 Optimization behavior

- GradientTape-style explicit optimization loop for multi-task fine-tuning
- Gradient clipping by global norm for stability
- Early stopping on the validation objective
- Best-checkpoint restore before final export
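
Two of the behaviors listed above, global-norm clipping and early stopping with best-checkpoint tracking, can be sketched framework-free as follows. This is an illustration rather than the training code: the patience value is hypothetical, and the sketch assumes the validation objective is minimized.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradient lists jointly so their combined L2 norm is at most clip_norm."""
    global_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    scale = min(1.0, clip_norm / global_norm) if global_norm > 0 else 1.0
    return [[g * scale for g in tensor] for tensor in grads], global_norm


class EarlyStopping:
    """Track the best validation objective (lower is better, by assumption)."""

    def __init__(self, patience):
        self.patience = patience
        self.best_value = float("inf")
        self.best_state = None  # snapshot to restore before export
        self.bad_epochs = 0

    def update(self, val_objective, state):
        """Record one epoch; return True when training should stop."""
        if val_objective < self.best_value:
            self.best_value = val_objective
            self.best_state = state
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Toy run: gradients with norm 5 are rescaled to norm 1.
clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
print(clipped, norm_before)

stopper = EarlyStopping(patience=2)
for epoch, val in enumerate([1.0, 0.8, 0.9, 0.85]):
    if stopper.update(val, state={"epoch": epoch}):
        break
print(stopper.best_state)
```

Clipping by the global norm rescales all gradients by the same factor, so the update direction is preserved, which matters when two task losses contribute gradients of different magnitude.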

## 7) Threshold Selection and Operating Point

After training, the bifurcation threshold `t` is selected on validation data by grid search over candidate thresholds.

For each `t`:

```text
y_hat_i^(t) = 1[y_hat_i >= t]
```

Compute precision, recall, F1, and accuracy at each threshold, then choose the threshold that maximizes validation F1:

```text
t* = argmax_t F1_val(t)
```

The selected threshold is persisted and reused during runtime inference.
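
The selection procedure above can be sketched as a short grid search. The candidate grid (0.05 to 0.95 in steps of 0.05) and the tie-breaking rule (keep the lowest threshold among equal F1 scores) are assumptions; the report does not state either.

```python
def f1_at_threshold(labels, probs, t):
    """F1 of the binarized predictions 1[p >= t]."""
    preds = [1 if p >= t else 0 for p in probs]
    tp = sum(1 for y, q in zip(labels, preds) if y == 1 and q == 1)
    fp = sum(1 for y, q in zip(labels, preds) if y == 0 and q == 1)
    fn = sum(1 for y, q in zip(labels, preds) if y == 1 and q == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_threshold(labels, probs, candidates):
    """Return (t*, F1(t*)) with t* = argmax_t F1(t); ties keep the earliest candidate."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        f1 = f1_at_threshold(labels, probs, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy validation data (made up) and an assumed candidate grid.
val_labels = [0, 0, 1, 1, 1]
val_probs = [0.10, 0.42, 0.37, 0.80, 0.90]
candidates = [i / 100 for i in range(5, 100, 5)]
t_star, f1_star = select_threshold(val_labels, val_probs, candidates)
print(t_star, f1_star)
```

Only validation data enters the search, so the test metrics in Section 9 are computed at a threshold chosen without access to the test split.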

## 8) Training Dynamics

### 8.1 Multi-task fine-tuning dynamics

![Multi-task training dynamics](./memo_assets/multitask_training_dynamics.png)

Observed behavior:

- Validation classification AUC reaches a high plateau relatively early in training.
- Validation F1 is more threshold-sensitive and fluctuates more.
- Segmentation metrics remain strong but vary with sparse segmentation supervision.

### 8.2 Lumen-only fine-tuning dynamics

![Lumen fine-tune dynamics](./memo_assets/lumen_finetune_dynamics.png)

## 9) Test Performance Summary

### 9.1 Multi-task test metrics

Segmentation (subset with lumen labels):

- IoU: 0.856
- Dice: 0.923
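
For reference, a minimal sketch of how IoU and Dice are computed for one pair of binary masks. Treating two empty masks as perfect agreement is a convention assumed here, and the report does not state whether its figures are per-frame averages or pooled over all pixels.

```python
def iou_and_dice(true_mask, pred_mask):
    """IoU and Dice for two flattened binary masks of equal length."""
    inter = sum(1 for t, p in zip(true_mask, pred_mask) if t and p)
    n_true = sum(1 for t in true_mask if t)
    n_pred = sum(1 for p in pred_mask if p)
    union = n_true + n_pred - inter
    if union == 0:
        return 1.0, 1.0  # both masks empty: treated as perfect agreement
    return inter / union, 2.0 * inter / (n_true + n_pred)


# Toy masks (made up): 2 overlapping pixels, 4 pixels in the union.
toy_iou, toy_dice = iou_and_dice([1, 1, 1, 0], [1, 1, 0, 1])
print(toy_iou, toy_dice)
```

For a single mask pair the two metrics are linked by `Dice = 2 * IoU / (1 + IoU)`. The identity does not carry over exactly to averages over many frames, so the reported pair (0.856, 0.923) need not satisfy it to the last digit.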

Bifurcation classification:

- Accuracy: 0.900
- Precision: 0.891
- Recall: 0.966
- F1: 0.927
- AUC: 0.961
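
As a consistency check, the threshold-dependent metrics above can be reproduced from a single confusion matrix. The counts below are inferred from the reported metrics and the test split size (90 frames, about 59 positives); they are not read from the confusion-matrix figure and should be treated as a reconstruction.

```python
# Inferred counts (assumption): 59 positives and 31 negatives in the test split.
tp, fn = 57, 2
fp, tn = 7, 24

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)

# Rounded to three decimals for comparison with the reported values.
reconstructed = {
    "accuracy": round(accuracy, 3),
    "precision": round(precision, 3),
    "recall": round(recall, 3),
    "f1": round(f1, 3),
}
print(reconstructed)
```

Under this reconstruction the operating point favors recall: about 2 bifurcation frames are missed, at the cost of about 7 false alarms among 31 negative frames.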

Confusion matrix:

![Multitask confusion matrix](./memo_assets/multitask_test_confusion_matrix.png)

Metric snapshot:

![Multitask metric snapshot](./memo_assets/multitask_test_metric_snapshot.png)

### 9.2 Segmentation regime comparison

![Segmentation comparison](./memo_assets/segmentation_regime_comparison.png)

Note: the compared evaluations do not use identical sample sets, so the comparison is directional only and should not be read as a controlled benchmark.

## 10) Threshold and Calibration Diagnostics

Standalone classifier diagnostics (supporting analysis):

![Threshold sweep](./memo_assets/standalone_threshold_sweep.png)
![Probability histogram](./memo_assets/standalone_probability_hist.png)
![Reliability diagram](./memo_assets/standalone_reliability_diagram.png)
![Precision-recall curve with operating point](./memo_assets/precision_recall_curve_with_operating_point.png)

These plots illustrate threshold sensitivity, score separation, and calibration quality.
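
The data behind a reliability diagram can be sketched as follows: predictions are grouped into equal-width probability bins, and each bin's mean predicted probability is compared with its observed positive fraction. The bin count is a placeholder, and the expected calibration error (ECE) summary is added here for illustration; it is not a metric reported in this document.

```python
def reliability_bins(labels, probs, n_bins=10):
    """Return (mean_confidence, positive_fraction, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if members:
            mean_conf = sum(p for _, p in members) / len(members)
            frac_pos = sum(y for y, _ in members) / len(members)
            rows.append((mean_conf, frac_pos, len(members)))
    return rows


def expected_calibration_error(labels, probs, n_bins=10):
    """Count-weighted mean gap between confidence and observed frequency."""
    n = len(labels)
    return sum(
        count / n * abs(frac_pos - mean_conf)
        for mean_conf, frac_pos, count in reliability_bins(labels, probs, n_bins)
    )


# Toy scores (made up): slightly under-confident on positives.
toy_labels = [0, 1, 1, 1]
toy_probs = [0.1, 0.9, 0.9, 0.9]
print(reliability_bins(toy_labels, toy_probs, n_bins=2))
print(expected_calibration_error(toy_labels, toy_probs, n_bins=2))
```

A well-calibrated classifier has bins that lie close to the diagonal, meaning the positive fraction in each bin matches the mean predicted probability.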

## 11) Limitations

### 11.1 Split caveat: source overlap

The train, validation, and test splits share source pullback files, because frames were partitioned at frame level rather than at source level.

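Because the model processes each frame independently, it cannot exploit frame order, so this is not temporal leakage in the sequence-modeling sense. However, frames from the same pullback share acquisition style and image statistics, so their presence in several splits can make in-domain metrics optimistic.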

![Split source overlap](./memo_assets/split_source_overlap_heatmap.png)
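
One way to remove the overlap described above is to assign whole pullbacks, rather than individual frames, to a split. The sketch below is an illustration of that idea and not the protocol used for the results in this report. The source identifiers, the seed, and the 70/15/15 fractions (applied to sources, mirroring the current 420/90/90 frame split) are assumptions.

```python
import random


def split_by_source(frame_sources, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return one split name per frame so that each source lands in exactly one split."""
    sources = sorted(set(frame_sources))
    random.Random(seed).shuffle(sources)  # deterministic for a fixed seed
    n_train = int(round(fractions[0] * len(sources)))
    n_val = int(round(fractions[1] * len(sources)))
    assignment = {}
    for i, src in enumerate(sources):
        if i < n_train:
            assignment[src] = "train"
        elif i < n_train + n_val:
            assignment[src] = "validation"
        else:
            assignment[src] = "test"
    return [assignment[src] for src in frame_sources]


# Toy example: 200 frames drawn from 20 hypothetical pullbacks.
toy_sources = [f"pullback_{i % 20:02d}" for i in range(200)]
toy_splits = split_by_source(toy_sources)

sources_per_split = {}
for src, split in zip(toy_sources, toy_splits):
    sources_per_split.setdefault(split, set()).add(src)
print({split: len(srcs) for split, srcs in sources_per_split.items()})
```

With unequal frame counts per pullback, the frame-level proportions will drift from the source-level fractions, so the label balance of each split should be re-checked after a source-level split.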

### 11.2 Uneven supervision density

Only about half of the samples carry segmentation labels (47.4% to 53.3%, depending on the split). This creates an imbalance between classification and segmentation supervision in multi-task training.

### 11.3 Domain shift across source groups

Performance can vary substantially by source group.

![Group-wise standalone metrics](./memo_assets/standalone_group_metrics.png)

This indicates a need for stronger cross-source robustness analysis.

### 11.4 Head capacity tradeoff

The current multi-task head is intentionally lightweight. This helps stability and runtime cost, but may under-capture fine spatial context around bifurcation patterns.

## 12) Practical Conclusions

1. The current multi-task approach is effective on in-domain test data (lumen Dice 0.923, bifurcation F1 0.927) and operationally coherent.
2. Validation-driven thresholding is critical and should remain part of deployment.
3. The largest methodological caveat is source-overlap evaluation, not temporal modeling leakage.
4. The next major quality gain will likely come from stricter source-level split protocols and robustness-focused evaluation.

## 13) Reproducibility Note

This report is intended to be self-contained. Supporting figures are stored under `docs/memo_assets/`.

PDF export command:

```bash
scripts/analysis/export_memo_pdf.sh
```