# IVUS Segmentation and Bifurcation Detection

## Comprehensive Multi-Task Fine-Tuning Report

Date: February 20, 2026

## 1) Purpose and Scope

This report documents the full methodology used to adapt a pretrained IVUS segmentation model into a multi-task model that performs:

- Lumen segmentation (pixel-level)
- Bifurcation detection (frame-level)

The goal is to provide a self-contained technical description of model design, training behavior, threshold calibration, results, and limitations.

## 2) Problem Setup

Given an IVUS frame `x`, we optimize two tasks:

1. Segmentation output `M_hat`: lumen mask over pixels
2. Classification output `y_hat`: bifurcation probability in `[0,1]`

The model is trained at frame level. There is no temporal model (no recurrence, no sequence transformer, no optical flow objective).

## 3) Data and Labels

### 3.1 Data organization

The dataset is built from a frame-bank of manually labeled IVUS frames with train/validation/test partitions.

Split counts:

- Train: 420
- Validation: 90
- Test: 90

### 3.2 Label distributions

Bifurcation positive rate by split:

- Train: 65.2%
- Validation: 65.6%
- Test: 65.6%

Lumen annotation coverage by split:

- Train: 47.4%
- Validation: 51.1%
- Test: 53.3%

This means classification supervision is denser than segmentation supervision in the multi-task setting.

### 3.3 Balance visualizations

![Split class balance](./memo_assets/split_class_balance_stacked.png)

![Positive rate by split](./memo_assets/positive_rate_by_split.png)

![Lumen coverage by split](./memo_assets/lumen_coverage_by_split.png)

## 4) Model Design

### 4.1 Backbone + multi-task head

A pretrained segmentation backbone is reused as initialization.

A lightweight **multi-task classification head** is attached on top of segmentation logits:

- Global average pooling over spatial dimensions
- Dense layer (ReLU)
- Dropout
- Final sigmoid output for bifurcation probability

This is a multi-task head, not an attention module.
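
The forward pass of the head listed above can be sketched in plain Python. This is a minimal illustration, not the trained implementation: the function names, the weight layout (one row per output unit), and the use of inverted dropout are assumptions, and the report does not state the hidden width or dropout rate.

```python
import math
import random


def global_average_pool(logits):
    """Average an H x W x C map over its spatial dimensions, giving C values."""
    height, width, channels = len(logits), len(logits[0]), len(logits[0][0])
    pooled = [0.0] * channels
    for row in logits:
        for pixel in row:
            for c, value in enumerate(pixel):
                pooled[c] += value
    return [total / (height * width) for total in pooled]


def dense(inputs, weights, bias):
    """Affine layer: weights holds one row per output unit (assumed layout)."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def bifurcation_head(logits, w_hidden, b_hidden, w_out, b_out,
                     dropout_rate=0.0, training=False, rng=random):
    """GAP -> Dense(ReLU) -> Dropout -> sigmoid, following Section 4.1."""
    pooled = global_average_pool(logits)
    hidden = [max(0.0, h) for h in dense(pooled, w_hidden, b_hidden)]
    if training and dropout_rate > 0.0:
        # Inverted dropout (an assumption): drop units at random and
        # rescale the survivors so the expected activation is unchanged.
        keep = 1.0 - dropout_rate
        hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    z = dense(hidden, w_out, b_out)[0]
    return 1.0 / (1.0 + math.exp(-z))
```

Because pooling collapses the spatial dimensions before the dense layer, the head sees only per-channel averages of the segmentation logits, which is the capacity tradeoff discussed under Limitations.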
### 4.2 Task coupling strategy

The segmentation branch and classification branch share upstream representation. This encourages feature reuse while keeping task-specific outputs separate.

### 4.3 Conceptual architecture

![Multi-task training and inference diagram](./memo_assets/multitask_pipeline_diagram.png)

## 5) Preprocessing and Input Construction

For each frame:

1. Apply central black-circle preprocessing (to suppress catheter/artifacts near center).
2. Convert grayscale to network input representation.
3. Align labels to frame indices.

For segmentation labels, only frames with valid lumen polygons are supervised.

## 6) Loss Functions and Optimization

Let `i` index samples in a minibatch.

- `m_i in {0,1}^{H x W}`: ground-truth lumen mask
- `m_hat_i`: predicted lumen probability map
- `y_i in {0,1}`: bifurcation label
- `y_hat_i in (0,1)`: bifurcation probability
- `h_i in {0,1}`: has-mask indicator (1 if segmentation label exists)

### 6.1 Segmentation loss

Weighted BCE + Dice:

```text
L_seg,i = L_wbce(m_i, m_hat_i; w_pos) + lambda_dice * L_dice(m_i, m_hat_i)
```

Masked batch aggregation (only labeled masks contribute):

```text
L_seg = (sum_i h_i * L_seg,i) / (sum_i h_i + eps)
```

### 6.2 Classification loss

Binary cross entropy:

```text
L_cls = (1/B) * sum_i L_bce(y_i, y_hat_i)
```

### 6.3 Total objective

```text
L_total = w_seg * L_seg + w_cls * L_cls
```

### 6.4 Optimization behavior

- GradientTape-style explicit optimization loop for multi-task fine-tuning
- Gradient clipping by global norm for stability
- Early stopping using validation objective
- Best-checkpoint restore before final export

## 7) Threshold Selection and Operating Point

After model training, bifurcation threshold `t` is selected on validation data by grid search over candidate thresholds.
For each `t`:

```text
y_hat_i^(t) = 1[y_hat_i >= t]
```

Compute precision, recall, F1, accuracy, etc., then choose:

```text
t* = argmax_t F1_val(t)
```

The selected threshold is persisted and reused during runtime inference.

## 8) Training Dynamics

### 8.1 Multi-task fine-tuning dynamics

![Multi-task training dynamics](./memo_assets/multitask_training_dynamics.png)

Observed behavior:

- Validation classification AUC stabilizes high relatively early.
- Validation F1 is more threshold-sensitive and fluctuates more.
- Segmentation metrics remain strong but vary with sparse segmentation supervision.

### 8.2 Lumen-only fine-tuning dynamics

![Lumen fine-tune dynamics](./memo_assets/lumen_finetune_dynamics.png)

## 9) Test Performance Summary

### 9.1 Multi-task test metrics

Segmentation (subset with lumen labels):

- IoU: 0.856
- Dice: 0.923

Bifurcation classification:

- Accuracy: 0.900
- Precision: 0.891
- Recall: 0.966
- F1: 0.927
- AUC: 0.961

Confusion matrix:

![Multitask confusion matrix](./memo_assets/multitask_test_confusion_matrix.png)

Metric snapshot:

![Multitask metric snapshot](./memo_assets/multitask_test_metric_snapshot.png)

### 9.2 Segmentation regime comparison

![Segmentation comparison](./memo_assets/segmentation_regime_comparison.png)

Note: compared evaluations do not use identical sample sets, so the comparison is directional.

## 10) Threshold and Calibration Diagnostics

Standalone classifier diagnostics (supporting analysis):

![Threshold sweep](./memo_assets/standalone_threshold_sweep.png)

![Probability histogram](./memo_assets/standalone_probability_hist.png)

![Reliability diagram](./memo_assets/standalone_reliability_diagram.png)

![Precision-recall curve with operating point](./memo_assets/precision_recall_curve_with_operating_point.png)

These plots illustrate threshold sensitivity, score separation, and calibration quality.
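
The threshold sweep behind the Section 7 rule `t* = argmax_t F1_val(t)` can be sketched as follows. This is an illustrative sketch only: the function names, the candidate grid, the tie-breaking rule (lowest threshold wins), and the validation scores are hypothetical and are not taken from the actual run.

```python
def f1_at_threshold(labels, probs, threshold):
    """F1 of the decision rule y_hat >= threshold against binary labels."""
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(labels, probs) if p < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_threshold(labels, probs, candidates):
    """Return (t*, F1) maximizing validation F1; ties go to the lowest t."""
    best_t, best_f1 = None, -1.0
    for t in sorted(candidates):
        f1 = f1_at_threshold(labels, probs, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical validation labels and scores, for illustration only.
val_labels = [1, 1, 1, 0, 0, 1, 0, 1]
val_probs = [0.92, 0.81, 0.55, 0.48, 0.30, 0.67, 0.60, 0.40]
grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
t_star, f1_star = select_threshold(val_labels, val_probs, grid)
```

With only 90 validation frames, neighboring thresholds often produce identical confusion counts, so the argmax is typically a plateau rather than a single point. The tie-breaking rule therefore affects which operating point is persisted.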
## 11) Limitations

### 11.1 Split caveat: source overlap

Train/validation/test share source pullback files (frame-level partitioning rather than source-level partitioning).

Because the model is frame-independent, this is not temporal leakage. However, repeated source style/statistics across splits can make in-domain metrics optimistic.

![Split source overlap](./memo_assets/split_source_overlap_heatmap.png)

### 11.2 Uneven supervision density

Only about half of samples carry segmentation labels. This creates an imbalance between classification and segmentation supervision in multi-task training.

### 11.3 Domain shift across source groups

Performance can vary substantially by source group.

![Group-wise standalone metrics](./memo_assets/standalone_group_metrics.png)

This indicates a need for stronger cross-source robustness analysis.

### 11.4 Head capacity tradeoff

The current multi-task head is intentionally lightweight. This helps stability and runtime cost, but may under-capture fine spatial context around bifurcation patterns.

## 12) Practical Conclusions

1. The current multi-task approach is effective and operationally coherent.
2. Validation-driven thresholding is critical and should remain part of deployment.
3. The largest methodological caveat is source-overlap evaluation, not temporal modeling leakage.
4. Next major quality gain will likely come from stricter source-level split protocols and robustness-focused evaluation.

## 13) Reproducibility Note

This report is intended to be self-contained. Supporting figures are stored under `docs/memo_assets/`.

PDF export command:

```bash
scripts/analysis/export_memo_pdf.sh
```
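
As a companion to the source-level split protocol recommended in Sections 11.1 and 12, the sketch below shows one way to assign whole pullbacks to splits so that no source appears in more than one partition. It is not the procedure used for the current splits. The function name, the `frame_sources` mapping, the split fractions, and the seed are all hypothetical.

```python
import random
from collections import defaultdict


def source_level_split(frame_sources, val_frac=0.15, test_frac=0.15, seed=0):
    """Assign whole sources (pullbacks) to splits so no source spans two splits.

    frame_sources maps a frame id to the id of the pullback it came from.
    Fractions apply to the number of sources, not the number of frames.
    """
    by_source = defaultdict(list)
    for frame_id, source_id in frame_sources.items():
        by_source[source_id].append(frame_id)

    # Sort before shuffling so the result depends only on the seed.
    sources = sorted(by_source)
    random.Random(seed).shuffle(sources)

    n_test = max(1, round(len(sources) * test_frac))
    n_val = max(1, round(len(sources) * val_frac))
    assignment = {
        "test": sources[:n_test],
        "val": sources[n_test:n_test + n_val],
        "train": sources[n_test + n_val:],
    }
    return {split: sorted(f for s in ids for f in by_source[s])
            for split, ids in assignment.items()}
```

Because pullbacks contribute different numbers of frames, frame counts per split will generally not match a fixed ratio such as 420/90/90. Class balance and lumen coverage should be rechecked after any such re-split.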