<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>IVUS Segmentation and Bifurcation Detection: Comprehensive Multi-Task Fine-Tuning Report</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>
</head>
<body>
<h1 id="ivus-segmentation-and-bifurcation-detection">IVUS Segmentation and Bifurcation Detection</h1>
<h2 id="comprehensive-multi-task-fine-tuning-report">Comprehensive Multi-Task Fine-Tuning Report</h2>
<p>Date: February 20, 2026</p>
<h2 id="1-purpose-and-scope">1) Purpose and Scope</h2>
<p>This report documents the full methodology used to adapt a pretrained IVUS segmentation model into a multi-task model that performs:</p>
<ul>
<li>Lumen segmentation (pixel-level)</li>
<li>Bifurcation detection (frame-level)</li>
</ul>
<p>The goal is to provide a self-contained technical description of model design, training behavior, threshold calibration, results, and limitations.</p>
<h2 id="2-problem-setup">2) Problem Setup</h2>
<p>Given an IVUS frame <code>x</code>, we optimize two tasks:</p>
<ol>
<li>Segmentation output <code>M_hat</code>: lumen mask over pixels</li>
<li>Classification output <code>y_hat</code>: bifurcation probability in <code>[0,1]</code></li>
</ol>
<p>The model is trained at frame level. There is no temporal model (no recurrence, no sequence transformer, no optical flow objective).</p>
<h2 id="3-data-and-labels">3) Data and Labels</h2>
<h3 id="31-data-organization">3.1 Data organization</h3>
<p>The dataset is built from a frame-bank of manually labeled IVUS frames with train/validation/test partitions.</p>
<p>Split counts:</p>
<ul>
<li>Train: 420</li>
<li>Validation: 90</li>
<li>Test: 90</li>
</ul>
<h3 id="32-label-distributions">3.2 Label distributions</h3>
<p>Bifurcation positive rate by split:</p>
<ul>
<li>Train: 65.2%</li>
<li>Validation: 65.6%</li>
<li>Test: 65.6%</li>
</ul>
<p>Lumen annotation coverage by split:</p>
<ul>
<li>Train: 47.4%</li>
<li>Validation: 51.1%</li>
<li>Test: 53.3%</li>
</ul>
<p>Because only about half of the frames in each split carry a lumen annotation, classification supervision is denser than segmentation supervision in the multi-task setting.</p>
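<p>The reported rates can be converted back to approximate frame counts to make the supervision gap concrete. This is only a back-of-the-envelope sketch: the counts are rounded from the percentages above, and the names <code>splits</code> and <code>approximate_counts</code> are illustrative, not part of the project code.</p>
<pre class="python"><code># Split sizes and rates as reported in sections 3.1 and 3.2.
splits = {
    "train": {"frames": 420, "positive_rate": 0.652, "lumen_coverage": 0.474},
    "validation": {"frames": 90, "positive_rate": 0.656, "lumen_coverage": 0.511},
    "test": {"frames": 90, "positive_rate": 0.656, "lumen_coverage": 0.533},
}


def approximate_counts(split):
    """Convert reported rates back to approximate (rounded) frame counts."""
    n = split["frames"]
    return {
        "bifurcation_positive": round(n * split["positive_rate"]),
        "with_lumen_mask": round(n * split["lumen_coverage"]),
    }


counts = {name: approximate_counts(split) for name, split in splits.items()}
for name, c in counts.items():
    print(name, c)
</code></pre>
<p>Under these assumptions, roughly 199 of the 420 training frames contribute to the segmentation loss, while the classification loss is averaged over the whole batch.</p>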
<h3 id="33-balance-visualizations">3.3 Balance visualizations</h3>
<p><img src="./memo_assets/split_class_balance_stacked.png" alt="Split class balance" /> <img src="./memo_assets/positive_rate_by_split.png" alt="Positive rate by split" /> <img src="./memo_assets/lumen_coverage_by_split.png" alt="Lumen coverage by split" /></p>
<h2 id="4-model-design">4) Model Design</h2>
<h3 id="41-backbone--multi-task-head">4.1 Backbone + multi-task head</h3>
<p>A pretrained segmentation backbone is reused as initialization.</p>
<p>A lightweight <strong>multi-task classification head</strong> is attached on top of segmentation logits:</p>
<ul>
<li>Global average pooling over spatial dimensions</li>
<li>Dense layer (ReLU)</li>
<li>Dropout</li>
<li>Final sigmoid output for bifurcation probability</li>
</ul>
<p>This is a multi-task head, not an attention module.</p>
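<p>The head described above can be sketched in plain Python to make the data flow explicit. This is a minimal illustration, not the trained implementation: the layer sizes, the parameter dictionary <code>params</code>, and the function names are assumptions, and the real head operates on framework tensors rather than nested lists.</p>
<pre class="python"><code>import math


def global_average_pool(feature_map):
    """Average an H x W x C nested list over its spatial dimensions."""
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [t / (height * width) for t in totals]


def dense(inputs, weights, biases):
    """Affine layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def classification_head(feature_map, params, dropout_rate=0.0, rng=None):
    """GAP, dense + ReLU, dropout, then a single sigmoid unit."""
    pooled = global_average_pool(feature_map)
    hidden = [max(0.0, h) for h in dense(pooled, params["w1"], params["b1"])]
    if rng is not None and dropout_rate > 0.0:
        # Inverted dropout; active only when an rng (e.g. random.Random) is
        # supplied, which stands in for "training mode" in this sketch.
        keep = 1.0 - dropout_rate
        hidden = [h / keep if rng.random() > dropout_rate else 0.0
                  for h in hidden]
    logit = dense(hidden, params["w2"], params["b2"])[0]
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 2 x 2 x 1 map of segmentation logits and hypothetical weights.
feature_map = [[[0.2], [0.4]], [[0.6], [0.8]]]
params = {"w1": [[1.0], [-1.0]], "b1": [0.0, 0.0],
          "w2": [[2.0, 0.0]], "b2": [-1.0]}
prob = classification_head(feature_map, params)
</code></pre>
<p>Because global average pooling discards spatial layout, the head sees only per-channel summaries of the segmentation logits, which is the capacity tradeoff discussed under Limitations.</p>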
<h3 id="42-task-coupling-strategy">4.2 Task coupling strategy</h3>
<p>The segmentation branch and classification branch share upstream representation. This encourages feature reuse while keeping task-specific outputs separate.</p>
<h3 id="43-conceptual-architecture">4.3 Conceptual architecture</h3>
<p><img src="./memo_assets/multitask_pipeline_diagram.png" alt="Multi-task training and inference diagram" /></p>
<h2 id="5-preprocessing-and-input-construction">5) Preprocessing and Input Construction</h2>
<p>For each frame:</p>
<ol>
<li>Apply central black-circle preprocessing (to suppress catheter/artifacts near center).</li>
<li>Convert grayscale to network input representation.</li>
<li>Align labels to frame indices.</li>
</ol>
<p>For segmentation labels, only frames with valid lumen polygons are supervised.</p>
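<p>The first two preprocessing steps can be sketched as follows. The memo does not state the circle radius or the exact input representation, so <code>radius</code>, the <code>[0, 1]</code> scaling, and the single-channel layout in <code>to_network_input</code> are assumptions made only for illustration.</p>
<pre class="python"><code>def mask_central_circle(frame, radius):
    """Zero out pixels within `radius` of the image centre.

    `frame` is a 2-D list of grayscale values. `radius` (in pixels) is a
    hypothetical parameter; the value used in training is not given here.
    """
    height, width = len(frame), len(frame[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    out = [row[:] for row in frame]  # copy so the input is left untouched
    for y in range(height):
        for x in range(width):
            if radius ** 2 >= (y - cy) ** 2 + (x - cx) ** 2:
                out[y][x] = 0
    return out


def to_network_input(frame, max_value=255.0):
    """Scale to [0, 1] and add a channel axis (an assumed input format)."""
    return [[[value / max_value] for value in row] for row in frame]


frame = [[255] * 5 for _ in range(5)]
masked = mask_central_circle(frame, radius=1)
network_input = to_network_input(masked)
</code></pre>
<p>On the 5 x 5 toy frame, a radius of 1 blanks the centre pixel and its four direct neighbours and leaves the corners unchanged.</p>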
<h2 id="6-loss-functions-and-optimization">6) Loss Functions and Optimization</h2>
<p>Let <code>i</code> index samples in a minibatch.</p>
<ul>
<li><code>m_i in {0,1}^{H x W}</code>: ground-truth lumen mask</li>
<li><code>m_hat_i</code>: predicted lumen probability map</li>
<li><code>y_i in {0,1}</code>: bifurcation label</li>
<li><code>y_hat_i in (0,1)</code>: bifurcation probability</li>
<li><code>h_i in {0,1}</code>: has-mask indicator (1 if segmentation label exists)</li>
</ul>
<h3 id="61-segmentation-loss">6.1 Segmentation loss</h3>
<p>Weighted BCE + Dice:</p>
<pre class="text"><code>L_seg,i = L_wbce(m_i, m_hat_i; w_pos) + lambda_dice * L_dice(m_i, m_hat_i)
</code></pre>
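<p>A per-sample version of this loss might look like the sketch below, operating on flattened masks. The exact weighted-BCE form, the Dice smoothing constant, and the default values of <code>w_pos</code> and <code>lambda_dice</code> are assumptions; the memo only states that the two terms are combined.</p>
<pre class="python"><code>import math


def weighted_bce(mask, probs, w_pos, eps=1e-7):
    """Mean pixel-wise BCE with positive (lumen) pixels weighted by w_pos."""
    total = 0.0
    for m, p in zip(mask, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -(w_pos * m * math.log(p) + (1 - m) * math.log(1.0 - p))
    return total / len(mask)


def dice_loss(mask, probs, smooth=1.0):
    """Soft Dice loss: 1 - (2 * overlap + s) / (sum(mask) + sum(probs) + s)."""
    overlap = sum(m * p for m, p in zip(mask, probs))
    return 1.0 - (2.0 * overlap + smooth) / (sum(mask) + sum(probs) + smooth)


def segmentation_loss(mask, probs, w_pos=1.0, lambda_dice=1.0):
    """L_seg,i = L_wbce + lambda_dice * L_dice for one flattened sample."""
    return (weighted_bce(mask, probs, w_pos)
            + lambda_dice * dice_loss(mask, probs))


mask = [1, 0, 1, 0]
good = segmentation_loss(mask, [0.9, 0.1, 0.8, 0.2])
poor = segmentation_loss(mask, [0.4, 0.6, 0.3, 0.7])
</code></pre>
<p>BCE penalises each pixel independently, while the Dice term rewards overlap at the region level, so the combination is less sensitive to the foreground/background pixel ratio than BCE alone.</p>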
<p>Masked batch aggregation (only labeled masks contribute):</p>
<pre class="text"><code>L_seg = (sum_i h_i * L_seg,i) / (sum_i h_i + eps)
</code></pre>
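<p>The masked aggregation is a direct transcription of the formula above. In this sketch <code>masked_mean_loss</code> is an illustrative name, and the per-sample losses of unlabeled frames are assumed to be finite placeholder values that the indicator zeroes out.</p>
<pre class="python"><code>def masked_mean_loss(per_sample_losses, has_mask, eps=1e-7):
    """L_seg = sum_i(h_i * L_seg,i) / (sum_i(h_i) + eps)."""
    weighted = sum(h * loss for h, loss in zip(has_mask, per_sample_losses))
    return weighted / (sum(has_mask) + eps)


# Samples 1 and 3 have no lumen mask; their loss values are ignored.
batch_loss = masked_mean_loss([0.2, 9.9, 0.4, 9.9], has_mask=[1, 0, 1, 0])

# A batch with no labeled masks yields 0.0 instead of dividing by zero.
empty_loss = masked_mean_loss([9.9, 9.9], has_mask=[0, 0])
</code></pre>
<p>The <code>eps</code> term is what keeps a fully unlabeled minibatch well defined; in that case the segmentation branch simply receives no gradient for that step.</p>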
<h3 id="62-classification-loss">6.2 Classification loss</h3>
<p>Binary cross entropy:</p>
<pre class="text"><code>L_cls = (1/B) * sum_i L_bce(y_i, y_hat_i)
</code></pre>
<h3 id="63-total-objective">6.3 Total objective</h3>
<pre class="text"><code>L_total = w_seg * L_seg + w_cls * L_cls
</code></pre>
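<p>The classification term and the weighted sum can be sketched together. The weights <code>w_seg</code> and <code>w_cls</code> below are placeholders, since their trained values are not listed in this report, and the clamping constant <code>eps</code> is an assumption added for numerical safety.</p>
<pre class="python"><code>import math


def bce(y, y_hat, eps=1e-7):
    """Binary cross entropy for one label/probability pair."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))


def classification_loss(labels, probs):
    """L_cls = (1/B) * sum_i L_bce(y_i, y_hat_i)."""
    return sum(bce(y, p) for y, p in zip(labels, probs)) / len(labels)


def total_loss(l_seg, l_cls, w_seg=1.0, w_cls=1.0):
    """L_total = w_seg * L_seg + w_cls * L_cls."""
    return w_seg * l_seg + w_cls * l_cls


l_cls = classification_loss([1, 0, 1], [0.9, 0.2, 0.6])
l_total = total_loss(l_seg=0.3, l_cls=l_cls, w_seg=1.0, w_cls=0.5)
</code></pre>
<p>Unlike the segmentation term, the classification term is averaged over every sample in the batch, which is why the two task weights also act as a lever on the supervision imbalance.</p>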
<h3 id="64-optimization-behavior">6.4 Optimization behavior</h3>
<ul>
<li>GradientTape-style explicit optimization loop for multi-task fine-tuning</li>
<li>Gradient clipping by global norm for stability</li>
<li>Early stopping using validation objective</li>
<li>Best-checkpoint restore before final export</li>
</ul>
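<p>Two of these mechanisms are easy to show in isolation. The sketch below uses plain Python lists in place of framework tensors; the rescaling rule follows the one documented for <code>tf.clip_by_global_norm</code>, while the <code>EarlyStopping</code> class, its <code>patience</code> parameter, and the assumption that a lower validation objective is better are illustrative choices rather than the project's actual code.</p>
<pre class="python"><code>import math


def clip_by_global_norm(gradients, clip_norm):
    """Rescale all gradients jointly if their combined L2 norm exceeds clip_norm."""
    global_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    scale = clip_norm / max(global_norm, clip_norm)  # 1.0 when within bound
    return [[g * scale for g in grad] for grad in gradients], global_norm


class EarlyStopping:
    """Track the best validation objective and snapshot its weights."""

    def __init__(self, patience):
        self.patience = patience
        self.best_value = math.inf
        self.best_weights = None
        self.epochs_without_improvement = 0

    def update(self, val_value, weights):
        """Record one epoch; return True when training should stop."""
        if self.best_value > val_value:
            self.best_value = val_value
            self.best_weights = list(weights)  # copy for later restore
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


clipped, norm = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)

stopper = EarlyStopping(patience=2)
history = [stopper.update(value, [epoch])
           for epoch, value in enumerate([1.0, 0.8, 0.9, 0.95])]
</code></pre>
<p>Clipping by the global norm preserves the direction of the combined update, which matters when two task losses feed gradients into the same shared layers. After the loop stops, <code>stopper.best_weights</code> holds the snapshot to restore before export.</p>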
<h2 id="7-threshold-selection-and-operating-point">7) Threshold Selection and Operating Point</h2>
<p>After training, the bifurcation threshold <code>t</code> is selected on validation data by grid search over candidate thresholds.</p>
<p>For each <code>t</code>:</p>
<pre class="text"><code>y_hat_i^(t) = 1[y_hat_i &gt;= t]
</code></pre>
<p>Precision, recall, F1, and accuracy are computed at each candidate threshold, and the F1-maximizing threshold is chosen:</p>
<pre class="text"><code>t* = argmax_t F1_val(t)
</code></pre>
<p>The selected threshold is persisted and reused during runtime inference.</p>
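<p>The selection procedure can be sketched in a few lines. The candidate grid and the tie-breaking rule (first maximum wins) are assumptions, as the memo does not specify them, and the labels and probabilities below are toy values rather than project data.</p>
<pre class="python"><code>def f1_at_threshold(labels, probs, threshold):
    """F1 of the hard decisions 1[p >= threshold] against binary labels."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, d in zip(labels, preds) if y == 1 and d == 1)
    fp = sum(1 for y, d in zip(labels, preds) if y == 0 and d == 1)
    fn = sum(1 for y, d in zip(labels, preds) if y == 1 and d == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_threshold(labels, probs, candidates):
    """Return (t*, F1_val(t*)); ties go to the first candidate in the grid."""
    scored = ((t, f1_at_threshold(labels, probs, t)) for t in candidates)
    return max(scored, key=lambda pair: pair[1])


candidates = [i / 100 for i in range(5, 100, 5)]  # hypothetical grid 0.05 .. 0.95
val_labels = [1, 1, 0, 0, 1]
val_probs = [0.9, 0.6, 0.55, 0.2, 0.4]
best_t, best_f1 = select_threshold(val_labels, val_probs, candidates)
</code></pre>
<p>With a validation split of 90 frames, F1 as a function of <code>t</code> is a coarse step function, so neighbouring thresholds often tie and the chosen operating point should be read as approximate.</p>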
<h2 id="8-training-dynamics">8) Training Dynamics</h2>
<h3 id="81-multi-task-fine-tuning-dynamics">8.1 Multi-task fine-tuning dynamics</h3>
<p><img src="./memo_assets/multitask_training_dynamics.png" alt="Multi-task training dynamics" /></p>
<p>Observed behavior:</p>
<ul>
<li>Validation classification AUC stabilizes high relatively early.</li>
<li>Validation F1 is more threshold-sensitive and fluctuates more.</li>
<li>Segmentation metrics remain strong but vary with sparse segmentation supervision.</li>
</ul>
<h3 id="82-lumen-only-fine-tuning-dynamics">8.2 Lumen-only fine-tuning dynamics</h3>
<p><img src="./memo_assets/lumen_finetune_dynamics.png" alt="Lumen fine-tune dynamics" /></p>
<h2 id="9-test-performance-summary">9) Test Performance Summary</h2>
<h3 id="91-multi-task-test-metrics">9.1 Multi-task test metrics</h3>
<p>Segmentation (subset with lumen labels):</p>
<ul>
<li>IoU: 0.856</li>
<li>Dice: 0.923</li>
</ul>
<p>Bifurcation classification:</p>
<ul>
<li>Accuracy: 0.900</li>
<li>Precision: 0.891</li>
<li>Recall: 0.966</li>
<li>F1: 0.927</li>
<li>AUC: 0.961</li>
</ul>
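<p>As a consistency check, the threshold-dependent metrics above can be reproduced from a single set of confusion counts. The counts in this sketch are inferred from the reported rates and the 90-frame test split (about 59 positives), not read from the figure, so they should be treated as an assumption; the confusion-matrix image remains the authoritative source.</p>
<pre class="python"><code>def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Assumed counts: 59 positives and 31 negatives among 90 test frames.
metrics = classification_metrics(tp=57, fp=7, fn=2, tn=24)
rounded = {name: round(value, 3) for name, value in metrics.items()}
</code></pre>
<p>Under these assumed counts the rounded values match the reported accuracy, precision, recall, and F1. AUC is threshold-free and cannot be recovered from a single confusion matrix.</p>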
<p>Confusion matrix:</p>
<p><img src="./memo_assets/multitask_test_confusion_matrix.png" alt="Multitask confusion matrix" /></p>
<p>Metric snapshot:</p>
<p><img src="./memo_assets/multitask_test_metric_snapshot.png" alt="Multitask metric snapshot" /></p>
<h3 id="92-segmentation-regime-comparison">9.2 Segmentation regime comparison</h3>
<p><img src="./memo_assets/segmentation_regime_comparison.png" alt="Segmentation comparison" /></p>
<p>Note: the compared evaluations do not use identical sample sets, so the comparison is directional only and should not be read as a controlled benchmark.</p>
<h2 id="10-threshold-and-calibration-diagnostics">10) Threshold and Calibration Diagnostics</h2>
<p>Standalone classifier diagnostics (supporting analysis):</p>
<p><img src="./memo_assets/standalone_threshold_sweep.png" alt="Threshold sweep" /> <img src="./memo_assets/standalone_probability_hist.png" alt="Probability histogram" /> <img src="./memo_assets/standalone_reliability_diagram.png" alt="Reliability diagram" /> <img src="./memo_assets/precision_recall_curve_with_operating_point.png" alt="Precision-recall curve with operating point" /></p>
<p>These plots illustrate threshold sensitivity, score separation, and calibration quality.</p>
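<p>For readers who want to recompute the calibration view, a reliability diagram reduces to per-bin averages of predicted probability and observed positive rate. The sketch below uses equal-width bins and also derives an expected calibration error; the bin count, the function names, and the use of ECE as a summary are assumptions, since the memo reports only the plots.</p>
<pre class="python"><code>def reliability_bins(labels, probs, n_bins=10):
    """Per non-empty bin: (mean predicted probability, positive rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((y, p))
    rows = []
    for members in bins:
        if members:
            count = len(members)
            rows.append((sum(p for _, p in members) / count,
                         sum(y for y, _ in members) / count,
                         count))
    return rows


def expected_calibration_error(labels, probs, n_bins=10):
    """Count-weighted mean gap between confidence and observed rate."""
    rows = reliability_bins(labels, probs, n_bins)
    return sum(count * abs(conf - rate) for conf, rate, count in rows) / len(labels)


toy_labels = [1, 0, 1, 1]
toy_probs = [0.95, 0.15, 0.95, 0.95]
rows = reliability_bins(toy_labels, toy_probs)
ece = expected_calibration_error(toy_labels, toy_probs)
</code></pre>
<p>A perfectly calibrated classifier would place every bin on the diagonal, where mean predicted probability equals the observed positive rate. With small splits, many bins hold only a few frames, so per-bin rates are noisy.</p>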
<h2 id="11-limitations">11) Limitations</h2>
<h3 id="111-split-caveat-source-overlap">11.1 Split caveat: source overlap</h3>
<p>Train/validation/test share source pullback files (frame-level partitioning rather than source-level partitioning).</p>
<p>Because the model is frame-independent, this is not temporal leakage. However, repeated source style/statistics across splits can make in-domain metrics optimistic.</p>
<p><img src="./memo_assets/split_source_overlap_heatmap.png" alt="Split source overlap" /></p>
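<p>A source-level alternative assigns whole pullbacks, rather than individual frames, to a split. The sketch below is one possible protocol, not the one used for the reported results: the mapping <code>frame_sources</code>, the split fractions, and the seed are all hypothetical.</p>
<pre class="python"><code>import random


def source_level_split(frame_sources, val_fraction=0.15, test_fraction=0.15,
                       seed=0):
    """Assign every frame of a source (pullback) to the same split.

    `frame_sources` maps a frame id to its source pullback id. Fractions
    apply to the number of sources, so frame counts are only approximate.
    """
    sources = sorted(set(frame_sources.values()))
    random.Random(seed).shuffle(sources)
    n_test = max(1, round(len(sources) * test_fraction))
    n_val = max(1, round(len(sources) * val_fraction))
    test_sources = set(sources[:n_test])
    val_sources = set(sources[n_test:n_test + n_val])

    def split_of(source):
        if source in test_sources:
            return "test"
        if source in val_sources:
            return "validation"
        return "train"

    return {frame: split_of(source) for frame, source in frame_sources.items()}


# Toy frame bank: 10 hypothetical pullbacks with 3 frames each.
frame_sources = {f"p{s}_f{i}": f"p{s}" for s in range(10) for i in range(3)}
split = source_level_split(frame_sources, seed=1)
</code></pre>
<p>Because no pullback contributes frames to more than one split, test metrics under such a protocol measure generalisation to unseen sources instead of unseen frames from familiar sources.</p>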
<h3 id="112-uneven-supervision-density">11.2 Uneven supervision density</h3>
<p>Only about half of the samples carry segmentation labels (47.4% to 53.3% per split). This creates an imbalance between classification and segmentation supervision in multi-task training.</p>
<h3 id="113-domain-shift-across-source-groups">11.3 Domain shift across source groups</h3>
<p>Performance can vary substantially by source group.</p>
<p><img src="./memo_assets/standalone_group_metrics.png" alt="Group-wise standalone metrics" /></p>
<p>This indicates a need for stronger cross-source robustness analysis.</p>
<h3 id="114-head-capacity-tradeoff">11.4 Head capacity tradeoff</h3>
<p>The current multi-task head is intentionally lightweight. This helps stability and runtime cost, but may under-capture fine spatial context around bifurcation patterns.</p>
<h2 id="12-practical-conclusions">12) Practical Conclusions</h2>
<ol>
<li>The current multi-task approach is effective on the in-domain test split (segmentation Dice 0.923, bifurcation F1 0.927) and operationally coherent.</li>
<li>Validation-driven thresholding is critical and should remain part of deployment.</li>
<li>The largest methodological caveat is source-overlap evaluation, not temporal modeling leakage.</li>
<li>The next major quality gain will likely come from stricter source-level split protocols and robustness-focused evaluation.</li>
</ol>
<h2 id="13-reproducibility-note">13) Reproducibility Note</h2>
<p>This report is intended to be self-contained. Supporting figures are stored under <code>docs/memo_assets/</code>.</p>
<p>PDF export command:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true"></a><span class="ex">scripts/analysis/export_memo_pdf.sh</span></span></code></pre></div>
</body>
</html>