How to use Aditya2162/ivus-segmentation with Keras:
```python
# Available backend options are: "jax", "torch", "tensorflow".
import os
os.environ["KERAS_BACKEND"] = "jax"

import keras

model = keras.saving.load_model("hf://Aditya2162/ivus-segmentation")
```
| # IVUS Segmentation and Bifurcation Detection | |
| ## Comprehensive Multi-Task Fine-Tuning Report | |
| Date: February 20, 2026 | |
| ## 1) Purpose and Scope | |
| This report documents the full methodology used to adapt a pretrained IVUS segmentation model into a multi-task model that performs: | |
| - Lumen segmentation (pixel-level) | |
| - Bifurcation detection (frame-level) | |
| The goal is to provide a self-contained technical description of model design, training behavior, threshold calibration, results, and limitations. | |
| ## 2) Problem Setup | |
| Given an IVUS frame `x`, we optimize two tasks: | |
| 1. Segmentation output `M_hat`: lumen mask over pixels | |
| 2. Classification output `y_hat`: bifurcation probability in `[0,1]` | |
| The model is trained at frame level. There is no temporal model (no recurrence, no sequence transformer, no optical flow objective). | |
| ## 3) Data and Labels | |
| ### 3.1 Data organization | |
| The dataset is built from a frame-bank of manually labeled IVUS frames with train/validation/test partitions. | |
| Split counts: | |
| - Train: 420 | |
| - Validation: 90 | |
| - Test: 90 | |
| ### 3.2 Label distributions | |
| Bifurcation positive rate by split: | |
| - Train: 65.2% | |
| - Validation: 65.6% | |
| - Test: 65.6% | |
| Lumen annotation coverage by split: | |
| - Train: 47.4% | |
| - Validation: 51.1% | |
| - Test: 53.3% | |
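| Roughly two thirds of frames are bifurcation-positive, while only about half of the frames in each split carry a lumen annotation. Classification supervision is therefore denser than segmentation supervision in the multi-task setting. | |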
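The percentages above can be turned into approximate frame counts with simple arithmetic. The sketch below does that, assuming the reported rates are fractions of the split sizes listed in 3.1; the counts are rounded estimates, not figures taken from the dataset, and the helper names are illustrative.

```python
# Split sizes and rates as reported in sections 3.1 and 3.2.
SPLIT_SIZES = {"train": 420, "validation": 90, "test": 90}
BIFURCATION_RATE = {"train": 0.652, "validation": 0.656, "test": 0.656}
LUMEN_COVERAGE = {"train": 0.474, "validation": 0.511, "test": 0.533}


def approx_count(n_frames, rate):
    """Estimate a frame count from a rounded percentage (approximation only)."""
    return round(n_frames * rate)


def summarize():
    """Return approximate per-split counts implied by the reported rates."""
    rows = {}
    for split, n_frames in SPLIT_SIZES.items():
        rows[split] = {
            "frames": n_frames,
            "bifurcation_positive": approx_count(n_frames, BIFURCATION_RATE[split]),
            "with_lumen_mask": approx_count(n_frames, LUMEN_COVERAGE[split]),
        }
    return rows


if __name__ == "__main__":
    for split, row in summarize().items():
        print(split, row)
```

Under these assumptions, fewer frames per split contribute to the segmentation loss than are bifurcation-positive, which is the density gap described above.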
| ### 3.3 Balance visualizations | |
|  | |
|  | |
|  | |
| ## 4) Model Design | |
| ### 4.1 Backbone + multi-task head | |
| A pretrained segmentation backbone is reused as initialization. | |
| A lightweight **multi-task classification head** is attached on top of segmentation logits: | |
| - Global average pooling over spatial dimensions | |
| - Dense layer (ReLU) | |
| - Dropout | |
| - Final sigmoid output for bifurcation probability | |
| This is a multi-task head, not an attention module. | |
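The head described above can be sketched as a plain NumPy forward pass. This is an illustration of the layer order only (global average pooling, dense with ReLU, dropout, sigmoid); the hidden width, dropout rate, weight initialization, and function names are assumptions and are not taken from the released model.

```python
import numpy as np


def init_head(n_channels, hidden_units=32, seed=0):
    """Create illustrative head parameters (sizes and init are assumptions)."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 0.1, size=(n_channels, hidden_units)),
        "b1": np.zeros(hidden_units),
        "w2": rng.normal(0.0, 0.1, size=(hidden_units, 1)),
        "b2": np.zeros(1),
    }


def head_forward(seg_logits, params, dropout_rate=0.0, rng=None):
    """Map segmentation logits of shape (B, H, W, C) to (B,) probabilities."""
    # Global average pooling over the spatial dimensions.
    pooled = seg_logits.mean(axis=(1, 2))
    # Dense layer with ReLU activation.
    hidden = np.maximum(pooled @ params["w1"] + params["b1"], 0.0)
    # Inverted dropout, applied only when a rate and a generator are given.
    if dropout_rate > 0.0 and rng is not None:
        keep = rng.random(hidden.shape) >= dropout_rate
        hidden = hidden * keep / (1.0 - dropout_rate)
    # Final sigmoid output for the bifurcation probability.
    logit = hidden @ params["w2"] + params["b2"]
    return (1.0 / (1.0 + np.exp(-logit))).reshape(-1)
```

Because the head pools over all pixels before the dense layer, it sees only a per-channel summary of the segmentation logits, which is what keeps it lightweight.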
| ### 4.2 Task coupling strategy | |
| The segmentation branch and classification branch share upstream representation. This encourages feature reuse while keeping task-specific outputs separate. | |
| ### 4.3 Conceptual architecture | |
|  | |
| ## 5) Preprocessing and Input Construction | |
| For each frame: | |
| 1. Apply central black-circle preprocessing (to suppress the catheter and artifacts near the center of the frame). | |
| 2. Convert grayscale to network input representation. | |
| 3. Align labels to frame indices. | |
| For segmentation labels, only frames with valid lumen polygons are supervised. | |
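The first two preprocessing steps can be sketched as follows. The report does not state the circle radius or the exact input representation, so the `radius_px` parameter, the scaling to `[0, 1]`, and the single trailing channel axis are all assumptions made for illustration.

```python
import numpy as np


def mask_central_circle(frame, radius_px):
    """Zero out a disc at the image center of a 2D grayscale frame.

    radius_px is a hypothetical parameter; the report does not give its value.
    """
    height, width = frame.shape
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    yy, xx = np.ogrid[:height, :width]
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius_px ** 2
    out = frame.copy()  # leave the caller's array untouched
    out[inside] = 0
    return out


def to_network_input(frame):
    """Scale uint8 grayscale to float32 in [0, 1] and add a channel axis.

    This representation is an assumption, not the documented one.
    """
    return (frame.astype(np.float32) / 255.0)[..., np.newaxis]
```

A frame would pass through `mask_central_circle` first and `to_network_input` second, matching the order of steps 1 and 2.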
| ## 6) Loss Functions and Optimization | |
| Let `i` index samples in a minibatch. | |
| - `m_i in {0,1}^{H x W}`: ground-truth lumen mask | |
| - `m_hat_i`: predicted lumen probability map | |
| - `y_i in {0,1}`: bifurcation label | |
| - `y_hat_i in (0,1)`: bifurcation probability | |
| - `h_i in {0,1}`: has-mask indicator (1 if segmentation label exists) | |
| ### 6.1 Segmentation loss | |
| Weighted BCE + Dice: | |
| ```text | |
| L_seg,i = L_wbce(m_i, m_hat_i; w_pos) + lambda_dice * L_dice(m_i, m_hat_i) | |
| ``` | |
| Masked batch aggregation (only labeled masks contribute): | |
| ```text | |
| L_seg = (sum_i h_i * L_seg,i) / (sum_i h_i + eps) | |
| ``` | |
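A minimal NumPy sketch of the masked segmentation loss follows, using the symbols defined above (`m`, `m_hat`, `h`, `w_pos`, `lambda_dice`, `eps`). The default values, the soft Dice form, and the per-pixel mean inside the weighted BCE are assumptions; the report does not specify them.

```python
import numpy as np


def weighted_bce(m, m_hat, w_pos, eps=1e-7):
    """Per-sample weighted BCE; positive pixels are scaled by w_pos."""
    p = np.clip(m_hat, eps, 1.0 - eps)
    per_pixel = -(w_pos * m * np.log(p) + (1.0 - m) * np.log(1.0 - p))
    return per_pixel.mean(axis=(1, 2))


def dice_loss(m, m_hat, eps=1e-7):
    """Per-sample soft Dice loss (1 - Dice coefficient)."""
    intersection = (m * m_hat).sum(axis=(1, 2))
    denominator = m.sum(axis=(1, 2)) + m_hat.sum(axis=(1, 2))
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def masked_seg_loss(m, m_hat, has_mask, w_pos=1.0, lambda_dice=1.0, eps=1e-7):
    """Average L_seg,i over samples whose has-mask indicator is 1.

    m, m_hat: arrays of shape (B, H, W); has_mask: array of shape (B,).
    """
    per_sample = weighted_bce(m, m_hat, w_pos) + lambda_dice * dice_loss(m, m_hat)
    return float((has_mask * per_sample).sum() / (has_mask.sum() + eps))
```

The `eps` in the denominator keeps the loss finite (and equal to zero) for a minibatch that contains no labeled masks at all.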
| ### 6.2 Classification loss | |
| Binary cross entropy: | |
| ```text | |
| L_cls = (1/B) * sum_i L_bce(y_i, y_hat_i) | |
| ``` | |
| ### 6.3 Total objective | |
| ```text | |
| L_total = w_seg * L_seg + w_cls * L_cls | |
| ``` | |
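The classification loss and the weighted total can be sketched in the same style. The weights `w_seg` and `w_cls` default to 1.0 here purely for illustration; the report does not give the values used in training.

```python
import numpy as np


def bce_loss(y, y_hat, eps=1e-7):
    """Mean binary cross entropy over the minibatch (L_cls)."""
    p = np.clip(y_hat, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


def total_loss(l_seg, l_cls, w_seg=1.0, w_cls=1.0):
    """Weighted sum of the two task losses (L_total); weights are assumptions."""
    return w_seg * l_seg + w_cls * l_cls
```

Unlike `L_seg`, the classification term averages over every sample in the batch, so it contributes a gradient even when no sample in the batch has a lumen mask.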
| ### 6.4 Optimization behavior | |
| - GradientTape-style explicit optimization loop for multi-task fine-tuning | |
| - Gradient clipping by global norm for stability | |
| - Early stopping using validation objective | |
| - Best-checkpoint restore before final export | |
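Two of the items above, global-norm clipping and early stopping with best-checkpoint tracking, can be illustrated without any framework. This is a NumPy sketch, not the training code: it assumes the validation objective is minimized, and `max_norm` and `patience` are hypothetical parameters. In a TensorFlow loop, `tf.clip_by_global_norm` provides the clipping step.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm):
    """Rescale all gradients jointly so their global L2 norm is at most max_norm."""
    global_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm


class EarlyStopping:
    """Track the best validation objective and keep a copy of the best weights."""

    def __init__(self, patience):
        self.patience = patience
        self.best_value = float("inf")
        self.best_weights = None
        self.bad_epochs = 0

    def update(self, val_objective, weights):
        """Record one epoch; return True when training should stop."""
        if val_objective < self.best_value:
            self.best_value = val_objective
            self.best_weights = [w.copy() for w in weights]
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

After the loop ends, copying `best_weights` back into the model corresponds to the best-checkpoint restore before export.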
| ## 7) Threshold Selection and Operating Point | |
| After model training, the bifurcation threshold `t` is selected on validation data by grid search over candidate thresholds. | |
| For each `t`: | |
| ```text | |
| y_hat_i^(t) = 1[y_hat_i >= t] | |
| ``` | |
| Compute precision, recall, F1, accuracy, etc., then choose: | |
| ```text | |
| t* = argmax_t F1_val(t) | |
| ``` | |
| The selected threshold is persisted and reused during runtime inference. | |
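The selection rule `t* = argmax_t F1_val(t)` can be sketched as a small grid search. The candidate grid and the tie-breaking rule (the lowest threshold wins) are assumptions; the report does not state either.

```python
import numpy as np


def f1_at_threshold(y_true, y_prob, t):
    """F1 score of the binarized predictions 1[y_prob >= t]."""
    y_pred = y_prob >= t
    tp = int(np.sum(y_pred & (y_true == 1)))
    fp = int(np.sum(y_pred & (y_true == 0)))
    fn = int(np.sum(~y_pred & (y_true == 1)))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def select_threshold(y_true, y_prob, candidates=None):
    """Return (t*, F1(t*)) over the candidate grid; first maximum wins on ties."""
    if candidates is None:
        candidates = np.linspace(0.05, 0.95, 19)  # hypothetical grid
    scores = [f1_at_threshold(y_true, y_prob, t) for t in candidates]
    best = int(np.argmax(scores))
    return float(candidates[best]), scores[best]
```

The threshold must be chosen on validation scores only; reusing test data for this step would bias the reported test metrics.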
| ## 8) Training Dynamics | |
| ### 8.1 Multi-task fine-tuning dynamics | |
|  | |
| Observed behavior: | |
| - Validation classification AUC stabilizes high relatively early. | |
| - Validation F1 is more threshold-sensitive and fluctuates more. | |
| - Segmentation metrics remain strong but vary with sparse segmentation supervision. | |
| ### 8.2 Lumen-only fine-tuning dynamics | |
|  | |
| ## 9) Test Performance Summary | |
| ### 9.1 Multi-task test metrics | |
| Segmentation (subset with lumen labels): | |
| - IoU: 0.856 | |
| - Dice: 0.923 | |
| Bifurcation classification: | |
| - Accuracy: 0.900 | |
| - Precision: 0.891 | |
| - Recall: 0.966 | |
| - F1: 0.927 | |
| - AUC: 0.961 | |
| Confusion matrix: | |
|  | |
| Metric snapshot: | |
|  | |
| ### 9.2 Segmentation regime comparison | |
|  | |
| Note: the compared evaluations do not use identical sample sets, so the comparison is directional rather than a controlled head-to-head result. | |
| ## 10) Threshold and Calibration Diagnostics | |
| Standalone classifier diagnostics (supporting analysis): | |
|  | |
|  | |
|  | |
|  | |
| These plots illustrate threshold sensitivity, score separation, and calibration quality. | |
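One common way to quantify calibration quality is to bin the predicted probabilities and compare each bin's mean prediction with its observed positive rate. The sketch below does this and reports an expected calibration error; whether the plots above were produced this way is not stated in the report, and the bin count and function names are assumptions.

```python
import numpy as np


def reliability_bins(y_true, y_prob, n_bins=10):
    """Return (mean predicted prob, observed positive rate, count) per non-empty bin."""
    # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
    bin_index = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    rows = []
    for b in range(n_bins):
        selected = bin_index == b
        if not selected.any():
            continue
        rows.append((
            float(y_prob[selected].mean()),
            float(y_true[selected].mean()),
            int(selected.sum()),
        ))
    return rows


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Count-weighted mean gap between predicted and observed positive rates."""
    n_samples = len(y_true)
    rows = reliability_bins(y_true, y_prob, n_bins)
    return sum(count / n_samples * abs(pred - obs) for pred, obs, count in rows)
```

With only 90 frames per evaluation split, individual bins hold few samples, so bin-level estimates of this kind are noisy.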
| ## 11) Limitations | |
| ### 11.1 Split caveat: source overlap | |
| Train/validation/test share source pullback files (frame-level partitioning rather than source-level partitioning). | |
| Because the model is frame-independent, this is not temporal leakage. However, frames from the same pullback share acquisition style and image statistics, so in-domain metrics are likely optimistic and may not transfer to unseen pullbacks. | |
|  | |
| ### 11.2 Uneven supervision density | |
| Only about half of the samples carry segmentation labels (47.4% to 53.3%, depending on the split). This creates an imbalance between classification and segmentation supervision in multi-task training. | |
| ### 11.3 Domain shift across source groups | |
| Performance can vary substantially by source group. | |
|  | |
| This indicates a need for stronger cross-source robustness analysis. | |
| ### 11.4 Head capacity tradeoff | |
| The current multi-task head is intentionally lightweight. This helps stability and runtime cost, but may under-capture fine spatial context around bifurcation patterns. | |
| ## 12) Practical Conclusions | |
| 1. The current multi-task approach is effective and operationally coherent. | |
| 2. Validation-driven thresholding is critical and should remain part of deployment. | |
| 3. The largest methodological caveat is source-overlap evaluation, not temporal modeling leakage. | |
| 4. The next major quality gain will likely come from stricter source-level split protocols and robustness-focused evaluation. | |
| ## 13) Reproducibility Note | |
| This report is intended to be self-contained. Supporting figures are stored under `docs/memo_assets/`. | |
| PDF export command: | |
| ```bash | |
| scripts/analysis/export_memo_pdf.sh | |
| ``` | |