
Inkjet CDM — Final Thesis Results

All results use K=100 MC trials (Algorithm 1, difference scoring). Dataset: 1,327 total samples → 20% test split = 266 samples (GOOD=174, BAD=92). Evaluation seed: 42 (same train/test split for all λ values). Last verified: 2026-02-26


1. Overall Comparison — All λ Values

| λ | AUROC ↑ | Accuracy ↑ | FPR@95TPR ↓ | Δ AUROC vs baseline |
|---|---|---|---|---|
| 0.0 (baseline, no sep loss) | 0.8325 | 0.7895 | 0.8161 | |
| 0.01 (best AUROC) | 0.8603 | 0.8158 | 0.6264 | +0.0278 |
| 0.02 | 0.8541 | 0.8008 | 0.6609 | +0.0216 |
| 0.05 (best FPR) | 0.8553 | 0.8233 | 0.5287 | +0.0228 |

Recommended thesis citation: report λ=0.01 as the primary result (best AUROC, 0.8603) and mention λ=0.05 as the best operating point for FPR@95TPR (0.5287).
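
For readers unfamiliar with the FPR@95TPR column, the sketch below shows one common way to compute it. It is not taken from the project code: the function name `fpr_at_tpr`, the label convention (1 = BAD, 0 = GOOD) and the "higher score means more anomalous" direction are assumptions made for illustration.

```python
import math

def fpr_at_tpr(scores, labels, target_tpr=0.95):
    """Fraction of GOOD samples flagged at the strictest threshold that
    still flags at least `target_tpr` of the BAD samples.

    Assumptions (not from the source): labels are 1 for BAD and 0 for
    GOOD, and a higher score means "more likely BAD".
    """
    bad = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    good = [s for s, y in zip(scores, labels) if y == 0]
    if not bad or not good:
        raise ValueError("need at least one BAD and one GOOD sample")
    # Number of BAD samples that must be caught; the small epsilon guards
    # against float error when target_tpr * len(bad) is a whole number.
    k = max(1, math.ceil(target_tpr * len(bad) - 1e-9))
    threshold = bad[k - 1]  # k-th highest BAD score
    false_positives = sum(1 for s in good if s >= threshold)
    return false_positives / len(good)

# Toy data: 4 BAD samples followed by 5 GOOD samples.
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.5, 0.15, 0.05]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0]
print(fpr_at_tpr(toy_scores, toy_labels))  # 0.4
```

With only 4 BAD samples, reaching 95% TPR requires catching all of them, so the threshold is set by the lowest-scoring BAD sample. This is why a single hard-to-detect defect can dominate the metric on small subsets.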


2. Per-Feature AUROC (K=100)

| Feature | λ=0 | λ=0.01 | λ=0.02 | λ=0.05 | Best Δ |
|---|---|---|---|---|---|
| angle | 0.5556 | 0.5679 | 0.6173 | 0.5556 | +0.0617 |
| dist1 | 0.9000 | 0.8571 | 0.9429 | 0.9143 | |
| dist6 | 0.8278 | 0.8111 | 0.8389 | 0.8278 | |
| dots | 0.9126 | 0.9266 | 0.9126 | 0.8881 | +0.0140 |
| edge1 | 0.7760 | 0.8177 | 0.8594 | 0.8542 | +0.0834 |
| edge2 | 0.7302 | 0.7242 | 0.6786 | 0.7857 | |
| edge3 | 0.7188 | 0.8750 | 0.7708 | 0.8229 | +0.1562 |
| edge4 | 0.6667 | 0.7361 | 0.6806 | 0.6597 | +0.0694 |
| Overall | 0.8325 | 0.8603 | 0.8541 | 0.8553 | +0.0278 |
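
The filled-in Best Δ entries are all consistent with "highest AUROC among the λ>0 runs minus the λ=0 baseline". Assuming that reading is the intended one, the short sketch below recomputes the column from the table values, including the three rows (dist1, dist6, edge2) where the cell is blank.

```python
# AUROC per feature, copied from the table: [λ=0, λ=0.01, λ=0.02, λ=0.05]
auroc = {
    "angle": [0.5556, 0.5679, 0.6173, 0.5556],
    "dist1": [0.9000, 0.8571, 0.9429, 0.9143],
    "dist6": [0.8278, 0.8111, 0.8389, 0.8278],
    "dots":  [0.9126, 0.9266, 0.9126, 0.8881],
    "edge1": [0.7760, 0.8177, 0.8594, 0.8542],
    "edge2": [0.7302, 0.7242, 0.6786, 0.7857],
    "edge3": [0.7188, 0.8750, 0.7708, 0.8229],
    "edge4": [0.6667, 0.7361, 0.6806, 0.6597],
}

def best_delta(row):
    """Best AUROC over the λ>0 runs minus the λ=0 baseline (assumed definition)."""
    baseline, *with_sep_loss = row
    return round(max(with_sep_loss) - baseline, 4)

best = {feature: best_delta(row) for feature, row in auroc.items()}
for feature, delta in best.items():
    print(f"{feature:6s} {delta:+.4f}")
```

Under this assumption the blank cells come out as +0.0429 (dist1), +0.0111 (dist6) and +0.0555 (edge2), each reached at a different λ than the overall best of 0.01.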

Note: angle has only 3 BAD samples in the test set (27 GOOD, 3 BAD), so its AUROC is statistically unreliable. edge3 shows the largest absolute gain (+0.1562 AUROC, i.e. 15.62 percentage points, at λ=0.01).
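
To make the unreliability of the angle AUROC concrete: AUROC equals the fraction of (GOOD, BAD) pairs that the score orders correctly, so with 27 GOOD and 3 BAD samples there are only 81 pairs and, ignoring ties, the metric can only move in steps of 1/81 ≈ 0.0123. The sketch below maps the reported angle values back to pair counts; it uses only numbers from the table and the note.

```python
n_good, n_bad = 27, 3           # angle test-set counts from the note
pairs = n_good * n_bad          # 81 GOOD/BAD pairs in total

# Reported angle AUROC values from the per-feature table.
reported = {"λ=0": 0.5556, "λ=0.01": 0.5679, "λ=0.02": 0.6173, "λ=0.05": 0.5556}

# Number of correctly ordered pairs implied by each reported value.
correct_pairs = {name: round(value * pairs) for name, value in reported.items()}

for name, wins in correct_pairs.items():
    # Reconstruct the AUROC from the pair count to confirm it matches.
    print(f"{name:7s} {wins}/{pairs} = {wins / pairs:.4f}")
```

The reported values correspond to 45, 46, 50 and 45 correctly ordered pairs, so the difference between λ=0 and λ=0.01 on this feature is a single pair.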


3. Per-Feature FPR@95TPR (K=100) — lower is better

| Feature | λ=0 | λ=0.01 | Δ |
|---|---|---|---|
| angle | 0.9630 | 0.9630 | 0.000 |
| dist1 | 0.2381 | 0.4286 | +0.190 (regression) |
| dist6 | 0.9667 | 0.9333 | −0.033 |
| dots | 0.9615 | 0.8077 | −0.154 |
| edge1 | 0.9167 | 0.9167 | 0.000 |
| edge2 | 0.8889 | 0.7778 | −0.111 |
| edge3 | 0.5625 | 0.4375 | −0.125 |
| edge4 | 0.5833 | 0.4583 | −0.125 |
| Overall | 0.8161 | 0.6264 | −0.190 |

Overall FPR@95TPR drops by 19 percentage points at λ=0.01 (0.8161 → 0.6264). At 95% defect-detection sensitivity, the share of good products falsely rejected falls from 81.6% to 62.6%, a relative reduction of about 23%.
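
A percentage-point drop and a relative reduction are different quantities, and the two are easy to conflate. The calculation below derives both from the Overall row, and also converts the rates to approximate counts using the 174 GOOD test samples stated at the top of this document.

```python
fpr_baseline = 0.8161   # Overall FPR@95TPR at λ=0
fpr_sep = 0.6264        # Overall FPR@95TPR at λ=0.01
n_good = 174            # GOOD samples in the test split

# Absolute change, in percentage points.
abs_drop_pp = (fpr_baseline - fpr_sep) * 100
# Relative change, as a percentage of the baseline rate.
rel_drop_pct = (fpr_baseline - fpr_sep) / fpr_baseline * 100

# Approximate number of GOOD samples falsely rejected in each case.
rejected_baseline = round(fpr_baseline * n_good)
rejected_sep = round(fpr_sep * n_good)

print(f"absolute drop: {abs_drop_pp:.1f} pp")
print(f"relative drop: {rel_drop_pct:.1f} %")
print(f"false rejections: {rejected_baseline} -> {rejected_sep} of {n_good}")
```

The absolute drop is about 19 percentage points, the relative drop about 23%, and the implied counts are 142 versus 109 falsely rejected GOOD samples.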


4. Per-Template Breakdown (λ=0.01)

| Template | AUROC | Accuracy | FPR@95TPR | N |
|---|---|---|---|---|
| A | 0.7981 | 0.7048 | 0.6863 | 105 |
| B | 0.8750 | 0.8214 | 0.4375 | 28 |
| C | 0.8325 | 0.9023 | 0.9533 | 133 |

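
As a sanity check on the breakdown, accuracy (unlike AUROC or FPR@95TPR) aggregates linearly across disjoint subsets, so the per-template accuracies weighted by N should reproduce the overall λ=0.01 accuracy of 0.8158 from Section 1. A minimal check using only the table values:

```python
# (accuracy, N) per template, copied from the table.
templates = {"A": (0.7048, 105), "B": (0.8214, 28), "C": (0.9023, 133)}

total_n = sum(n for _, n in templates.values())
# Recover the integer number of correct predictions per template.
total_correct = sum(round(acc * n) for acc, n in templates.values())
overall_accuracy = total_correct / total_n

print(f"{total_correct}/{total_n} = {overall_accuracy:.4f}")
```

This gives 217 correct out of 266, i.e. 0.8158, matching the overall figure.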

5. Stochastic Variance (MC Scoring)

The OOD score is stochastic (K random timestep samples per image). Across 3 evaluations of the λ=0.02 checkpoint:

| Eval run | K | AUROC |
|---|---|---|
| Run 1 | 50 | 0.8581 |
| Run 2 | 50 | 0.8474 |
| Run 3 | 100 | 0.8541 |

Run-to-run spread in AUROC is approximately ±0.005 at K=50 (half the range between Runs 1 and 2) and ±0.003 at K=100. All final results use K=100.
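
The K=50 figure can be reproduced from the two K=50 runs. How the K=100 figure was obtained is not stated here; the sketch below shows what a simple 1/√K scaling of Monte Carlo noise would predict, which is an assumption for illustration and not necessarily the method used.

```python
import math

auroc_k50 = [0.8581, 0.8474]   # Runs 1 and 2 from the table

# Half the range of the two K=50 runs.
half_range_k50 = (max(auroc_k50) - min(auroc_k50)) / 2

# Assumption: MC noise shrinks like 1/sqrt(K), so doubling K
# scales the spread by sqrt(50/100).
projected_k100 = half_range_k50 * math.sqrt(50 / 100)

print(f"K=50  half-range: {half_range_k50:.4f}")
print(f"K=100 projected:  {projected_k100:.4f}")
```

The half-range at K=50 is about 0.0054 and the projection for K=100 is about 0.0038, the same order of magnitude as the quoted ±0.003. Note that this spread is comparable to the AUROC differences between the λ>0 runs in Section 1.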


6. Training Configuration

model:           CDM UNet (multi-head conditioning)
                 template + feature + quality + bbox heads
dataset:         1,327 inkjet print samples (test split: 174 GOOD, 92 BAD)
train/test:      80/20 stratified split (seed=42)
oversampling:    BAD samples × 3.0 (minority oversampling during training only)
image_size:      crop-based (YOLO bbox region)
epochs:          100
schedule:        cosine (Nichol & Dhariwal 2021)
optimizer:       AdamW, lr=1e-4
sep_loss:        L = L_MSE + λ · L_sep
scoring:         Algorithm 1 (difference method), K=100 MC trials
batch_size:      64 (sep loss runs) / 128 (baseline λ=0)
GPU:             CUDA device 3 (32 GB VRAM)
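
For reference, the cosine schedule of Nichol & Dhariwal (2021) named in the configuration can be sketched as follows. The offset s=0.008 and the 0.999 clip on β follow that paper; the number of diffusion steps T=1000 is an assumption, since the configuration above does not state it, and the function names are illustrative.

```python
import math

def cosine_alpha_bar(t, T, s=0.008):
    """Unnormalised cumulative signal level f(t) of the cosine schedule."""
    return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2

def cosine_betas(T=1000, s=0.008, max_beta=0.999):
    """Per-step noise variances beta_1..beta_T.

    T=1000 is an assumed value, not taken from the configuration above.
    """
    betas = []
    for t in range(1, T + 1):
        # beta_t = 1 - alpha_bar(t) / alpha_bar(t-1); the f(0) normaliser cancels.
        ratio = cosine_alpha_bar(t, T, s) / cosine_alpha_bar(t - 1, T, s)
        # Clip to avoid a singularity at the end of the schedule.
        betas.append(min(1 - ratio, max_beta))
    return betas

betas = cosine_betas()
print(len(betas), betas[0], betas[-1])
```

The schedule adds noise slowly at the start and end of the process and hits the clip value on the final step.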