# Inkjet CDM: Final Thesis Results

> All results use K=100 MC trials (Algorithm 1, difference scoring).
> Dataset: 1,327 total samples → 20% test split = 266 samples (GOOD=174, BAD=92).
> Evaluation seed: 42 (same train/test split for all λ values).
> Last verified: 2026-02-26
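The 266-sample test set is 20% of 1,327 rounded up (0.2 × 1,327 = 265.4). A minimal sketch of that arithmetic, assuming the split was produced with scikit-learn's `train_test_split(..., test_size=0.2)`, which rounds a fractional test count up and gives the remainder to the training set; `split_sizes` is a hypothetical helper, not part of the thesis code:

```python
import math


def split_sizes(n_samples: int, test_fraction: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a float test fraction.

    Assumes scikit-learn's convention when only test_size is given:
    the test count is rounded up and the train set takes the remainder.
    """
    n_test = math.ceil(test_fraction * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(1327, 0.2))  # (1061, 266)
```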

---

## 1. Overall Comparison: All λ Values

| λ | AUROC ↑ | Accuracy ↑ | FPR@95TPR ↓ | Δ AUROC vs. baseline |
|---|:---:|:---:|:---:|:---:|
| 0.0 (baseline, no sep loss) | 0.8325 | 0.7895 | 0.8161 | n/a |
| 0.01 (best AUROC) | 0.8603 | 0.8158 | 0.6264 | +0.0278 |
| 0.02 | 0.8541 | 0.8008 | 0.6609 | +0.0216 |
| 0.05 (best FPR@95TPR and accuracy) | 0.8553 | 0.8233 | 0.5287 | +0.0228 |

Recommended thesis citation: report λ=0.01 as the primary result (highest AUROC, 0.8603) and cite λ=0.05 as the best operating point (lowest FPR@95TPR, 0.5287, and highest accuracy, 0.8233).
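Because the test set is small, each accuracy and FPR@95TPR value corresponds to a whole number of samples. A quick consistency check, assuming accuracy is computed over all 266 test samples and FPR@95TPR over the 174 GOOD samples only (BAD as the positive class); the metric values are copied from the table above and `to_counts` is a hypothetical helper:

```python
N_TEST, N_GOOD = 266, 174  # test-set sizes from the header


def to_counts(accuracy: float, fpr_at_95tpr: float) -> tuple[int, int]:
    """Convert rounded metrics back to integer sample counts.

    Assumes accuracy is over all test samples and FPR@95TPR is over the
    GOOD samples only (BAD is the positive class).
    """
    n_correct = round(accuracy * N_TEST)
    n_false_rejects = round(fpr_at_95tpr * N_GOOD)
    return n_correct, n_false_rejects


# (λ, accuracy, FPR@95TPR) copied from the table above
rows = [(0.0, 0.7895, 0.8161), (0.01, 0.8158, 0.6264),
        (0.02, 0.8008, 0.6609), (0.05, 0.8233, 0.5287)]
for lam, acc, fpr in rows:
    correct, rejects = to_counts(acc, fpr)
    print(f"λ={lam}: {correct}/{N_TEST} correct, {rejects}/{N_GOOD} GOOD rejected")
```

Every row lands on an integer count within rounding (for example 210/266 = 0.78947 and 142/174 = 0.81609), which is consistent with these denominators.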

---

## 2. Per-Feature AUROC (K=100)

| Feature | λ=0 | λ=0.01 | λ=0.02 | λ=0.05 | Best Δ vs. λ=0 (λ) |
|---------|:---:|:---:|:---:|:---:|:---:|
| angle | 0.5556 | 0.5679 | 0.6173 | 0.5556 | +0.0617 (0.02) |
| dist1 | 0.9000 | 0.8571 | 0.9429 | 0.9143 | +0.0429 (0.02) |
| dist6 | 0.8278 | 0.8111 | 0.8389 | 0.8278 | +0.0111 (0.02) |
| dots | 0.9126 | 0.9266 | 0.9126 | 0.8881 | +0.0140 (0.01) |
| edge1 | 0.7760 | 0.8177 | 0.8594 | 0.8542 | +0.0834 (0.02) |
| edge2 | 0.7302 | 0.7242 | 0.6786 | 0.7857 | +0.0555 (0.05) |
| edge3 | 0.7188 | 0.8750 | 0.7708 | 0.8229 | +0.1562 (0.01) |
| edge4 | 0.6667 | 0.7361 | 0.6806 | 0.6597 | +0.0694 (0.01) |
| Overall | 0.8325 | 0.8603 | 0.8541 | 0.8553 | +0.0278 (0.01) |

> Note: Best Δ is the largest AUROC gain over λ=0 across the three λ>0 runs, with the λ that achieves it in parentheses. `dist1`, `dist6`, and `edge2` regress at λ=0.01 and improve only at λ=0.02 or λ=0.05. `angle` has only 30 test samples (27 GOOD, 3 BAD), so its AUROC is statistically unreliable. `edge3` shows the largest absolute gain (+0.1562 AUROC, i.e. +15.62 pp, at λ=0.01).
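AUROC equals the fraction of (BAD, GOOD) pairs in which the BAD sample receives the higher OOD score, with ties counted as half (the Mann-Whitney formulation). A minimal pure-Python sketch, assuming a higher score means "more anomalous"; `auroc` is a hypothetical helper, not the thesis code. It also shows why `angle` is fragile: 27 GOOD × 3 BAD gives only 81 pairs, so its AUROC moves in steps of 1/81, and the reported 0.5556 and 0.6173 are consistent with 45/81 and 50/81:

```python
def auroc(good_scores: list[float], bad_scores: list[float]) -> float:
    """Mann-Whitney estimate of AUROC, with BAD as the positive class.

    Assumes a higher score means "more anomalous". Each (BAD, GOOD) pair
    counts 1 if the BAD score is higher and 0.5 on a tie.
    """
    wins = 0.0
    for b in bad_scores:
        for g in good_scores:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad_scores) * len(good_scores))


# With 27 GOOD and 3 BAD samples (`angle`), only 81 pairs exist,
# so AUROC can only change in steps of 1/81.
n_pairs = 27 * 3
print(round(1 / n_pairs, 4))                           # 0.0123
print(round(45 / n_pairs, 4), round(50 / n_pairs, 4))  # 0.5556 0.6173
```

The double loop is O(N_GOOD × N_BAD), which is negligible at this test-set size.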

---

## 3. Per-Feature FPR@95TPR (K=100, lower is better)

| Feature | λ=0 | λ=0.01 | Δ (λ=0.01 − λ=0) |
|---------|:---:|:---:|:---:|
| angle | 0.9630 | 0.9630 | 0.000 |
| dist1 | 0.2381 | 0.4286 | +0.190 (regression) |
| dist6 | 0.9667 | 0.9333 | −0.033 |
| dots | 0.9615 | 0.8077 | −0.154 |
| edge1 | 0.9167 | 0.9167 | 0.000 |
| edge2 | 0.8889 | 0.7778 | −0.111 |
| edge3 | 0.5625 | 0.4375 | −0.125 |
| edge4 | 0.5833 | 0.4583 | −0.125 |
| Overall | 0.8161 | 0.6264 | −0.190 |

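> Overall FPR@95TPR drops by 19 percentage points at λ=0.01 (0.8161 → 0.6264). At the threshold that flags 95% of defective (BAD) samples, the number of GOOD products falsely rejected falls from about 142 to 109 of the 174 GOOD test samples, a relative reduction of about 23%.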
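FPR@95TPR fixes the decision threshold so that at least 95% of BAD samples are flagged, then reports the fraction of GOOD samples that also exceed it. One common way to compute it, sketched under the assumptions that a higher score means "more anomalous" and that a sample is flagged when its score is ≥ the threshold; the threshold convention used for the results above is not stated, and `fpr_at_tpr` is a hypothetical helper:

```python
import math


def fpr_at_tpr(good_scores: list[float], bad_scores: list[float],
               target_tpr: float = 0.95) -> float:
    """Fraction of GOOD samples flagged at the threshold that reaches target_tpr on BAD.

    Assumes a higher score means "more anomalous" and that a sample is
    flagged when score >= threshold.
    """
    bad_sorted = sorted(bad_scores, reverse=True)
    # Minimum number of BAD samples to flag; the small epsilon guards against
    # floating-point products such as 0.95 * n landing just above an integer.
    k = math.ceil(target_tpr * len(bad_sorted) - 1e-9)
    threshold = bad_sorted[k - 1]  # highest threshold that still flags k BAD samples
    false_positives = sum(1 for g in good_scores if g >= threshold)
    return false_positives / len(good_scores)
```

With the 92 BAD test samples, the 95% target means at least ceil(0.95 × 92) = 88 defects must be flagged. Other threshold conventions (for example interpolating along the ROC curve) can shift the reported value slightly.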

---

## 4. Per-Template Breakdown (λ=0.01)

| Template | AUROC | Accuracy | FPR@95TPR | N (test samples) |
|----------|:---:|:---:|:---:|:---:|
| A | 0.7981 | 0.7048 | 0.6863 | 105 |
| B | 0.8750 | 0.8214 | 0.4375 | 28 |
| C | 0.8325 | 0.9023 | 0.9533 | 133 |
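Accuracy decomposes over templates: weighting each template's accuracy by its N recovers the overall λ=0.01 accuracy of 0.8158 from Section 1. AUROC and FPR@95TPR are rank- and threshold-based, so they do not combine this way. A quick check using the values in the table; `pooled_accuracy` is a hypothetical helper:

```python
# (accuracy, N) per template at λ=0.01, copied from the table above
templates = {"A": (0.7048, 105), "B": (0.8214, 28), "C": (0.9023, 133)}


def pooled_accuracy(per_template: dict[str, tuple[float, int]]) -> float:
    """Sample-weighted mean of per-template accuracies."""
    total = sum(n for _, n in per_template.values())
    correct = sum(acc * n for acc, n in per_template.values())
    return correct / total


print(round(pooled_accuracy(templates), 4))  # 0.8158
```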

---

## 5. Stochastic Variance (MC Scoring)

The OOD score is stochastic: each image's score is computed from K randomly sampled diffusion timesteps. Three evaluations of the same λ=0.02 checkpoint gave:

| Eval run | K | AUROC |
|----------|---|-------|
| Run 1 | 50 | 0.8581 |
| Run 2 | 50 | 0.8474 |
| Run 3 | 100 | 0.8541 |

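Run-to-run spread: the two K=50 runs differ by 0.0107 AUROC (about ±0.005 around their mean of 0.853). At K=100 the spread is estimated at ±0.003, although only one K=100 run is listed here, so that figure cannot be checked against this table. The AUROC differences among the λ>0 checkpoints in Section 1 (at most 0.0062) are of the same order as this spread. All final results use K=100.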
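Because each trial draws its own random timestep, the K-trial score behaves like a Monte Carlo mean whose standard error shrinks roughly as 1/√K. The toy sketch below illustrates only that scaling; `noisy_trial_score` is a hypothetical stand-in with Gaussian noise, not Algorithm 1's difference score, whose exact form is not given here:

```python
import random
import statistics


def noisy_trial_score(true_score: float, rng: random.Random) -> float:
    """Hypothetical stand-in for one MC trial at a random timestep.

    Models a single trial as the image's expected score plus unit Gaussian noise.
    """
    return true_score + rng.gauss(0.0, 1.0)


def mc_score(true_score: float, k: int, rng: random.Random) -> float:
    """Average K independent trials into one image-level score."""
    return statistics.fmean(noisy_trial_score(true_score, rng) for _ in range(k))


rng = random.Random(42)
for k in (50, 100):
    # Re-run the K-trial scoring many times to measure its spread,
    # and print it next to the theoretical 1/sqrt(K).
    repeats = [mc_score(0.0, k, rng) for _ in range(2000)]
    print(k, round(statistics.stdev(repeats), 3), round(1 / k ** 0.5, 3))
```

Under this model, doubling K from 50 to 100 shrinks the spread by about √2, and halving it takes about 4× as many trials. If the AUROC spread scaled the same way (an assumption), ±0.005 at K=50 would correspond to roughly ±0.0035 at K=100.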

---

## 6. Training Configuration

```
model: CDM UNet (multi-head conditioning)
  heads: template + feature + quality + bbox
dataset: 1,327 inkjet print samples (test set: 266 = 174 GOOD + 92 BAD)
train/test: 80/20 stratified split (seed=42)
oversampling: BAD samples × 3.0 (minority oversampling, training only)
image_size: crop-based (YOLO bbox region)
epochs: 100
schedule: cosine (Nichol & Dhariwal 2021)
optimizer: AdamW, lr=1e-4
sep_loss: L = L_MSE + λ · L_sep, λ ∈ {0, 0.01, 0.02, 0.05}
scoring: Algorithm 1 (difference method), K=100 MC trials
batch_size: 64 (sep loss runs) / 128 (baseline λ=0)
GPU: CUDA device 3 (32 GB VRAM)
```
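The cosine schedule named in the configuration defines the cumulative signal level ᾱ(t) = f(t)/f(0), with f(t) = cos²(((t/T) + s)/(1 + s) · π/2), and per-step noise β_t = 1 − ᾱ_t/ᾱ_{t−1}, clipped at 0.999 (Nichol & Dhariwal 2021, with offset s = 0.008). A minimal pure-Python sketch; the number of diffusion steps T is not listed in the configuration, so `num_steps=1000` below is an assumption:

```python
import math


def cosine_betas(num_steps: int, s: float = 0.008, max_beta: float = 0.999) -> list[float]:
    """Per-step betas for the cosine schedule of Nichol & Dhariwal (2021)."""

    def f(t: float) -> float:
        # Unnormalised alpha_bar at normalised time t in [0, 1];
        # dividing by f(0) cancels in the ratio below.
        return math.cos((t + s) / (1 + s) * math.pi / 2) ** 2

    betas = []
    for i in range(num_steps):
        t_prev, t_next = i / num_steps, (i + 1) / num_steps
        # beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}, clipped to avoid beta = 1 at t = T
        betas.append(min(1 - f(t_next) / f(t_prev), max_beta))
    return betas


betas = cosine_betas(1000)  # T=1000 is an assumption, not from the config
```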
