---
language: en
license: mit
library_name: pytorch
tags:
  - image-classification
  - few-shot-learning
  - prototypical-network
  - dinov2
  - semiconductor
  - defect-detection
  - vision-transformer
  - meta-learning
datasets:
  - custom
pipeline_tag: image-classification
model-index:
  - name: semiconductor-defect-classifier
    results:
      - task:
          type: image-classification
          name: Few-Shot Defect Classification
        metrics:
          - name: Accuracy (K=1)
            type: accuracy
            value: 0.995
          - name: Accuracy (K=5)
            type: accuracy
            value: 0.997
          - name: Accuracy (K=20)
            type: accuracy
            value: 0.998
          - name: Macro F1 (K=20)
            type: f1
            value: 0.999
---

# Semiconductor Defect Classifier

**Few-Shot Semiconductor Wafer Defect Classification using DINOv2 ViT-L/14 + Prototypical Network**

Built for the **ASU Intel Semiconductor Solutions Challenge 2026**. Classifies grayscale semiconductor wafer microscopy images into 9 categories (8 defect types + good) using as few as 1-5 reference images per class.

## Model Description

This model combines a **DINOv2 ViT-L/14** backbone (304M parameters, self-supervised pre-training on 142M images) with a **Prototypical Network** classification head. It was trained using episodic meta-learning on the Intel challenge dataset.

### Architecture

```
Input Image (grayscale, up to 7000x5600)
    |
    v
DINOv2 ViT-L/14 Backbone
  - 304M parameters (last 6 blocks fine-tuned)
  - Gradient checkpointing enabled
  - Output: 1024-dim CLS token
    |
    v
3-Layer Projection Head
  - Linear(1024, 768) + LayerNorm + GELU
  - Linear(768, 768) + LayerNorm + GELU
  - Linear(768, 512) + L2 Normalization
    |
    v
Prototypical Classification
  - Cosine similarity with learned temperature
  - Softmax over class prototypes
  - Good-detection gap threshold (0.20)
```
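
The classification stage can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: it assumes each prototype is the re-normalized mean of the L2-normalized support embeddings for a class, and that the learned temperature multiplies the cosine similarity before the softmax. The function names, the 3-dimensional toy embeddings, and `temperature=10.0` are illustrative.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def build_prototypes(support):
    """support: {class_id: [embedding, ...]} -> {class_id: unit-length mean embedding}."""
    prototypes = {}
    for class_id, embeddings in support.items():
        unit = [l2_normalize(e) for e in embeddings]
        mean = [sum(col) / len(unit) for col in zip(*unit)]
        prototypes[class_id] = l2_normalize(mean)
    return prototypes

def classify(query, prototypes, temperature=10.0):
    """Softmax over temperature-scaled cosine similarities to each prototype."""
    q = l2_normalize(query)
    logits = {c: temperature * sum(a * b for a, b in zip(q, p))
              for c, p in prototypes.items()}
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - peak) for c, v in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

# Toy example: two support embeddings for class 1, one for class 2.
support = {1: [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]], 2: [[0.0, 1.0, 0.2]]}
prototypes = build_prototypes(support)
probs = classify([0.8, 0.2, 0.0], prototypes)
predicted = max(probs, key=probs.get)
```

Because prototypes are simple averages, adding a support image only updates one mean, which is why the classifier handles any K without retraining.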

### Key Design Choices

- **DINOv2 backbone**: Self-supervised features transfer exceptionally well to few-shot tasks, even on out-of-distribution semiconductor images
- **Prototypical Network**: Non-parametric classifier that works with any number of support examples (K=1 to K=20+) without retraining
- **Cosine similarity + learned temperature**: More stable than Euclidean distance for high-dimensional embeddings
- **Differential learning rates**: Backbone fine-tuned at 5e-6, projection head at 3e-4 (60x ratio)
- **Gradient checkpointing**: Reduces VRAM from ~24 GB to ~2 GB by recomputing activations during the backward pass instead of storing them, with minimal speed penalty
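
To illustrate the differential learning rates, the sketch below computes the per-group rate at a given epoch under the schedule listed in Training Configuration (cosine annealing with a 5-epoch warmup). It assumes a linear warmup, decay toward zero, and a total of 100 epochs; the card states none of these, so `TOTAL_EPOCHS` and the function names are hypothetical.

```python
import math

HEAD_LR = 3.0e-4       # projection head base rate (from the card)
BACKBONE_LR = 5.0e-6   # backbone base rate (from the card)
WARMUP_EPOCHS = 5      # from the card
TOTAL_EPOCHS = 100     # assumption: not stated in the card

def lr_scale(epoch):
    """Multiplier applied to every group's base rate at a 0-indexed epoch."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS  # assumed linear warmup
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine annealing

def group_lrs(epoch):
    """Both groups share one schedule, so their 60x ratio is preserved."""
    scale = lr_scale(epoch)
    return {"head": HEAD_LR * scale, "backbone": BACKBONE_LR * scale}

for epoch in (0, 4, 5, 50, 99):
    lrs = group_lrs(epoch)
    print(f"epoch {epoch:>2}: head={lrs['head']:.2e} backbone={lrs['backbone']:.2e}")
```

In PyTorch the same effect is usually obtained by passing two parameter groups with different `lr` values to `AdamW` and attaching a single scheduler.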

## Training Details

### Dataset

Intel Semiconductor Solutions Challenge 2026 dataset:

| Class | Name | Samples | Description |
|-------|------|---------|-------------|
| 0 | Good | 7,135 | Non-defective wafer surface |
| 1 | Defect 1 | 253 | Scratch-type defect |
| 2 | Defect 2 | 178 | Particle contamination |
| 3 | Defect 3 | 9 | Micro-crack (extremely rare) |
| 4 | Defect 4 | 14 | Edge defect (extremely rare) |
| 5 | Defect 5 | 411 | Pattern anomaly |
| 8 | Defect 8 | 803 | Surface roughness |
| 9 | Defect 9 | 319 | Deposition defect |
| 10 | Defect 10 | 674 | Etch residue |

**Note**: Classes 6 and 7 do not exist in the dataset. The extreme class imbalance (793:1 ratio between good and defect3) and visually similar class pairs (defect3/defect9 at 0.963 cosine similarity, defect4/defect8 at 0.889) make this a challenging benchmark.
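
The imbalance has a practical consequence for episodic training: a 5-shot 10-query episode needs 15 images per class, but Defect 3 has only 9 and Defect 4 only 14. The card does not say how the training pipeline handles this. The sketch below shows one possible approach, assumed here for illustration: draw the support set without replacement, then draw queries from the remaining images, falling back to sampling with replacement when too few remain. `sample_episode` is a hypothetical helper.

```python
import random

# Samples per class, copied from the dataset table above.
CLASS_COUNTS = {0: 7135, 1: 253, 2: 178, 3: 9, 4: 14, 5: 411, 8: 803, 9: 319, 10: 674}

def sample_episode(class_counts, k_shot=5, n_query=10, rng=None):
    """Return {class_id: (support_indices, query_indices)} for one N-way episode."""
    rng = rng or random.Random()
    episode = {}
    for class_id, count in class_counts.items():
        indices = list(range(count))
        support = rng.sample(indices, k_shot)           # distinct support images
        support_set = set(support)
        remaining = [i for i in indices if i not in support_set]
        if len(remaining) >= n_query:
            query = rng.sample(remaining, n_query)      # distinct queries
        else:
            query = rng.choices(remaining, k=n_query)   # rare class: allow repeats
        episode[class_id] = (support, query)
    return episode

episode = sample_episode(CLASS_COUNTS, rng=random.Random(0))
imbalance = CLASS_COUNTS[0] / CLASS_COUNTS[3]  # 7135 / 9, about 793
```

Support and query indices never overlap within a class, so a query image is never compared against a prototype built from itself.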

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Training paradigm | Episodic meta-learning |
| Episodes per epoch | 500 |
| Episode structure | 9-way 5-shot 10-query |
| Optimizer | AdamW |
| Learning rate (head) | 3.0e-4 |
| Learning rate (backbone) | 5.0e-6 |
| LR schedule | Cosine annealing with 5-epoch warmup |
| Weight decay | 1.0e-4 |
| Label smoothing | 0.1 |
| Gradient clipping | Max norm 1.0 |
| Mixed precision | AMP (float16) |
| Memory optimization | Gradient checkpointing |
| Early stopping | Patience 20 epochs |
| Input resolution | 518x518 (DINOv2 native) |
| Preprocessing | LongestMaxSize + PadIfNeeded (aspect-ratio preserving) |
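
As a worked example of the aspect-ratio-preserving preprocessing, the arithmetic below mirrors what albumentations' `LongestMaxSize` followed by `PadIfNeeded` would do to the largest input size mentioned in this card (7000x5600). It is a plain-Python sketch: centered padding is assumed, the library's exact rounding may differ by a pixel, and `resize_and_pad_shape` is a hypothetical helper.

```python
PATCH_SIZE = 14  # ViT-L/14 patch size

def resize_and_pad_shape(width, height, target=518):
    """Scale the longest side to `target`, then pad the shorter side to a square."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_w, pad_h = target - new_w, target - new_h
    return {
        "resized": (new_w, new_h),
        "pad_left_right": (pad_w // 2, pad_w - pad_w // 2),  # assumed centered
        "pad_top_bottom": (pad_h // 2, pad_h - pad_h // 2),
    }

result = resize_and_pad_shape(7000, 5600)
patches_per_side = 518 // PATCH_SIZE   # 37
patch_tokens = patches_per_side ** 2   # 1369 patch tokens per image
```

The 7000x5600 image is scaled to 518x414 and padded by 52 pixels on the top and bottom, so defects keep their original proportions.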

### Training Hardware

- **GPU**: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (95.6 GB VRAM)
- **Actual VRAM usage**: ~2 GB (gradient checkpointing)
- **Training time**: ~17 minutes/epoch
- **Convergence**: Best validation result at epoch 7; early stopping (patience 20) ended training at epoch 27
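
The relationship between the best epoch (7), the patience (20), and the stopping epoch (27) can be reproduced with a small early-stopping loop. The validation curve below is hypothetical apart from the peak of 0.906 recorded in the checkpoint, and `run_early_stopping` is an illustrative helper, not the repository's code.

```python
def run_early_stopping(val_accuracies, patience=20):
    """Return (best_epoch, stop_epoch), 1-indexed; stop_epoch is None if never triggered."""
    best_acc, best_epoch = float("-inf"), None
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch      # new best: reset the counter
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch               # no improvement for `patience` epochs
    return best_epoch, None

# Hypothetical curve: improves until epoch 7, then plateaus slightly lower.
history = [0.80, 0.84, 0.87, 0.88, 0.89, 0.90, 0.906] + [0.90] * 30
best_epoch, stop_epoch = run_early_stopping(history)
print(best_epoch, stop_epoch)
```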

## Performance

### K-Shot Classification Accuracy

| K (support images per class) | Accuracy |
|------------------------------|----------|
| K=1 | 99.5% |
| K=3 | 99.7% |
| K=5 | 99.7% |
| K=10 | 99.7% |
| K=20 | 99.8% |

### Per-Class F1 Scores (K=20)

| Class | F1 Score |
|-------|----------|
| Defect 1 (Scratch) | 1.000 |
| Defect 2 (Particle) | 1.000 |
| Defect 3 (Micro-crack) | 1.000 |
| Defect 4 (Edge) | 1.000 |
| Defect 5 (Pattern) | 0.994 |
| Defect 8 (Roughness) | 1.000 |
| Defect 9 (Deposition) | 1.000 |
| Defect 10 (Etch residue) | 0.996 |

- **Balanced accuracy (K=20)**: 0.999
- **Macro F1 (K=20)**: 0.999
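
The macro F1 can be checked against the per-class table, since macro averaging is the unweighted mean of the class scores. The sketch assumes the reported figure averages the eight defect classes listed above; the card does not say whether the Good class is included.

```python
# Per-class F1 at K=20, copied from the table above.
per_class_f1 = {
    "Defect 1": 1.000, "Defect 2": 1.000, "Defect 3": 1.000, "Defect 4": 1.000,
    "Defect 5": 0.994, "Defect 8": 1.000, "Defect 9": 1.000, "Defect 10": 0.996,
}

# Macro F1 weights every class equally, regardless of how many samples it has.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"{macro_f1:.5f}")
```

The mean is 0.99875, which rounds to the reported 0.999.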

### Good Image Detection

The model includes a cosine similarity gap threshold for detecting non-defective ("good") wafer images:

| Metric | Value |
|--------|-------|
| Good image accuracy | ~90% |
| Defect image accuracy | ~97% |
| Gap threshold | 0.20 |

## How to Use

### Quick Start

```python
import torch
import yaml
from PIL import Image
from problem_a.src.backbone import get_backbone
from problem_a.src.protonet import PrototypicalNetwork, IncrementalPrototypeTracker
from problem_a.src.augmentations import get_eval_transform

# Load model
with open('problem_a/configs/default.yaml') as f:
    cfg = yaml.safe_load(f)

backbone = get_backbone(cfg['model']['backbone'], cfg['model']['backbone_size'])
model = PrototypicalNetwork(backbone, cfg['model']['proj_hidden'], cfg['model']['proj_dim'])

checkpoint = torch.load('best_model.pt', map_location='cpu', weights_only=False)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval().cuda()

transform = get_eval_transform(cfg['data']['img_size'])

# Create tracker and add support images
tracker = IncrementalPrototypeTracker(model, torch.device('cuda'))

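# Add support images (at least 1 per class).
# `support_images` is a list of (class_id, image_path) pairs; the paths below are placeholders.
support_images = [(1, 'support/defect1_example.png'), (2, 'support/defect2_example.png')]
for class_id, image_path in support_images: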
    img = Image.open(image_path).convert('L')
    tensor = transform(img)
    tracker.add_example(tensor, class_id)

# Classify a query image
query_img = Image.open('query.png').convert('L')
query_tensor = transform(query_img).unsqueeze(0).cuda()

with torch.no_grad():
    log_probs = model.classify(query_tensor, tracker.prototypes)
    probs = torch.exp(log_probs).squeeze(0)

# Get prediction
label_map = tracker.label_map
reverse_map = {v: k for k, v in label_map.items()}
pred_idx = probs.argmax().item()
predicted_class = reverse_map[pred_idx]
confidence = probs[pred_idx].item()
print(f'Predicted: class {predicted_class}, confidence: {confidence:.3f}')
```

### Download with huggingface_hub

```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="Makatia/semiconductor-defect-classifier",
    filename="best_model.pt"
)
```

## Model Specifications

| Property | Value |
|----------|-------|
| Architecture | DINOv2 ViT-L/14 + Prototypical Network |
| Total parameters | 306,142,209 |
| Trainable parameters | 77,366,273 (25.3%) |
| Backbone | DINOv2 ViT-L/14 (frozen except the last 6 blocks, which are fine-tuned) |
| Embedding dimension | 512 (L2-normalized) |
| Projection head | 1024 -> 768 -> 768 -> 512 |
| Input size | 518x518 (aspect-ratio preserved with padding) |
| Input channels | Grayscale (converted to 3-channel internally) |
| Inference time | ~700ms (GPU) / ~3s (CPU) |
| VRAM (inference) | ~2 GB |
| Checkpoint size | 1.17 GB |
| Framework | PyTorch 2.0+ |
| Dependencies | timm >= 1.0, albumentations >= 1.3 |
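
The parameter counts in the table can be reproduced from the layer shapes. The card does not itemize them, so the breakdown below is a consistency check under stated assumptions rather than a description of the repository's code: a standard ViT-L block (fused qkv with bias, 4x MLP, two LayerNorms, two LayerScale vectors), a 37x37 patch grid plus one CLS position, and a trainable set made of the projection head, the last 6 blocks, the final backbone LayerNorm, and one scalar temperature.

```python
def linear(n_in, n_out):
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out

def layer_norm(dim):
    """Scale and shift vectors."""
    return 2 * dim

DIM, MLP_DIM, PATCH, GRID = 1024, 4096, 14, 37

# Projection head: 1024 -> 768 -> 768 -> 512, LayerNorm after the first two layers.
head = (linear(1024, 768) + layer_norm(768)
        + linear(768, 768) + layer_norm(768)
        + linear(768, 512))

# One transformer block (assumed structure, see lead-in).
block = (linear(DIM, 3 * DIM)                            # fused qkv projection
         + linear(DIM, DIM)                              # attention output projection
         + linear(DIM, MLP_DIM) + linear(MLP_DIM, DIM)   # MLP
         + 2 * layer_norm(DIM)                           # pre-attention and pre-MLP norms
         + 2 * DIM)                                      # two LayerScale vectors

patch_embed = 3 * PATCH * PATCH * DIM + DIM              # 14x14 conv over 3 channels
pos_embed = (GRID * GRID + 1) * DIM                      # patch positions + CLS
cls_token = DIM
backbone = 24 * block + patch_embed + pos_embed + cls_token + layer_norm(DIM)

temperature = 1                                          # assumed single learned scalar
total = backbone + head + temperature
trainable = head + 6 * block + layer_norm(DIM) + temperature
print(f"{total:,} total, {trainable:,} trainable ({100 * trainable / total:.1f}%)")
```

Under these assumptions the totals match the table exactly (306,142,209 and 77,366,273), with the projection head contributing 1,774,592 parameters.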

## Checkpoint Contents

The `.pt` file contains:

```python
{
    'epoch': 7,                    # Best epoch
    'model_state_dict': {...},     # Full model weights
    'best_val_acc': 0.906,         # Validation accuracy (episodic)
    'config': {...},               # Training configuration
}
```

## Intended Use

- **Primary use**: Semiconductor wafer defect detection and classification in manufacturing quality control
- **Few-shot scenarios**: When only 1-20 labeled examples per defect class are available
- **Research**: Few-shot learning, meta-learning, and industrial defect detection benchmarks

## Limitations

- Trained specifically on Intel challenge semiconductor images; may need fine-tuning for other semiconductor processes
- Good image detection (~90% accuracy) is less reliable than defect classification (97-100%)
- Requires grayscale input images; color images should be converted before inference
- Extremely rare classes (defect3: 9 samples, defect4: 14 samples) are sparsely represented in training, so their reported scores rest on very few images

## Source Code

Full training pipeline, evaluation scripts, and PySide6/QML desktop application available at:
[github.com/fidel-makatia/Semiconductor_Defect_Classification_model](https://github.com/fidel-makatia/Semiconductor_Defect_Classification_model)

## Citation

```bibtex
@misc{makatia2026semiconductor,
  title={Few-Shot Semiconductor Defect Classification with DINOv2 and Prototypical Networks},
  author={Fidel Makatia},
  year={2026},
  howpublished={Intel Semiconductor Solutions Challenge 2026},
}
```

## License

MIT License