Commit e0060ae (verified) by Makatia · 1 Parent(s): 091c712

Upload README.md with huggingface_hub

  1. README.md +289 -0
README.md ADDED
@@ -0,0 +1,289 @@
+ ---
+ language: en
+ license: mit
+ library_name: pytorch
+ tags:
+   - image-classification
+   - few-shot-learning
+   - prototypical-network
+   - dinov2
+   - semiconductor
+   - defect-detection
+   - vision-transformer
+   - meta-learning
+ datasets:
+   - custom
+ pipeline_tag: image-classification
+ model-index:
+   - name: semiconductor-defect-classifier
+     results:
+       - task:
+           type: image-classification
+           name: Few-Shot Defect Classification
+         metrics:
+           - name: Accuracy (K=1)
+             type: accuracy
+             value: 0.995
+           - name: Accuracy (K=5)
+             type: accuracy
+             value: 0.997
+           - name: Accuracy (K=20)
+             type: accuracy
+             value: 0.998
+           - name: Macro F1 (K=20)
+             type: f1
+             value: 0.999
+ ---
+
+ # Semiconductor Defect Classifier
+
+ **Few-Shot Semiconductor Wafer Defect Classification using DINOv2 ViT-L/14 + Prototypical Network**
+
+ Built for the **Intel Semiconductor Solutions Challenge 2026**. Classifies grayscale semiconductor wafer microscopy images into 9 categories (8 defect types + good) using as few as 1-5 reference images per class.
+
+ ## Model Description
+
+ This model combines a **DINOv2 ViT-L/14** backbone (304M parameters, self-supervised pre-training on 142M images) with a **Prototypical Network** classification head. It was trained using episodic meta-learning on the Intel challenge dataset.
+
+ ### Architecture
+
+ ```
+ Input Image (grayscale, up to 7000x5600)
+         |
+         v
+ DINOv2 ViT-L/14 Backbone
+   - 304M parameters (last 6 blocks fine-tuned)
+   - Gradient checkpointing enabled
+   - Output: 1024-dim CLS token
+         |
+         v
+ 3-Layer Projection Head
+   - Linear(1024, 768) + LayerNorm + GELU
+   - Linear(768, 768) + LayerNorm + GELU
+   - Linear(768, 512) + L2 Normalization
+         |
+         v
+ Prototypical Classification
+   - Cosine similarity with learned temperature
+   - Softmax over class prototypes
+   - Good-detection gap threshold (0.20)
+ ```
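
The pipeline in the diagram can be sketched in a few lines of PyTorch. This is a minimal illustration, not the repository's `PrototypicalNetwork`: the names `ProjectionHead` and `prototype_log_probs` and the fixed `temperature=10.0` are assumptions made for the example (the README says the temperature is learned), and random tensors stand in for the backbone's CLS tokens.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionHead(nn.Module):
    """1024 -> 768 -> 768 -> 512 head followed by L2 normalization."""

    def __init__(self, in_dim=1024, hidden_dim=768, out_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.LayerNorm(hidden_dim), nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim), nn.LayerNorm(hidden_dim), nn.GELU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        # Unit-length embeddings, so dot products are cosine similarities.
        return F.normalize(self.net(x), dim=-1)


def prototype_log_probs(query_emb, support_emb, support_labels, n_classes, temperature=10.0):
    """Cosine-similarity prototypical classification (temperature fixed here, hypothetical value)."""
    # Prototype = mean of a class's support embeddings, re-normalized to unit length.
    protos = torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in range(n_classes)
    ])
    protos = F.normalize(protos, dim=-1)
    logits = temperature * query_emb @ protos.T  # scaled cosine similarity
    return F.log_softmax(logits, dim=-1)


torch.manual_seed(0)
head = ProjectionHead().eval()
cls_tokens = torch.randn(9 * 5, 1024)          # stand-in for backbone CLS tokens
labels = torch.arange(9).repeat_interleave(5)  # 9 classes, 5 shots each
with torch.no_grad():
    emb = head(cls_tokens)
    log_probs = prototype_log_probs(emb[:3], emb, labels, n_classes=9)
print(log_probs.shape)  # one row of 9 class log-probabilities per query
```

Because each prototype is only the mean of a class's support embeddings, adding or removing support images changes the prototypes without touching any weights.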
+
+ ### Key Design Choices
+
+ - **DINOv2 backbone**: Self-supervised features transfer exceptionally well to few-shot tasks, even on out-of-distribution semiconductor images
+ - **Prototypical Network**: Non-parametric classifier that works with any number of support examples (K=1 to K=20+) without retraining
+ - **Cosine similarity + learned temperature**: More stable than Euclidean distance for high-dimensional embeddings
+ - **Differential learning rates**: Backbone fine-tuned at 5e-6, projection head at 3e-4 (60x ratio)
+ - **Gradient checkpointing**: Reduces VRAM from ~24 GB to ~2 GB with minimal speed penalty
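
One common way to get differential learning rates in PyTorch is to pass parameter groups to the optimizer. The sketch below assumes that approach; the two `nn.Linear` modules are toy stand-ins for the backbone and the projection head, and only the learning rates and the weight decay are taken from this README.

```python
import torch
import torch.nn as nn

# Toy stand-ins (assumptions): the real backbone is DINOv2 ViT-L/14,
# the real head is the 3-layer projection head.
backbone = nn.Linear(16, 16)
head = nn.Linear(16, 8)

optimizer = torch.optim.AdamW(
    [
        {"params": backbone.parameters(), "lr": 5e-6},  # backbone: small steps
        {"params": head.parameters(), "lr": 3e-4},      # head: 60x larger
    ],
    weight_decay=1e-4,  # shared by both groups
)

for group in optimizer.param_groups:
    print(group["lr"], group["weight_decay"])
```

Keeping the backbone rate small limits how far the pre-trained features drift, while the randomly initialized head can still move quickly.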
+
+ ## Training Details
+
+ ### Dataset
+
+ Intel Semiconductor Solutions Challenge 2026 dataset:
+
+ | Class | Name | Samples | Description |
+ |-------|------|---------|-------------|
+ | 0 | Good | 7,135 | Non-defective wafer surface |
+ | 1 | Defect 1 | 253 | Scratch-type defect |
+ | 2 | Defect 2 | 178 | Particle contamination |
+ | 3 | Defect 3 | 9 | Micro-crack (extremely rare) |
+ | 4 | Defect 4 | 14 | Edge defect (extremely rare) |
+ | 5 | Defect 5 | 411 | Pattern anomaly |
+ | 8 | Defect 8 | 803 | Surface roughness |
+ | 9 | Defect 9 | 319 | Deposition defect |
+ | 10 | Defect 10 | 674 | Etch residue |
+
+ **Note**: Classes 6 and 7 do not exist in the dataset. The extreme class imbalance (793:1 ratio between good and defect3) and visually similar class pairs (defect3/defect9 at 0.963 cosine similarity, defect4/defect8 at 0.889) make this a challenging benchmark.
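
Since the class IDs skip 6 and 7, a 9-way classifier needs a mapping from dataset IDs to contiguous indices. The short sketch below builds such a mapping and checks the imbalance ratio quoted in the note; the variable names are illustrative and the counts are copied from the table above.

```python
# Sample counts per class ID, copied from the dataset table.
counts = {0: 7135, 1: 253, 2: 178, 3: 9, 4: 14, 5: 411, 8: 803, 9: 319, 10: 674}

# Map the non-contiguous class IDs to indices 0..8 and back.
label_map = {class_id: idx for idx, class_id in enumerate(sorted(counts))}
reverse_map = {idx: class_id for class_id, idx in label_map.items()}

# Ratio between the largest class (Good) and the smallest (Defect 3).
imbalance = max(counts.values()) / min(counts.values())

print(label_map[10], reverse_map[6])  # 8 8
print(round(imbalance))               # 793
print(sum(counts.values()))           # 9796
```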
+
+ ### Training Configuration
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Training paradigm | Episodic meta-learning |
+ | Episodes per epoch | 500 |
+ | Episode structure | 9-way 5-shot 10-query |
+ | Optimizer | AdamW |
+ | Learning rate (head) | 3.0e-4 |
+ | Learning rate (backbone) | 5.0e-6 |
+ | LR schedule | Cosine annealing with 5-epoch warmup |
+ | Weight decay | 1.0e-4 |
+ | Label smoothing | 0.1 |
+ | Gradient clipping | Max norm 1.0 |
+ | Mixed precision | AMP (float16) |
+ | Batch processing | Gradient checkpointing |
+ | Early stopping | Patience 20 epochs |
+ | Input resolution | 518x518 (DINOv2 native) |
+ | Preprocessing | LongestMaxSize + PadIfNeeded (aspect-ratio preserving) |
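
A 9-way 5-shot 10-query episode draws 5 support and 10 query images from each of the 9 classes. The README does not say how classes with fewer than 15 images (Defect 3 and Defect 4) are handled, so the sampler below tops them up by sampling with replacement; that choice, the function name `sample_episode`, and the index lists are assumptions made for illustration.

```python
import random


def sample_episode(indices_by_class, n_shot=5, n_query=10, rng=None):
    """Return (support, query) lists of (class_id, sample_index) pairs for one episode."""
    rng = rng or random.Random()
    need = n_shot + n_query
    support, query = [], []
    for class_id, indices in indices_by_class.items():
        if len(indices) >= need:
            picked = rng.sample(indices, need)  # without replacement
        else:
            # Assumption: rare classes are topped up with replacement,
            # so a query image may repeat an image already drawn.
            picked = rng.sample(indices, len(indices))  # shuffled copy
            picked += rng.choices(indices, k=need - len(indices))
        support += [(class_id, i) for i in picked[:n_shot]]
        query += [(class_id, i) for i in picked[n_shot:]]
    return support, query


# Hypothetical index lists sized like the dataset table (class ID -> sample indices).
counts = {0: 7135, 1: 253, 2: 178, 3: 9, 4: 14, 5: 411, 8: 803, 9: 319, 10: 674}
indices_by_class = {c: list(range(n)) for c, n in counts.items()}

support, query = sample_episode(indices_by_class, rng=random.Random(0))
print(len(support), len(query))  # 45 90
```

Sampling the same number of images per class in every episode means the 793:1 imbalance in the raw data does not carry over into the training batches.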
+
+ ### Training Hardware
+
+ - **GPU**: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (95.6 GB VRAM)
+ - **Actual VRAM usage**: ~2 GB (with gradient checkpointing)
+ - **Training time**: ~17 minutes/epoch
+ - **Convergence**: best checkpoint at epoch 7; early stopping (patience 20) ended training at epoch 27
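
The convergence figures are consistent with the patience of 20 epochs listed in the training configuration: a best result at epoch 7 followed by 20 epochs without improvement gives a stop at epoch 27. A minimal sketch of that logic, assuming the usual "stop after `patience` epochs without a new best" rule and a made-up validation curve (the function name is hypothetical):

```python
def run_with_early_stopping(val_scores, patience=20):
    """Return (best_epoch, stop_epoch) for 1-indexed epochs."""
    best_score, best_epoch = float("-inf"), 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_scores)


# Made-up validation curve: improves until epoch 7, then plateaus slightly lower.
scores = [0.80, 0.84, 0.87, 0.88, 0.89, 0.90, 0.906] + [0.90] * 40
print(run_with_early_stopping(scores))  # (7, 27)
```

At roughly 17 minutes per epoch, 27 epochs works out to about 7.7 hours of training.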
+
+ ## Performance
+
+ ### K-Shot Classification Accuracy
+
+ | K (support images per class) | Accuracy |
+ |------------------------------|----------|
+ | K=1 | 99.5% |
+ | K=3 | 99.7% |
+ | K=5 | 99.7% |
+ | K=10 | 99.7% |
+ | K=20 | 99.8% |
+
+ ### Per-Class F1 Scores (K=20)
+
+ | Class | F1 Score |
+ |-------|----------|
+ | Defect 1 (Scratch) | 1.000 |
+ | Defect 2 (Particle) | 1.000 |
+ | Defect 3 (Micro-crack) | 1.000 |
+ | Defect 4 (Edge) | 1.000 |
+ | Defect 5 (Pattern) | 0.994 |
+ | Defect 8 (Roughness) | 1.000 |
+ | Defect 9 (Deposition) | 1.000 |
+ | Defect 10 (Etch residue) | 0.996 |
+
+ **Balanced accuracy (K=20)**: 0.999
+ **Macro F1 (K=20)**: 0.999
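
Macro F1 is the unweighted mean of the per-class F1 scores, so the reported figure can be checked directly from the table. The sketch below assumes the average is taken over the eight defect classes listed, since the table has no row for the Good class.

```python
from statistics import mean

# Per-class F1 scores at K=20, copied from the table.
per_class_f1 = {
    "Defect 1": 1.000,
    "Defect 2": 1.000,
    "Defect 3": 1.000,
    "Defect 4": 1.000,
    "Defect 5": 0.994,
    "Defect 8": 1.000,
    "Defect 9": 1.000,
    "Defect 10": 0.996,
}

# Unweighted mean: every class counts equally, regardless of sample count.
macro_f1 = mean(per_class_f1.values())
print(f"{macro_f1:.5f}")   # 0.99875
print(round(macro_f1, 3))  # 0.999
```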
+
+ ### Good Image Detection
+
+ The model includes a cosine similarity gap threshold for detecting non-defective ("good") wafer images:
+
+ | Metric | Value |
+ |--------|-------|
+ | Good image accuracy | ~90% |
+ | Defect image accuracy | ~97% |
+ | Gap threshold | 0.20 |
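
The README does not define how the gap is computed. One plausible reading, shown here purely as an assumption, is that the gap is the difference between the highest and second-highest cosine similarity to the defect prototypes: when no defect class clearly stands out (gap below 0.20), the image is treated as good. The function `is_good` and the similarity values are hypothetical, and the repository's actual rule may differ.

```python
def is_good(defect_similarities, gap_threshold=0.20):
    """Hypothetical rule: good if no defect prototype clearly stands out."""
    ranked = sorted(defect_similarities, reverse=True)
    gap = ranked[0] - ranked[1]  # top-1 minus top-2 cosine similarity
    return gap < gap_threshold


# Made-up cosine similarities to three defect prototypes.
print(is_good([0.91, 0.35, 0.30]))  # False: one defect class dominates
print(is_good([0.52, 0.48, 0.45]))  # True: no clear winner
```

Under this reading, raising the threshold would label more images as good, trading defect recall for fewer false alarms.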
+
+ ## How to Use
+
+ ### Quick Start
+
+ ```python
+ import torch
+ import yaml
+ from PIL import Image
+
+ # The problem_a package comes from the project repository (see Source Code)
+ from problem_a.src.backbone import get_backbone
+ from problem_a.src.protonet import PrototypicalNetwork, IncrementalPrototypeTracker
+ from problem_a.src.augmentations import get_eval_transform
+
+ # Load model
+ with open('problem_a/configs/default.yaml') as f:
+     cfg = yaml.safe_load(f)
+
+ backbone = get_backbone(cfg['model']['backbone'], cfg['model']['backbone_size'])
+ model = PrototypicalNetwork(backbone, cfg['model']['proj_hidden'], cfg['model']['proj_dim'])
+
+ # weights_only=False unpickles arbitrary objects; only load checkpoints you trust
+ checkpoint = torch.load('best_model.pt', map_location='cpu', weights_only=False)
+ model.load_state_dict(checkpoint['model_state_dict'])
+ model.eval().cuda()
+
+ transform = get_eval_transform(cfg['data']['img_size'])
+
+ # Create tracker and add support images
+ tracker = IncrementalPrototypeTracker(model, torch.device('cuda'))
+
+ # Add support images (at least 1 per class).
+ # support_images is an iterable of (class_id, image_path) pairs that you provide.
+ for class_id, image_path in support_images:
+     img = Image.open(image_path).convert('L')
+     tensor = transform(img)
+     tracker.add_example(tensor, class_id)
+
+ # Classify a query image
+ query_img = Image.open('query.png').convert('L')
+ query_tensor = transform(query_img).unsqueeze(0).cuda()
+
+ with torch.no_grad():
+     log_probs = model.classify(query_tensor, tracker.prototypes)
+     probs = torch.exp(log_probs).squeeze(0)
+
+ # Get prediction: map the prototype index back to the original class ID
+ label_map = tracker.label_map
+ reverse_map = {v: k for k, v in label_map.items()}
+ pred_idx = probs.argmax().item()
+ predicted_class = reverse_map[pred_idx]
+ confidence = probs[pred_idx].item()
+ print(f'Predicted: class {predicted_class}, confidence: {confidence:.3f}')
+ ```
+
+ ### Download with huggingface_hub
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ checkpoint_path = hf_hub_download(
+     repo_id="Makatia/semiconductor-defect-classifier",
+     filename="best_model.pt"
+ )
+ ```
+
+ ## Model Specifications
+
+ | Property | Value |
+ |----------|-------|
+ | Architecture | DINOv2 ViT-L/14 + Prototypical Network |
+ | Total parameters | 306,142,209 |
+ | Trainable parameters | 77,366,273 (25.3%) |
+ | Backbone | DINOv2 ViT-L/14 (frozen except the last 6 blocks, which are fine-tuned) |
+ | Embedding dimension | 512 (L2-normalized) |
+ | Projection head | 1024 -> 768 -> 768 -> 512 |
+ | Input size | 518x518 (aspect-ratio preserved with padding) |
+ | Input channels | Grayscale (converted to 3-channel internally) |
+ | Inference time | ~700ms (GPU) / ~3s (CPU) |
+ | VRAM (inference) | ~2 GB |
+ | Checkpoint size | 1.17 GB |
+ | Framework | PyTorch 2.0+ |
+ | Dependencies | timm >= 1.0, albumentations >= 1.3 |
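
Total and trainable parameter counts like the ones in the table are typically obtained by summing `numel()` over a module's parameters. The sketch below shows that on a toy model with one frozen layer (the helper `count_parameters` and the toy model are assumptions for illustration), then checks the percentage reported above.

```python
import torch.nn as nn


def count_parameters(model):
    """Return (total, trainable) parameter counts for a module."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable


# Toy model: freeze the first layer, leave the second trainable.
toy = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
for p in toy[0].parameters():
    p.requires_grad = False
print(count_parameters(toy))  # (30, 10)

# Checking the percentage reported in the table.
total_params, trainable_params = 306_142_209, 77_366_273
print(f"{100 * trainable_params / total_params:.1f}%")  # 25.3%
```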
+
+ ## Checkpoint Contents
+
+ The `.pt` file contains:
+
+ ```python
+ {
+     'epoch': 7,                   # Best epoch
+     'model_state_dict': {...},    # Full model weights
+     'best_val_acc': 0.906,        # Validation accuracy (episodic)
+     'config': {...},              # Training configuration
+ }
+ ```
+
+ ## Intended Use
+
+ - **Primary use**: Semiconductor wafer defect detection and classification in manufacturing quality control
+ - **Few-shot scenarios**: When only 1-20 labeled examples per defect class are available
+ - **Research**: Few-shot learning, meta-learning, and industrial defect detection benchmarks
+
+ ## Limitations
+
+ - Trained specifically on Intel challenge semiconductor images; may need fine-tuning for other semiconductor processes
+ - Good image detection (~90% accuracy) is less reliable than defect classification (97-100%)
+ - Requires grayscale input images; color images should be converted before inference
+ - Extremely rare classes (defect3: 9 samples, defect4: 14 samples) have lower representation in training
+
+ ## Source Code
+
+ Full training pipeline, evaluation scripts, and PySide6/QML desktop application available at:
+ [github.com/fidel-makatia/Semiconductor_Defect_Classification_model](https://github.com/fidel-makatia/Semiconductor_Defect_Classification_model)
+
+ ## Citation
+
+ ```bibtex
+ @misc{makatia2026semiconductor,
+   title={Few-Shot Semiconductor Defect Classification with DINOv2 and Prototypical Networks},
+   author={Fidel Makatia},
+   year={2026},
+   howpublished={Intel Semiconductor Solutions Challenge 2026},
+ }
+ ```
+
+ ## License
+
+ MIT License