+---
+language: en
+license: mit
+library_name: pytorch
+tags:
+  - image-classification
+  - few-shot-learning
+  - prototypical-network
+  - dinov2
+  - semiconductor
+  - defect-detection
+  - vision-transformer
+  - meta-learning
+datasets:
+  - custom
+pipeline_tag: image-classification
+model-index:
+  - name: semiconductor-defect-classifier
+    results:
+      - task:
+          type: image-classification
+          name: Few-Shot Defect Classification
+        metrics:
+          - name: Accuracy (K=1)
+            type: accuracy
+            value: 0.995
+          - name: Accuracy (K=5)
+            type: accuracy
+            value: 0.997
+          - name: Accuracy (K=20)
+            type: accuracy
+            value: 0.998
+          - name: Macro F1 (K=20)
+            type: f1
+            value: 0.999
+---
+# Semiconductor Defect Classifier
+**Few-Shot Semiconductor Wafer Defect Classification using DINOv2 ViT-L/14 + Prototypical Network**
+Built for the **Intel Semiconductor Solutions Challenge 2026**. The model classifies grayscale semiconductor wafer microscopy images into 9 categories (8 defect types plus good) using as few as 1 to 5 reference images per class.
+## Model Description
+This model combines a **DINOv2 ViT-L/14** backbone (304M parameters, self-supervised pre-training on 142M images) with a **Prototypical Network** classification head. It was trained using episodic meta-learning on the Intel challenge dataset.
+### Architecture
+```
+Input Image (grayscale, up to 7000x5600)
+    |
+    v
+DINOv2 ViT-L/14 Backbone
+  - 304M parameters (last 6 blocks fine-tuned)
+  - Gradient checkpointing enabled
+  - Output: 1024-dim CLS token
+    |
+    v
+3-Layer Projection Head
+  - Linear(1024, 768) + LayerNorm + GELU
+  - Linear(768, 768) + LayerNorm + GELU
+  - Linear(768, 512) + L2 Normalization
+    |
+    v
+Prototypical Classification
+  - Cosine similarity with learned temperature
+  - Softmax over class prototypes
+  - Good-detection gap threshold (0.20)
+```
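
The classification stage in the diagram can be sketched in plain Python. This is a minimal illustration, not the repository's implementation: the function names are hypothetical, the temperature is assumed to act as a multiplicative scale on the cosine similarity, and the toy 2-dimensional embeddings stand in for the 512-dimensional projection output.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def build_prototypes(support):
    """Map each class ID to the unit-norm mean of its support embeddings."""
    prototypes = {}
    for class_id, embeddings in support.items():
        unit = [l2_normalize(e) for e in embeddings]
        mean = [sum(col) / len(unit) for col in zip(*unit)]
        prototypes[class_id] = l2_normalize(mean)
    return prototypes

def classify(query, prototypes, temperature=10.0):
    """Softmax over temperature-scaled cosine similarities to each prototype.

    ASSUMPTION: the learned temperature multiplies the similarity; the value
    10.0 is a placeholder, not the trained value.
    """
    q = l2_normalize(query)
    class_ids = sorted(prototypes)
    logits = [temperature * sum(a * b for a, b in zip(q, prototypes[c]))
              for c in class_ids]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(class_ids, exps)}

# Toy example: two classes, one with two support embeddings, one with one.
prototypes = build_prototypes({1: [[1.0, 0.0], [0.9, 0.1]], 2: [[0.0, 1.0]]})
probs = classify([1.0, 0.05], prototypes)
predicted = max(probs, key=probs.get)
print(predicted, probs)
```

Because a prototype is just an average of support embeddings, adding or removing support images changes the classifier without any gradient update.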
+### Key Design Choices
+- **DINOv2 backbone**: Self-supervised features transfer well to few-shot tasks, even on out-of-distribution semiconductor images (99.5% accuracy with a single support image per class)
+- **Prototypical Network**: Non-parametric classifier that works with any number of support examples (K=1 to K=20+) without retraining
+- **Cosine similarity + learned temperature**: More stable than Euclidean distance for high-dimensional embeddings
+- **Differential learning rates**: Backbone fine-tuned at 5e-6, projection head at 3e-4 (60x ratio)
+- **Gradient checkpointing**: Reduces VRAM from ~24 GB to ~2 GB with minimal speed penalty
+## Training Details
+### Dataset
+Intel Semiconductor Solutions Challenge 2026 dataset:
+| Class | Name | Samples | Description |
+|-------|------|---------|-------------|
+| 0 | Good | 7,135 | Non-defective wafer surface |
+| 1 | Defect 1 | 253 | Scratch-type defect |
+| 2 | Defect 2 | 178 | Particle contamination |
+| 3 | Defect 3 | 9 | Micro-crack (extremely rare) |
+| 4 | Defect 4 | 14 | Edge defect (extremely rare) |
+| 5 | Defect 5 | 411 | Pattern anomaly |
+| 8 | Defect 8 | 803 | Surface roughness |
+| 9 | Defect 9 | 319 | Deposition defect |
+| 10 | Defect 10 | 674 | Etch residue |
+**Note**: Class IDs 6 and 7 are not used in the dataset. The extreme class imbalance (793:1 between Good and Defect 3) and visually similar class pairs (Defect 3/Defect 9 at 0.963 cosine similarity, Defect 4/Defect 8 at 0.889) make this a challenging benchmark.
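
The imbalance figure follows directly from the sample counts in the table. A short check, with the dictionary copied from the table above:

```python
# Sample counts per class, copied from the dataset table.
samples = {
    "Good": 7135, "Defect 1": 253, "Defect 2": 178, "Defect 3": 9,
    "Defect 4": 14, "Defect 5": 411, "Defect 8": 803, "Defect 9": 319,
    "Defect 10": 674,
}

largest = max(samples, key=samples.get)
smallest = min(samples, key=samples.get)
imbalance_ratio = samples[largest] / samples[smallest]
total_images = sum(samples.values())

print(f"{largest} vs {smallest}: {imbalance_ratio:.0f}:1 over {total_images} images")
```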
+### Training Configuration
+| Parameter | Value |
+|-----------|-------|
+| Training paradigm | Episodic meta-learning |
+| Episodes per epoch | 500 |
+| Episode structure | 9-way 5-shot 10-query |
+| Optimizer | AdamW |
+| Learning rate (head) | 3.0e-4 |
+| Learning rate (backbone) | 5.0e-6 |
+| LR schedule | Cosine annealing with 5-epoch warmup |
+| Weight decay | 1.0e-4 |
+| Label smoothing | 0.1 |
+| Gradient clipping | Max norm 1.0 |
+| Mixed precision | AMP (float16) |
+| Memory optimization | Gradient checkpointing |
+| Early stopping | Patience 20 epochs |
+| Input resolution | 518x518 (DINOv2 native) |
+| Preprocessing | LongestMaxSize + PadIfNeeded (aspect-ratio preserving) |
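
An episode with the structure in the table draws the same number of support and query images from every class, which is how episodic training avoids being dominated by the Good class. The sampler below is a hedged sketch with hypothetical names. In particular, drawing query images with replacement for classes that have fewer than 15 images is an assumption: the source does not say how Defect 3 (9 samples) fills a 5-shot, 10-query episode.

```python
import random

def sample_episode(indices_by_class, n_shot=5, n_query=10, rng=random):
    """Draw one N-way episode: n_shot support and n_query query indices per class."""
    support, query = {}, {}
    for class_id, indices in indices_by_class.items():
        pool = list(indices)
        if len(pool) <= n_shot:
            raise ValueError(f"class {class_id} needs more than {n_shot} images")
        rng.shuffle(pool)
        support[class_id] = pool[:n_shot]
        remaining = pool[n_shot:]
        if len(remaining) >= n_query:
            query[class_id] = remaining[:n_query]
        else:
            # ASSUMPTION: rare classes reuse their held-out images with replacement.
            query[class_id] = [rng.choice(remaining) for _ in range(n_query)]
    return support, query

# Toy index lists: class 0 is plentiful, class 3 has only 9 images.
rng = random.Random(0)
support, query = sample_episode({0: range(100), 3: range(100, 109)}, rng=rng)
print(len(support[3]), len(query[3]))
```

Support and query sets stay disjoint within a class, so a query image is never compared against a prototype it helped build.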
+### Training Hardware
+- **GPU**: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (95.6 GB VRAM)
+- **Actual VRAM usage**: ~2 GB (gradient checkpointing)
+- **Training time**: ~17 minutes/epoch
+- **Convergence**: best checkpoint at epoch 7; early stopping (patience 20) ended training at epoch 27
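
The gap between epoch 7 and epoch 27 is what a patience of 20 epochs produces when no later epoch beats the best score. A minimal sketch of that logic, using a hypothetical `EarlyStopping` helper and synthetic validation scores (only the 0.906 peak at epoch 7 comes from this card; the other values are invented for illustration):

```python
class EarlyStopping:
    """Stop when the score has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, score):
        """Record one epoch; return True when training should stop."""
        if score > self.best_score:
            self.best_score = score
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Synthetic curve: improves until epoch 7, then plateaus slightly lower.
stopper = EarlyStopping(patience=20)
stopped_at = None
for epoch in range(1, 101):
    score = 0.85 + 0.008 * epoch if epoch <= 7 else 0.90
    if stopper.step(epoch, score):
        stopped_at = epoch
        break

print(stopper.best_epoch, stopped_at)
```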
+## Performance
+### K-Shot Classification Accuracy
+| K (support images per class) | Accuracy |
+|------------------------------|----------|
+| K=1 | 99.5% |
+| K=3 | 99.7% |
+| K=5 | 99.7% |
+| K=10 | 99.7% |
+| K=20 | 99.8% |
+### Per-Class F1 Scores (K=20)
+| Class | F1 Score |
+|-------|----------|
+| Defect 1 (Scratch) | 1.000 |
+| Defect 2 (Particle) | 1.000 |
+| Defect 3 (Micro-crack) | 1.000 |
+| Defect 4 (Edge) | 1.000 |
+| Defect 5 (Pattern) | 0.994 |
+| Defect 8 (Roughness) | 1.000 |
+| Defect 9 (Deposition) | 1.000 |
+| Defect 10 (Etch residue) | 0.996 |
+- **Balanced accuracy (K=20)**: 0.999
+- **Macro F1 (K=20)**: 0.999
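
Macro F1 is the unweighted mean of the per-class F1 scores, so it can be cross-checked against the table. The check below assumes the macro average is taken over the eight defect classes listed (Good does not appear in the table). The per-class values are already rounded to three decimals, so the result can only approximately match the reported figure.

```python
# Per-class F1 at K=20, copied from the table above.
f1_scores = {
    "Defect 1": 1.000, "Defect 2": 1.000, "Defect 3": 1.000, "Defect 4": 1.000,
    "Defect 5": 0.994, "Defect 8": 1.000, "Defect 9": 1.000, "Defect 10": 0.996,
}

# Macro averaging weights every class equally, regardless of sample count.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)
print(f"macro F1 = {macro_f1:.4f}")
```

The mean comes out just under 0.999, in line with the reported macro F1 once rounding of the inputs is taken into account.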
+### Good Image Detection
+The model includes a cosine similarity gap threshold for detecting non-defective ("good") wafer images:
+| Metric | Value |
+|--------|-------|
+| Good image accuracy | ~90% |
+| Defect image accuracy | ~97% |
+| Gap threshold | 0.20 |
+## How to Use
+### Quick Start
+```python
+import torch
+import yaml
+from PIL import Image
+from problem_a.src.backbone import get_backbone
+from problem_a.src.protonet import PrototypicalNetwork, IncrementalPrototypeTracker
+from problem_a.src.augmentations import get_eval_transform
+# Load model
+with open('problem_a/configs/default.yaml') as f:
+    cfg = yaml.safe_load(f)
+backbone = get_backbone(cfg['model']['backbone'], cfg['model']['backbone_size'])
+model = PrototypicalNetwork(backbone, cfg['model']['proj_hidden'], cfg['model']['proj_dim'])
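+# weights_only=False unpickles arbitrary Python objects, so only load checkpoints you trust
+checkpoint = torch.load('best_model.pt', map_location='cpu', weights_only=False)
+model.load_state_dict(checkpoint['model_state_dict'])
+# Use the GPU when available, otherwise fall back to CPU
+device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+model.eval().to(device)
+transform = get_eval_transform(cfg['data']['img_size'])
+# Create tracker and add support images
+tracker = IncrementalPrototypeTracker(model, device)
+# (class_id, image_path) pairs, at least 1 per class; these paths are placeholders
+support_images = [(1, 'support/defect1_001.png'), (2, 'support/defect2_001.png')]
+for class_id, image_path in support_images:
+    img = Image.open(image_path).convert('L')
+    tensor = transform(img)
+    tracker.add_example(tensor, class_id)
+# Classify a query image
+query_img = Image.open('query.png').convert('L')
+query_tensor = transform(query_img).unsqueeze(0).to(device)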
+with torch.no_grad():
+    log_probs = model.classify(query_tensor, tracker.prototypes)
+    probs = torch.exp(log_probs).squeeze(0)
+# Get prediction
+label_map = tracker.label_map
+reverse_map = {v: k for k, v in label_map.items()}
+pred_idx = probs.argmax().item()
+predicted_class = reverse_map[pred_idx]
+confidence = probs[pred_idx].item()
+print(f'Predicted: class {predicted_class}, confidence: {confidence:.3f}')
+```
+### Download with huggingface_hub
+```python
+from huggingface_hub import hf_hub_download
+checkpoint_path = hf_hub_download(
+    repo_id="Makatia/semiconductor-defect-classifier",
+    filename="best_model.pt"
+)
+```
+## Model Specifications
+| Property | Value |
+|----------|-------|
+| Architecture | DINOv2 ViT-L/14 + Prototypical Network |
+| Total parameters | 306,142,209 |
+| Trainable parameters | 77,366,273 (25.3%) |
+| Backbone | DINOv2 ViT-L/14 (first 18 of 24 blocks frozen, last 6 fine-tuned) |
+| Embedding dimension | 512 (L2-normalized) |
+| Projection head | 1024 -> 768 -> 768 -> 512 |
+| Input size | 518x518 (aspect-ratio preserved with padding) |
+| Input channels | Grayscale (converted to 3-channel internally) |
+| Inference time | ~700ms (GPU) / ~3s (CPU) |
+| VRAM (inference) | ~2 GB |
+| Checkpoint size | 1.17 GB |
+| Framework | PyTorch 2.0+ |
+| Dependencies | timm >= 1.0, albumentations >= 1.3 |
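
The parameter counts in the table can be reproduced by hand from the layer shapes. The breakdown below is one decomposition that matches both figures, not a statement of how the repository counts them. It assumes a standard ViT-L/14 with LayerScale at 518x518 input (no register tokens, no mask token), and a trainable set made up of the last 6 blocks, the final backbone LayerNorm, the projection head, and a single scalar temperature.

```python
def linear(n_in, n_out):
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out

def layer_norm(dim):
    """Scale plus shift of a LayerNorm."""
    return 2 * dim

EMBED, MLP, DEPTH, PATCH, IMG = 1024, 4096, 24, 14, 518

# One transformer block: attention (qkv + output projection), MLP,
# two LayerNorms, and two LayerScale vectors.
block = (linear(EMBED, 3 * EMBED) + linear(EMBED, EMBED)
         + linear(EMBED, MLP) + linear(MLP, EMBED)
         + 2 * layer_norm(EMBED) + 2 * EMBED)

patch_embed = 3 * PATCH * PATCH * EMBED + EMBED     # conv over 14x14 RGB patches
num_tokens = (IMG // PATCH) ** 2 + 1                # patch tokens + CLS token
pos_embed = num_tokens * EMBED
cls_token = EMBED
final_norm = layer_norm(EMBED)
backbone = patch_embed + cls_token + pos_embed + DEPTH * block + final_norm

# Projection head: 1024 -> 768 -> 768 -> 512 with LayerNorm after the first two layers.
head = (linear(1024, 768) + layer_norm(768)
        + linear(768, 768) + layer_norm(768)
        + linear(768, 512))
temperature = 1  # ASSUMPTION: one learned scalar

total = backbone + head + temperature
trainable = 6 * block + final_norm + head + temperature  # ASSUMED trainable set

print(f"total={total:,} trainable={trainable:,} ({trainable / total:.1%})")
```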
+## Checkpoint Contents
+The `.pt` file contains:
+```python
+{
+    'epoch': 7,                    # Best epoch
+    'model_state_dict': {...},     # Full model weights
+    'best_val_acc': 0.906,         # Validation accuracy (episodic)
+    'config': {...},               # Training configuration
+}
+```
+## Intended Use
+- **Primary use**: Semiconductor wafer defect detection and classification in manufacturing quality control
+- **Few-shot scenarios**: When only 1-20 labeled examples per defect class are available
+- **Research**: Few-shot learning, meta-learning, and industrial defect detection benchmarks
+## Limitations
+- Trained specifically on Intel challenge semiconductor images; may need fine-tuning for other semiconductor processes
+- Good image detection (~90% accuracy) is less reliable than defect classification (97-100%)
+- Requires grayscale input images; color images should be converted before inference
+- Extremely rare classes (defect3: 9 samples, defect4: 14 samples) have lower representation in training
+## Source Code
+Full training pipeline, evaluation scripts, and PySide6/QML desktop application available at:
+[github.com/fidel-makatia/Semiconductor_Defect_Classification_model](https://github.com/fidel-makatia/Semiconductor_Defect_Classification_model)
+## Citation
+```bibtex
+@misc{makatia2026semiconductor,
+  title={Few-Shot Semiconductor Defect Classification with DINOv2 and Prototypical Networks},
+  author={Fidel Makatia},
+  year={2026},
+  howpublished={Intel Semiconductor Solutions Challenge 2026},
+}
+```
+## License
+MIT License