---
language: en
license: mit
library_name: pytorch
tags:
- image-classification
- few-shot-learning
- prototypical-network
- dinov2
- semiconductor
- defect-detection
- vision-transformer
- meta-learning
datasets:
- custom
pipeline_tag: image-classification
model-index:
- name: semiconductor-defect-classifier
  results:
  - task:
      type: image-classification
      name: Few-Shot Defect Classification
    metrics:
    - name: Accuracy (K=1)
      type: accuracy
      value: 0.995
    - name: Accuracy (K=5)
      type: accuracy
      value: 0.997
    - name: Accuracy (K=20)
      type: accuracy
      value: 0.998
    - name: Macro F1 (K=20)
      type: f1
      value: 0.999
---

# Semiconductor Defect Classifier

**Few-Shot Semiconductor Wafer Defect Classification using DINOv2 ViT-L/14 + Prototypical Network**

Built for the **Intel Semiconductor Solutions Challenge 2026**. Classifies grayscale semiconductor wafer microscopy images into 9 categories (8 defect types + good) using as few as 1-5 reference images per class.

## Model Description

This model combines a **DINOv2 ViT-L/14** backbone (304M parameters, self-supervised pre-training on 142M images) with a **Prototypical Network** classification head. It was trained using episodic meta-learning on the Intel challenge dataset.
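To make the episodic setup concrete, below is a minimal sketch of how one N-way K-shot episode (here 9-way 5-shot 10-query, matching the training configuration) could be sampled. The function and variable names are illustrative only and are not taken from this repository; note also that the rarest classes in this dataset have fewer than 15 images, so a real pipeline would need to sample with replacement or shrink the query set for them.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way=9, k_shot=5, n_query=10, rng=None):
    """Sample index sets for one N-way K-shot episode.

    labels: list of class ids, one per dataset item (illustrative
    stand-in for the real dataset). Returns (support, query) as lists
    of (index, class_id) pairs with disjoint indices per class.
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    # Only classes with enough images for disjoint support + query sets
    eligible = [c for c, idxs in by_class.items()
                if len(idxs) >= k_shot + n_query]
    classes = rng.sample(eligible, n_way)
    support, query = [], []
    for c in classes:
        chosen = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, c) for i in chosen[:k_shot]]
        query += [(i, c) for i in chosen[k_shot:]]
    return support, query
```

Each training step then embeds the support images to build class prototypes and computes the loss on the query images.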
### Architecture

```
Input Image (grayscale, up to 7000x5600)
        |
        v
DINOv2 ViT-L/14 Backbone
  - 304M parameters (last 6 blocks fine-tuned)
  - Gradient checkpointing enabled
  - Output: 1024-dim CLS token
        |
        v
3-Layer Projection Head
  - Linear(1024, 768) + LayerNorm + GELU
  - Linear(768, 768) + LayerNorm + GELU
  - Linear(768, 512) + L2 Normalization
        |
        v
Prototypical Classification
  - Cosine similarity with learned temperature
  - Softmax over class prototypes
  - Good-detection gap threshold (0.20)
```

### Key Design Choices

- **DINOv2 backbone**: Self-supervised features transfer exceptionally well to few-shot tasks, even on out-of-distribution semiconductor images
- **Prototypical Network**: Non-parametric classifier that works with any number of support examples (K=1 to K=20+) without retraining
- **Cosine similarity + learned temperature**: More stable than Euclidean distance for high-dimensional embeddings
- **Differential learning rates**: Backbone fine-tuned at 5e-6, projection head at 3e-4 (60x ratio)
- **Gradient checkpointing**: Reduces VRAM from ~24 GB to ~2 GB with minimal speed penalty

## Training Details

### Dataset

Intel Semiconductor Solutions Challenge 2026 dataset:

| Class | Name | Samples | Description |
|-------|------|---------|-------------|
| 0 | Good | 7,135 | Non-defective wafer surface |
| 1 | Defect 1 | 253 | Scratch-type defect |
| 2 | Defect 2 | 178 | Particle contamination |
| 3 | Defect 3 | 9 | Micro-crack (extremely rare) |
| 4 | Defect 4 | 14 | Edge defect (extremely rare) |
| 5 | Defect 5 | 411 | Pattern anomaly |
| 8 | Defect 8 | 803 | Surface roughness |
| 9 | Defect 9 | 319 | Deposition defect |
| 10 | Defect 10 | 674 | Etch residue |

**Note**: Classes 6 and 7 do not exist in the dataset. The extreme class imbalance (793:1 ratio between good and defect3) and visually similar class pairs (defect3/defect9 at 0.963 cosine similarity, defect4/defect8 at 0.889) make this a challenging benchmark.
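The prototypical classification step in the architecture above (per-class mean embeddings, cosine similarity scaled by a temperature, softmax over prototypes) can be sketched in a few lines of PyTorch. This is an illustrative reimplementation, not the repository's code, and the fixed temperature here stands in for the learned parameter in the real model:

```python
import torch
import torch.nn.functional as F

def build_prototypes(support_emb, support_labels, n_classes):
    """Average the support embeddings per class, then L2-normalize so
    each prototype is a unit vector (matching the L2-normalized
    512-dim embedding space)."""
    protos = torch.zeros(n_classes, support_emb.shape[1])
    for c in range(n_classes):
        protos[c] = support_emb[support_labels == c].mean(dim=0)
    return F.normalize(protos, dim=1)

def classify(query_emb, prototypes, temperature=10.0):
    """Cosine similarity (dot product of unit vectors) scaled by a
    temperature, then log-softmax over the class prototypes."""
    logits = temperature * (F.normalize(query_emb, dim=1) @ prototypes.T)
    return F.log_softmax(logits, dim=1)
```

Because the classifier is just nearest-prototype matching, adding more support images only updates the class means; no weights need retraining, which is what makes K=1 through K=20+ work with the same checkpoint.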
### Training Configuration

| Parameter | Value |
|-----------|-------|
| Training paradigm | Episodic meta-learning |
| Episodes per epoch | 500 |
| Episode structure | 9-way 5-shot 10-query |
| Optimizer | AdamW |
| Learning rate (head) | 3.0e-4 |
| Learning rate (backbone) | 5.0e-6 |
| LR schedule | Cosine annealing with 5-epoch warmup |
| Weight decay | 1.0e-4 |
| Label smoothing | 0.1 |
| Gradient clipping | Max norm 1.0 |
| Mixed precision | AMP (float16) |
| Batch processing | Gradient checkpointing |
| Early stopping | Patience 20 epochs |
| Input resolution | 518x518 (DINOv2 native) |
| Preprocessing | LongestMaxSize + PadIfNeeded (aspect-ratio preserving) |

### Training Hardware

- **GPU**: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (95.6 GB VRAM)
- **Actual VRAM usage**: ~2 GB (gradient checkpointing)
- **Training time**: ~17 minutes/epoch
- **Convergence**: best epoch 7 (early stopping triggered at epoch 27)

## Performance

### K-Shot Classification Accuracy

| K (support images per class) | Accuracy |
|------------------------------|----------|
| K=1 | 99.5% |
| K=3 | 99.7% |
| K=5 | 99.7% |
| K=10 | 99.7% |
| K=20 | 99.8% |

### Per-Class F1 Scores (K=20)

| Class | F1 Score |
|-------|----------|
| Defect 1 (Scratch) | 1.000 |
| Defect 2 (Particle) | 1.000 |
| Defect 3 (Micro-crack) | 1.000 |
| Defect 4 (Edge) | 1.000 |
| Defect 5 (Pattern) | 0.994 |
| Defect 8 (Roughness) | 1.000 |
| Defect 9 (Deposition) | 1.000 |
| Defect 10 (Etch residue) | 0.996 |

**Balanced accuracy (K=20)**: 0.999
**Macro F1 (K=20)**: 0.999

### Good Image Detection

The model includes a cosine-similarity gap threshold for detecting non-defective ("good") wafer images:

| Metric | Value |
|--------|-------|
| Good image accuracy | ~90% |
| Defect image accuracy | ~97% |
| Gap threshold | 0.20 |

## How to Use

### Quick Start

```python
import torch
import yaml
from PIL import Image

from problem_a.src.backbone import get_backbone
from problem_a.src.protonet import PrototypicalNetwork, IncrementalPrototypeTracker
from problem_a.src.augmentations import get_eval_transform

# Load model
with open('problem_a/configs/default.yaml') as f:
    cfg = yaml.safe_load(f)

backbone = get_backbone(cfg['model']['backbone'], cfg['model']['backbone_size'])
model = PrototypicalNetwork(backbone, cfg['model']['proj_hidden'], cfg['model']['proj_dim'])
checkpoint = torch.load('best_model.pt', map_location='cpu', weights_only=False)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval().cuda()

transform = get_eval_transform(cfg['data']['img_size'])

# Create tracker and add support images (at least 1 per class);
# support_images is an iterable of (class_id, image_path) pairs
tracker = IncrementalPrototypeTracker(model, torch.device('cuda'))
for class_id, image_path in support_images:
    img = Image.open(image_path).convert('L')
    tensor = transform(img)
    tracker.add_example(tensor, class_id)

# Classify a query image
query_img = Image.open('query.png').convert('L')
query_tensor = transform(query_img).unsqueeze(0).cuda()

with torch.no_grad():
    log_probs = model.classify(query_tensor, tracker.prototypes)
    probs = torch.exp(log_probs).squeeze(0)

# Map the prediction index back to the original class label
reverse_map = {v: k for k, v in tracker.label_map.items()}
pred_idx = probs.argmax().item()
predicted_class = reverse_map[pred_idx]
confidence = probs[pred_idx].item()
print(f'Predicted: class {predicted_class}, confidence: {confidence:.3f}')
```

### Download with huggingface_hub

```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="Makatia/semiconductor-defect-classifier",
    filename="best_model.pt",
)
```

## Model Specifications

| Property | Value |
|----------|-------|
| Architecture | DINOv2 ViT-L/14 + Prototypical Network |
| Total parameters | 306,142,209 |
| Trainable parameters | 77,366,273 (25.3%) |
| Backbone | DINOv2 ViT-L/14 (frozen + last 6 blocks) |
| Embedding dimension | 512 (L2-normalized) |
| Projection head | 1024 -> 768 -> 768 -> 512 |
| Input size | 518x518 (aspect-ratio preserved with padding) |
| Input channels | Grayscale (converted to 3-channel internally) |
| Inference time | ~700 ms (GPU) / ~3 s (CPU) |
| VRAM (inference) | ~2 GB |
| Checkpoint size | 1.17 GB |
| Framework | PyTorch 2.0+ |
| Dependencies | timm >= 1.0, albumentations >= 1.3 |

## Checkpoint Contents

The `.pt` file contains:

```python
{
    'epoch': 7,                 # Best epoch
    'model_state_dict': {...},  # Full model weights
    'best_val_acc': 0.906,      # Validation accuracy (episodic)
    'config': {...},            # Training configuration
}
```

## Intended Use

- **Primary use**: Semiconductor wafer defect detection and classification in manufacturing quality control
- **Few-shot scenarios**: When only 1-20 labeled examples per defect class are available
- **Research**: Few-shot learning, meta-learning, and industrial defect detection benchmarks

## Limitations

- Trained specifically on Intel challenge semiconductor images; may need fine-tuning for other semiconductor processes
- Good image detection (~90% accuracy) is less reliable than defect classification (97-100%)
- Requires grayscale input images; color images should be converted before inference
- Extremely rare classes (defect3: 9 samples, defect4: 14 samples) have lower representation in training

## Source Code

Full training pipeline, evaluation scripts, and the PySide6/QML desktop application are available at:
[github.com/fidel-makatia/Semiconductor_Defect_Classification_model](https://github.com/fidel-makatia/Semiconductor_Defect_Classification_model)

## Citation

```bibtex
@misc{makatia2026semiconductor,
  title={Few-Shot Semiconductor Defect Classification with DINOv2 and Prototypical Networks},
  author={Fidel Makatia},
  year={2026},
  howpublished={Intel Semiconductor Solutions Challenge 2026},
}
```

## License

MIT License