---
language: en
license: mit
library_name: pytorch
tags:
- image-classification
- few-shot-learning
- prototypical-network
- dinov2
- semiconductor
- defect-detection
- vision-transformer
- meta-learning
datasets:
- custom
pipeline_tag: image-classification
model-index:
- name: semiconductor-defect-classifier
  results:
  - task:
      type: image-classification
      name: Few-Shot Defect Classification
    metrics:
    - name: Accuracy (K=1)
      type: accuracy
      value: 0.995
    - name: Accuracy (K=5)
      type: accuracy
      value: 0.997
    - name: Accuracy (K=20)
      type: accuracy
      value: 0.998
    - name: Macro F1 (K=20)
      type: f1
      value: 0.999
---

# Semiconductor Defect Classifier

**Few-Shot Semiconductor Wafer Defect Classification using DINOv2 ViT-L/14 + Prototypical Network**

Built for the **Intel Semiconductor Solutions Challenge 2026**. Classifies grayscale semiconductor wafer microscopy images into 9 categories (8 defect types + good) using as few as 1-5 reference images per class.

## Model Description

This model combines a **DINOv2 ViT-L/14** backbone (304M parameters, self-supervised pre-training on 142M images) with a **Prototypical Network** classification head. It was trained with episodic meta-learning on the Intel challenge dataset.

### Architecture

```
Input Image (grayscale, up to 7000x5600)
        |
        v
DINOv2 ViT-L/14 Backbone
  - 304M parameters (last 6 blocks fine-tuned)
  - Gradient checkpointing enabled
  - Output: 1024-dim CLS token
        |
        v
3-Layer Projection Head
  - Linear(1024, 768) + LayerNorm + GELU
  - Linear(768, 768) + LayerNorm + GELU
  - Linear(768, 512) + L2 Normalization
        |
        v
Prototypical Classification
  - Cosine similarity with learned temperature
  - Softmax over class prototypes
  - Good-detection gap threshold (0.20)
```

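The projection head and prototype scoring above can be sketched in a few lines of PyTorch. This is an illustrative sketch only — class and function names here are assumptions, not the repository's actual API — but the layer sizes and the cosine-similarity-with-temperature scoring match the diagram:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Sketch of the 3-layer projection head from the architecture diagram."""
    def __init__(self, in_dim=1024, hidden=768, out_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.LayerNorm(hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.GELU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        # L2-normalize so cosine similarity reduces to a dot product
        return F.normalize(self.net(x), dim=-1)

def prototype_log_probs(query_emb, prototypes, temperature=10.0):
    """Cosine similarity to each class prototype, temperature-scaled, softmaxed."""
    sims = query_emb @ prototypes.t()            # (batch, num_classes)
    return F.log_softmax(temperature * sims, dim=-1)

head = ProjectionHead()
cls_tokens = torch.randn(4, 1024)                # stand-in for DINOv2 CLS tokens
emb = head(cls_tokens)                           # (4, 512), unit-norm rows
protos = F.normalize(torch.randn(9, 512), dim=-1)  # stand-in class prototypes
log_p = prototype_log_probs(emb, protos)         # (4, 9) log-probabilities
```

Because both embeddings and prototypes are unit-normalized, the dot product is bounded in [-1, 1], which is why a learned temperature is needed to give the softmax a usable dynamic range.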
### Key Design Choices

- **DINOv2 backbone**: Self-supervised features transfer exceptionally well to few-shot tasks, even on out-of-distribution semiconductor images
- **Prototypical Network**: Non-parametric classifier that works with any number of support examples (K=1 to K=20+) without retraining
- **Cosine similarity + learned temperature**: More stable than Euclidean distance for high-dimensional embeddings
- **Differential learning rates**: Backbone fine-tuned at 5e-6, projection head at 3e-4 (60x ratio)
- **Gradient checkpointing**: Reduces VRAM from ~24 GB to ~2 GB with minimal speed penalty

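The differential learning rates listed above are typically wired up as two AdamW parameter groups. A minimal sketch, with placeholder modules standing in for the real backbone and head:

```python
import torch
import torch.nn as nn

# Placeholder modules; in the real model these would be the DINOv2 trainable
# blocks and the 3-layer projection head.
backbone = nn.Linear(1024, 1024)
head = nn.Linear(1024, 512)

# Two parameter groups: the pretrained backbone is fine-tuned gently at 5e-6,
# the randomly initialized head trains 60x faster at 3e-4.
optimizer = torch.optim.AdamW(
    [
        {"params": backbone.parameters(), "lr": 5e-6},
        {"params": head.parameters(), "lr": 3e-4},
    ],
    weight_decay=1e-4,
)
```

Keeping the backbone rate tiny preserves the self-supervised features while still letting the last blocks adapt to the grayscale wafer domain.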
## Training Details

### Dataset

Intel Semiconductor Solutions Challenge 2026 dataset:

| Class | Name | Samples | Description |
|-------|------|---------|-------------|
| 0 | Good | 7,135 | Non-defective wafer surface |
| 1 | Defect 1 | 253 | Scratch-type defect |
| 2 | Defect 2 | 178 | Particle contamination |
| 3 | Defect 3 | 9 | Micro-crack (extremely rare) |
| 4 | Defect 4 | 14 | Edge defect (extremely rare) |
| 5 | Defect 5 | 411 | Pattern anomaly |
| 8 | Defect 8 | 803 | Surface roughness |
| 9 | Defect 9 | 319 | Deposition defect |
| 10 | Defect 10 | 674 | Etch residue |

**Note**: Classes 6 and 7 do not exist in the dataset. The extreme class imbalance (793:1 ratio between good and defect3) and visually similar class pairs (defect3/defect9 at 0.963 cosine similarity, defect4/defect8 at 0.889) make this a challenging benchmark.

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Training paradigm | Episodic meta-learning |
| Episodes per epoch | 500 |
| Episode structure | 9-way 5-shot 10-query |
| Optimizer | AdamW |
| Learning rate (head) | 3.0e-4 |
| Learning rate (backbone) | 5.0e-6 |
| LR schedule | Cosine annealing with 5-epoch warmup |
| Weight decay | 1.0e-4 |
| Label smoothing | 0.1 |
| Gradient clipping | Max norm 1.0 |
| Mixed precision | AMP (float16) |
| Batch processing | Gradient checkpointing |
| Early stopping | Patience 20 epochs |
| Input resolution | 518x518 (DINOv2 native) |
| Preprocessing | LongestMaxSize + PadIfNeeded (aspect-ratio preserving) |

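A 9-way 5-shot 10-query episode means each training step samples 9 classes, with 5 support and 10 query images per class. The sketch below illustrates the sampling logic; names and structure are assumptions, not the repository's actual sampler:

```python
import random

def sample_episode(index, n_way=9, k_shot=5, n_query=10, seed=None):
    """Sample one episode from a {class_id: [image_paths]} index.

    Returns (support, query), each a list of (class_id, path) pairs.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(index), n_way)       # pick n_way classes
    support, query = [], []
    for c in classes:
        # draw k_shot + n_query distinct images, split support vs. query
        picks = rng.sample(index[c], k_shot + n_query)
        support += [(c, p) for p in picks[:k_shot]]
        query += [(c, p) for p in picks[k_shot:]]
    return support, query

# Toy index: the dataset's 9 class ids, 20 dummy image names each
index = {c: [f"img_{c}_{i}.png" for i in range(20)]
         for c in [0, 1, 2, 3, 4, 5, 8, 9, 10]}
support, query = sample_episode(index, seed=0)       # 45 support, 90 query
```

Note that classes with fewer than 15 images (defect3 has only 9) cannot fill a full 5-shot/10-query split without replacement; the real pipeline presumably handles such rare classes specially.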
### Training Hardware

- **GPU**: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (95.6 GB VRAM)
- **Actual VRAM usage**: ~2 GB (gradient checkpointing)
- **Training time**: ~17 minutes/epoch
- **Convergence**: best epoch 7 (early stopping triggered at epoch 27)

## Performance

### K-Shot Classification Accuracy

| K (support images per class) | Accuracy |
|------------------------------|----------|
| K=1 | 99.5% |
| K=3 | 99.7% |
| K=5 | 99.7% |
| K=10 | 99.7% |
| K=20 | 99.8% |

### Per-Class F1 Scores (K=20)

| Class | F1 Score |
|-------|----------|
| Defect 1 (Scratch) | 1.000 |
| Defect 2 (Particle) | 1.000 |
| Defect 3 (Micro-crack) | 1.000 |
| Defect 4 (Edge) | 1.000 |
| Defect 5 (Pattern) | 0.994 |
| Defect 8 (Roughness) | 1.000 |
| Defect 9 (Deposition) | 1.000 |
| Defect 10 (Etch residue) | 0.996 |

- **Balanced accuracy (K=20)**: 0.999
- **Macro F1 (K=20)**: 0.999

### Good Image Detection

The model includes a cosine similarity gap threshold for detecting non-defective ("good") wafer images:

| Metric | Value |
|--------|-------|
| Good image accuracy | ~90% |
| Defect image accuracy | ~97% |
| Gap threshold | 0.20 |

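The card does not spell out the exact gap rule, so the sketch below shows one plausible reading as an assumption, not the documented implementation: a good wafer resembles no defect class strongly, so the gap between its best and second-best prototype similarities stays small.

```python
def is_good(similarities, gap_threshold=0.20):
    """Flag a query as 'good' if no single defect prototype stands out.

    similarities: cosine similarities between the query embedding and each
    defect-class prototype. ASSUMPTION: the rule compares the top-1/top-2 gap
    against the 0.20 threshold; the repo may implement it differently.
    """
    top1, top2 = sorted(similarities, reverse=True)[:2]
    return (top1 - top2) < gap_threshold

# An ambiguous query (flat similarities) is flagged as good; a query that
# matches one defect prototype decisively is not.
ambiguous = is_good([0.52, 0.48, 0.40])
decisive = is_good([0.95, 0.40, 0.35])
```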
## How to Use

### Quick Start

```python
import torch
import yaml
from PIL import Image
from problem_a.src.backbone import get_backbone
from problem_a.src.protonet import PrototypicalNetwork, IncrementalPrototypeTracker
from problem_a.src.augmentations import get_eval_transform

# Load model
with open('problem_a/configs/default.yaml') as f:
    cfg = yaml.safe_load(f)

backbone = get_backbone(cfg['model']['backbone'], cfg['model']['backbone_size'])
model = PrototypicalNetwork(backbone, cfg['model']['proj_hidden'], cfg['model']['proj_dim'])

checkpoint = torch.load('best_model.pt', map_location='cpu', weights_only=False)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval().cuda()

transform = get_eval_transform(cfg['data']['img_size'])

# Create tracker and add support images
tracker = IncrementalPrototypeTracker(model, torch.device('cuda'))

# Add support images (at least 1 per class);
# support_images is an iterable of (class_id, image_path) pairs you provide
for class_id, image_path in support_images:
    img = Image.open(image_path).convert('L')
    tensor = transform(img)
    tracker.add_example(tensor, class_id)

# Classify a query image
query_img = Image.open('query.png').convert('L')
query_tensor = transform(query_img).unsqueeze(0).cuda()

with torch.no_grad():
    log_probs = model.classify(query_tensor, tracker.prototypes)
    probs = torch.exp(log_probs).squeeze(0)

# Get prediction: map the prototype index back to the original class id
label_map = tracker.label_map
reverse_map = {v: k for k, v in label_map.items()}
pred_idx = probs.argmax().item()
predicted_class = reverse_map[pred_idx]
confidence = probs[pred_idx].item()
print(f'Predicted: class {predicted_class}, confidence: {confidence:.3f}')
```

### Download with huggingface_hub

```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="Makatia/semiconductor-defect-classifier",
    filename="best_model.pt"
)
```

## Model Specifications

| Property | Value |
|----------|-------|
| Architecture | DINOv2 ViT-L/14 + Prototypical Network |
| Total parameters | 306,142,209 |
| Trainable parameters | 77,366,273 (25.3%) |
| Backbone | DINOv2 ViT-L/14 (frozen + last 6 blocks) |
| Embedding dimension | 512 (L2-normalized) |
| Projection head | 1024 -> 768 -> 768 -> 512 |
| Input size | 518x518 (aspect-ratio preserved with padding) |
| Input channels | Grayscale (converted to 3-channel internally) |
| Inference time | ~700ms (GPU) / ~3s (CPU) |
| VRAM (inference) | ~2 GB |
| Checkpoint size | 1.17 GB |
| Framework | PyTorch 2.0+ |
| Dependencies | timm >= 1.0, albumentations >= 1.3 |

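The aspect-ratio-preserving input pipeline (LongestMaxSize + PadIfNeeded) scales the longest side to 518 and pads the rest. The actual transform uses albumentations; the pure-Python sketch below just illustrates the dimension arithmetic:

```python
def resize_and_pad_dims(height, width, target=518):
    """Compute resized dims and padding for LongestMaxSize + PadIfNeeded.

    Scales so the longest side equals `target` (preserving aspect ratio),
    then reports how much padding reaches a square target x target canvas.
    """
    scale = target / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    pad_h, pad_w = target - new_h, target - new_w
    return (new_h, new_w), (pad_h, pad_w)

# The largest images in the dataset are 7000x5600:
(new_h, new_w), (pad_h, pad_w) = resize_and_pad_dims(7000, 5600)
# 7000 -> 518, 5600 -> 414, padded with 104 pixels of width
```

This avoids the anisotropic distortion a plain square resize would introduce, which matters for geometry-sensitive defects like scratches and edge defects.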
## Checkpoint Contents

The `.pt` file contains:

```python
{
    'epoch': 7,                  # Best epoch
    'model_state_dict': {...},   # Full model weights
    'best_val_acc': 0.906,       # Validation accuracy (episodic)
    'config': {...},             # Training configuration
}
```

## Intended Use

- **Primary use**: Semiconductor wafer defect detection and classification in manufacturing quality control
- **Few-shot scenarios**: When only 1-20 labeled examples per defect class are available
- **Research**: Few-shot learning, meta-learning, and industrial defect detection benchmarks

## Limitations

- Trained specifically on Intel challenge semiconductor images; may need fine-tuning for other semiconductor processes
- Good image detection (~90% accuracy) is less reliable than defect classification (97-100%)
- Requires grayscale input images; color images should be converted before inference
- Extremely rare classes (defect3: 9 samples, defect4: 14 samples) have lower representation in training

## Source Code

Full training pipeline, evaluation scripts, and PySide6/QML desktop application available at:
[github.com/fidel-makatia/Semiconductor_Defect_Classification_model](https://github.com/fidel-makatia/Semiconductor_Defect_Classification_model)

## Citation

```bibtex
@misc{makatia2026semiconductor,
  title={Few-Shot Semiconductor Defect Classification with DINOv2 and Prototypical Networks},
  author={Fidel Makatia},
  year={2026},
  howpublished={Intel Semiconductor Solutions Challenge 2026},
}
```

## License

MIT License