---
language: en
license: mit
library_name: pytorch
tags:
  - image-classification
  - few-shot-learning
  - prototypical-network
  - dinov2
  - semiconductor
  - defect-detection
  - vision-transformer
  - meta-learning
datasets:
  - custom
pipeline_tag: image-classification
model-index:
  - name: semiconductor-defect-classifier
    results:
      - task:
          type: image-classification
          name: Few-Shot Defect Classification
        metrics:
          - name: Accuracy (K=1)
            type: accuracy
            value: 0.995
          - name: Accuracy (K=5)
            type: accuracy
            value: 0.997
          - name: Accuracy (K=20)
            type: accuracy
            value: 0.998
          - name: Macro F1 (K=20)
            type: f1
            value: 0.999
---

# Semiconductor Defect Classifier

**Few-Shot Semiconductor Wafer Defect Classification using DINOv2 ViT-L/14 + Prototypical Network**

Built for the Intel Semiconductor Solutions Challenge 2026. Classifies grayscale semiconductor wafer microscopy images into 9 categories (8 defect types + good) using as few as 1-5 reference images per class.

## Model Description

This model combines a DINOv2 ViT-L/14 backbone (304M parameters, self-supervised pre-training on 142M images) with a Prototypical Network classification head. It was trained using episodic meta-learning on the Intel challenge dataset.

### Architecture

```text
Input Image (grayscale, up to 7000x5600)
    |
    v
DINOv2 ViT-L/14 Backbone
  - 304M parameters (last 6 blocks fine-tuned)
  - Gradient checkpointing enabled
  - Output: 1024-dim CLS token
    |
    v
3-Layer Projection Head
  - Linear(1024, 768) + LayerNorm + GELU
  - Linear(768, 768) + LayerNorm + GELU
  - Linear(768, 512) + L2 Normalization
    |
    v
Prototypical Classification
  - Cosine similarity with learned temperature
  - Softmax over class prototypes
  - Good-detection gap threshold (0.20)
```

### Key Design Choices

- **DINOv2 backbone:** Self-supervised features transfer exceptionally well to few-shot tasks, even on out-of-distribution semiconductor images
- **Prototypical Network:** Non-parametric classifier that works with any number of support examples (K=1 to K=20+) without retraining
- **Cosine similarity + learned temperature:** More stable than Euclidean distance for high-dimensional embeddings
- **Differential learning rates:** Backbone fine-tuned at 5e-6, projection head at 3e-4 (60x ratio)
- **Gradient checkpointing:** Reduces VRAM from ~24 GB to ~2 GB with minimal speed penalty
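The prototypical classification step is compact enough to sketch in a few lines. The NumPy sketch below mirrors the description above (prototypes as means of L2-normalized support embeddings, temperature-scaled softmax over cosine similarities); the toy 4-dim embeddings, function names, and the temperature value of 10 are illustrative, not the repository's actual API.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def proto_classify(query, support, labels, temperature=10.0):
    """Score a query embedding against class prototypes.

    query:   (D,) embedding
    support: (N, D) support embeddings
    labels:  (N,) integer class ids
    """
    support = l2_normalize(support)
    classes = np.unique(labels)
    # Prototype = mean of the class's normalized support embeddings,
    # re-normalized so cosine similarity reduces to a dot product.
    prototypes = l2_normalize(
        np.stack([support[labels == c].mean(axis=0) for c in classes])
    )
    sims = prototypes @ l2_normalize(query)   # cosine similarities
    logits = temperature * sims               # learned temperature (illustrative value)
    probs = np.exp(logits - logits.max())
    return classes, probs / probs.sum()       # softmax over prototypes

# Toy 4-dim embeddings: two support examples each for classes 0 and 1.
support = np.array([[1.0, 0.1, 0.0, 0.0], [0.9, 0.2, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.1], [0.0, 0.0, 0.8, 0.3]])
labels = np.array([0, 0, 1, 1])
classes, probs = proto_classify(np.array([0.95, 0.15, 0.0, 0.0]), support, labels)
```

Because the prototypes are re-normalized, adding more support examples only refines the class mean; the scoring rule itself never changes, which is what makes K=1 through K=20+ work without retraining.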

## Training Details

### Dataset

Intel Semiconductor Solutions Challenge 2026 dataset:

| Class | Name | Samples | Description |
|-------|------|---------|-------------|
| 0 | Good | 7,135 | Non-defective wafer surface |
| 1 | Defect 1 | 253 | Scratch-type defect |
| 2 | Defect 2 | 178 | Particle contamination |
| 3 | Defect 3 | 9 | Micro-crack (extremely rare) |
| 4 | Defect 4 | 14 | Edge defect (extremely rare) |
| 5 | Defect 5 | 411 | Pattern anomaly |
| 8 | Defect 8 | 803 | Surface roughness |
| 9 | Defect 9 | 319 | Deposition defect |
| 10 | Defect 10 | 674 | Etch residue |

Note: Classes 6 and 7 do not exist in the dataset. The extreme class imbalance (793:1 ratio between good and defect3) and visually similar class pairs (defect3/defect9 at 0.963 cosine similarity, defect4/defect8 at 0.889) make this a challenging benchmark.

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Training paradigm | Episodic meta-learning |
| Episodes per epoch | 500 |
| Episode structure | 9-way 5-shot 10-query |
| Optimizer | AdamW |
| Learning rate (head) | 3.0e-4 |
| Learning rate (backbone) | 5.0e-6 |
| LR schedule | Cosine annealing with 5-epoch warmup |
| Weight decay | 1.0e-4 |
| Label smoothing | 0.1 |
| Gradient clipping | Max norm 1.0 |
| Mixed precision | AMP (float16) |
| Batch processing | Gradient checkpointing |
| Early stopping | Patience 20 epochs |
| Input resolution | 518x518 (DINOv2 native) |
| Preprocessing | LongestMaxSize + PadIfNeeded (aspect-ratio preserving) |
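Each of the 500 episodes per epoch is a self-contained 9-way 5-shot 10-query task. A minimal sampler for that episode structure might look like the sketch below (pure Python; the `dataset` layout and function name are illustrative assumptions, not the repository's loader). Classes smaller than 15 samples, such as the 9-sample defect3, are drawn with replacement.

```python
import random

def sample_episode(dataset, n_way=9, k_shot=5, n_query=10, rng=random):
    """Sample one episode as (class_id, sample_index) pairs.

    dataset: dict mapping class id -> list of sample indices.
    Classes with fewer than k_shot + n_query samples are drawn
    with replacement so every episode is fully populated.
    """
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for c in classes:
        pool = dataset[c]
        need = k_shot + n_query
        if len(pool) >= need:
            picks = rng.sample(pool, need)      # without replacement
        else:
            picks = [rng.choice(pool) for _ in range(need)]
        support += [(c, i) for i in picks[:k_shot]]
        query += [(c, i) for i in picks[k_shot:]]
    return support, query

# Toy index lists loosely mimicking the class-size imbalance above.
sizes = [50, 30, 9, 14, 40, 60, 25, 35, 20]
toy = {c: list(range(n)) for c, n in enumerate(sizes)}
support, query = sample_episode(toy)
```

Each call yields 45 support and 90 query items (9 classes x 5 and 9 x 10), which matches the episode structure in the table.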

### Training Hardware

- **GPU:** NVIDIA RTX PRO 6000 Blackwell Workstation Edition (95.6 GB VRAM)
- **Actual VRAM usage:** ~2 GB (gradient checkpointing)
- **Training time:** ~17 minutes/epoch
- **Convergence:** best validation accuracy at epoch 7; early stopping triggered at epoch 27
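The cosine-with-warmup schedule from the configuration above reduces to a few lines of arithmetic. The sketch below assumes a linear ramp from zero over the 5 warmup epochs and annealing to zero over the remaining epochs; the repository's scheduler may differ in these details, and the 100-epoch horizon is illustrative.

```python
import math

def lr_at(epoch, base_lr, warmup_epochs=5, total_epochs=100):
    """Linear warmup for warmup_epochs, then cosine annealing to 0."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Head at 3e-4 and backbone at 5e-6: the 60x differential is preserved
# at every epoch because both follow the same schedule shape.
head = [lr_at(e, 3e-4) for e in range(100)]
backbone = [lr_at(e, 5e-6) for e in range(100)]
```

In practice this would be applied via two optimizer parameter groups (head and backbone) sharing one scheduler, so the ratio between the two rates stays fixed throughout training.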

## Performance

### K-Shot Classification Accuracy

| K (support images per class) | Accuracy |
|------------------------------|----------|
| 1 | 99.5% |
| 3 | 99.7% |
| 5 | 99.7% |
| 10 | 99.7% |
| 20 | 99.8% |

### Per-Class F1 Scores (K=20)

| Class | F1 Score |
|-------|----------|
| Defect 1 (Scratch) | 1.000 |
| Defect 2 (Particle) | 1.000 |
| Defect 3 (Micro-crack) | 1.000 |
| Defect 4 (Edge) | 1.000 |
| Defect 5 (Pattern) | 0.994 |
| Defect 8 (Roughness) | 1.000 |
| Defect 9 (Deposition) | 1.000 |
| Defect 10 (Etch residue) | 0.996 |

- Balanced accuracy (K=20): 0.999
- Macro F1 (K=20): 0.999

### Good Image Detection

The model includes a cosine similarity gap threshold for detecting non-defective ("good") wafer images:

| Metric | Value |
|--------|-------|
| Good image accuracy | ~90% |
| Defect image accuracy | ~97% |
| Gap threshold | 0.20 |
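The card does not spell out how the 0.20 gap is computed. One plausible reading, sketched below purely as an assumption (the function name and gap definition are mine, not the repository's), is that a query is flagged "good" when its best prototype similarity exceeds the runner-up by less than the gap, i.e. no single defect prototype stands out:

```python
def flag_good(similarities, gap_threshold=0.20):
    """Flag a query as 'good' when no defect prototype clearly wins.

    similarities: cosine similarities, one per defect prototype.
    ASSUMPTION: the gap is taken between the top two similarities;
    the actual implementation may define it differently.
    """
    top, second = sorted(similarities, reverse=True)[:2]
    return (top - second) < gap_threshold

flag_good([0.91, 0.45, 0.30])  # one prototype dominates: treated as a defect
flag_good([0.62, 0.58, 0.55])  # ambiguous similarities: flagged as good
```

A gap rule of this kind would explain the asymmetry in the table: genuine defects usually produce one dominant similarity, while good images match all prototypes weakly and roughly equally.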

## How to Use

### Quick Start

```python
import torch
import yaml
from PIL import Image
from problem_a.src.backbone import get_backbone
from problem_a.src.protonet import PrototypicalNetwork, IncrementalPrototypeTracker
from problem_a.src.augmentations import get_eval_transform

# Load model
with open('problem_a/configs/default.yaml') as f:
    cfg = yaml.safe_load(f)

backbone = get_backbone(cfg['model']['backbone'], cfg['model']['backbone_size'])
model = PrototypicalNetwork(backbone, cfg['model']['proj_hidden'], cfg['model']['proj_dim'])

checkpoint = torch.load('best_model.pt', map_location='cpu', weights_only=False)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval().cuda()

transform = get_eval_transform(cfg['data']['img_size'])

# Create tracker and add support images
tracker = IncrementalPrototypeTracker(model, torch.device('cuda'))

# Add support images (at least 1 per class);
# support_images is an iterable of (class_id, image_path) pairs
for class_id, image_path in support_images:
    img = Image.open(image_path).convert('L')
    tensor = transform(img)
    tracker.add_example(tensor, class_id)

# Classify a query image
query_img = Image.open('query.png').convert('L')
query_tensor = transform(query_img).unsqueeze(0).cuda()

with torch.no_grad():
    log_probs = model.classify(query_tensor, tracker.prototypes)
    probs = torch.exp(log_probs).squeeze(0)

# Map the predicted prototype index back to the original class id
reverse_map = {v: k for k, v in tracker.label_map.items()}
pred_idx = probs.argmax().item()
predicted_class = reverse_map[pred_idx]
confidence = probs[pred_idx].item()
print(f'Predicted: class {predicted_class}, confidence: {confidence:.3f}')
```

### Download with huggingface_hub

```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="Makatia/semiconductor-defect-classifier",
    filename="best_model.pt"
)
```

## Model Specifications

| Property | Value |
|----------|-------|
| Architecture | DINOv2 ViT-L/14 + Prototypical Network |
| Total parameters | 306,142,209 |
| Trainable parameters | 77,366,273 (25.3%) |
| Backbone | DINOv2 ViT-L/14 (frozen + last 6 blocks) |
| Embedding dimension | 512 (L2-normalized) |
| Projection head | 1024 -> 768 -> 768 -> 512 |
| Input size | 518x518 (aspect-ratio preserved with padding) |
| Input channels | Grayscale (converted to 3-channel internally) |
| Inference time | ~700ms (GPU) / ~3s (CPU) |
| VRAM (inference) | ~2 GB |
| Checkpoint size | 1.17 GB |
| Framework | PyTorch 2.0+ |
| Dependencies | timm >= 1.0, albumentations >= 1.3 |
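The aspect-ratio-preserving preprocessing (LongestMaxSize followed by PadIfNeeded) can be illustrated by its geometry alone. The hypothetical helper below computes only the resized dimensions and the padding needed, showing what a full-size 7000x5600 capture becomes; the actual pipeline uses albumentations transforms, not this function.

```python
def resize_pad_dims(height, width, target=518):
    """Scale so the longest side equals target, then pad to target x target.

    Returns ((new_h, new_w), (pad_h, pad_w)); illustrative only.
    """
    scale = target / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    return (new_h, new_w), (target - new_h, target - new_w)

# A 5600x7000 wafer image shrinks to 414x518 and gains 104 rows of padding.
dims, pad = resize_pad_dims(5600, 7000)
```

Preserving aspect ratio matters here because defect geometry (scratch length, crack orientation) would be distorted by a naive square resize.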

### Checkpoint Contents

The `.pt` file contains:

```python
{
    'epoch': 7,                    # Best epoch
    'model_state_dict': {...},     # Full model weights
    'best_val_acc': 0.906,         # Validation accuracy (episodic)
    'config': {...},               # Training configuration
}
```

## Intended Use

- **Primary use:** Semiconductor wafer defect detection and classification in manufacturing quality control
- **Few-shot scenarios:** When only 1-20 labeled examples per defect class are available
- **Research:** Few-shot learning, meta-learning, and industrial defect detection benchmarks

## Limitations

- Trained specifically on Intel challenge semiconductor images; may need fine-tuning for other semiconductor processes
- Good image detection (~90% accuracy) is less reliable than defect classification (97-100%)
- Requires grayscale input images; color images should be converted before inference
- Extremely rare classes (defect3: 9 samples, defect4: 14 samples) have lower representation in training

## Source Code

Full training pipeline, evaluation scripts, and a PySide6/QML desktop application are available at [github.com/fidel-makatia/Semiconductor_Defect_Classification_model](https://github.com/fidel-makatia/Semiconductor_Defect_Classification_model).

## Citation

```bibtex
@misc{makatia2026semiconductor,
  title={Few-Shot Semiconductor Defect Classification with DINOv2 and Prototypical Networks},
  author={Fidel Makatia},
  year={2026},
  howpublished={Intel Semiconductor Solutions Challenge 2026},
}
```

## License

MIT License