	---
	language: en
	license: mit
	library_name: pytorch
	tags:
	- image-classification
	- few-shot-learning
	- prototypical-network
	- dinov2
	- semiconductor
	- defect-detection
	- vision-transformer
	- meta-learning
	datasets:
	- custom
	pipeline_tag: image-classification
	model-index:
	- name: semiconductor-defect-classifier
	  results:
	  - task:
	      type: image-classification
	      name: Few-Shot Defect Classification
	    metrics:
	    - name: Accuracy (K=1)
	      type: accuracy
	      value: 0.995
	    - name: Accuracy (K=5)
	      type: accuracy
	      value: 0.997
	    - name: Accuracy (K=20)
	      type: accuracy
	      value: 0.998
	    - name: Macro F1 (K=20)
	      type: f1
	      value: 0.999
	---

	# Semiconductor Defect Classifier

	Few-Shot Semiconductor Wafer Defect Classification using DINOv2 ViT-L/14 + Prototypical Network

	Built for the Intel Semiconductor Solutions Challenge 2026. Classifies grayscale semiconductor wafer microscopy images into 9 categories (8 defect types + good) using as few as 1-5 reference images per class.

	## Model Description

	This model combines a DINOv2 ViT-L/14 backbone (304M parameters, self-supervised pre-training on 142M images) with a Prototypical Network classification head. It was trained using episodic meta-learning on the Intel challenge dataset.

	### Architecture

	```
	Input Image (grayscale, up to 7000x5600)
	        |
	        v
	DINOv2 ViT-L/14 Backbone
	  - 304M parameters (last 6 blocks fine-tuned)
	  - Gradient checkpointing enabled
	  - Output: 1024-dim CLS token
	        |
	        v
	3-Layer Projection Head
	  - Linear(1024, 768) + LayerNorm + GELU
	  - Linear(768, 768) + LayerNorm + GELU
	  - Linear(768, 512) + L2 Normalization
	        |
	        v
	Prototypical Classification
	  - Cosine similarity with learned temperature
	  - Softmax over class prototypes
	  - Good-detection gap threshold (0.20)
	```
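The last stage of the diagram can be sketched in a few lines of NumPy. This is a minimal illustration rather than the repository's implementation: the function names, the re-normalization of prototypes, the `scale` multiplier standing in for the learned temperature (10.0 is an arbitrary value), and the random embeddings are all assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def build_prototypes(support_emb, support_labels, num_classes):
    """One prototype per class: mean of its normalized support embeddings."""
    support_emb = l2_normalize(support_emb)
    protos = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in range(num_classes)]
    )
    return l2_normalize(protos)  # assumption: prototypes are re-normalized

def classify(query_emb, prototypes, scale=10.0):
    """Log-probabilities from a softmax over scaled cosine similarities."""
    logits = scale * l2_normalize(query_emb) @ prototypes.T
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

# Synthetic 512-dim embeddings: 9 classes, 5 support examples each
rng = np.random.default_rng(0)
num_classes, k_shot, dim = 9, 5, 512
class_centers = rng.normal(size=(num_classes, dim))
support_labels = np.repeat(np.arange(num_classes), k_shot)
support_emb = class_centers[support_labels] + 0.1 * rng.normal(
    size=(num_classes * k_shot, dim)
)
prototypes = build_prototypes(support_emb, support_labels, num_classes)

# One query per class, drawn near its class center
query_emb = class_centers + 0.1 * rng.normal(size=(num_classes, dim))
log_probs = classify(query_emb, prototypes)
predictions = log_probs.argmax(axis=1)
print(predictions)
```

Because prototypes are plain averages, adding a support example only updates one class mean, which is why the classifier works for any K without retraining.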

	### Key Design Choices

	- DINOv2 backbone: Self-supervised features transfer exceptionally well to few-shot tasks, even on out-of-distribution semiconductor images
	- Prototypical Network: Non-parametric classifier that works with any number of support examples (K=1 to K=20+) without retraining
	- Cosine similarity + learned temperature: More stable than Euclidean distance for high-dimensional embeddings
	- Differential learning rates: Backbone fine-tuned at 5e-6, projection head at 3e-4 (60x ratio)
	- Gradient checkpointing: Reduces VRAM from ~24 GB to ~2 GB with minimal speed penalty
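The differential learning rates can be illustrated with a small schedule function. This is a sketch under assumptions: the 5-epoch warmup comes from the Training Configuration table, but the linear warmup shape, the decay to zero, and `total_epochs=100` are hypothetical values not stated in the source.

```python
import math

HEAD_LR = 3e-4      # projection head learning rate
BACKBONE_LR = 5e-6  # backbone learning rate (60x smaller)

def lr_at_epoch(epoch, base_lr, warmup_epochs=5, total_epochs=100):
    """Linear warmup followed by cosine annealing (0-indexed epochs).

    total_epochs is a hypothetical horizon chosen for illustration.
    """
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = min(1.0, (epoch - warmup_epochs) / (total_epochs - warmup_epochs))
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

for epoch in (0, 4, 5, 50, 99):
    backbone_lr = lr_at_epoch(epoch, BACKBONE_LR)
    head_lr = lr_at_epoch(epoch, HEAD_LR)
    print(f"epoch {epoch:3d}: backbone {backbone_lr:.2e}, head {head_lr:.2e}")
```

Both groups follow the same curve, so the 60x ratio holds throughout training. In PyTorch this is usually expressed by passing two parameter groups with different `lr` values to `torch.optim.AdamW`.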

	## Training Details

	### Dataset

	Intel Semiconductor Solutions Challenge 2026 dataset:

	| Class | Name | Samples | Description |
	|-------|------|---------|-------------|
	| 0 | Good | 7,135 | Non-defective wafer surface |
	| 1 | Defect 1 | 253 | Scratch-type defect |
	| 2 | Defect 2 | 178 | Particle contamination |
	| 3 | Defect 3 | 9 | Micro-crack (extremely rare) |
	| 4 | Defect 4 | 14 | Edge defect (extremely rare) |
	| 5 | Defect 5 | 411 | Pattern anomaly |
	| 8 | Defect 8 | 803 | Surface roughness |
	| 9 | Defect 9 | 319 | Deposition defect |
	| 10 | Defect 10 | 674 | Etch residue |

	Note: Classes 6 and 7 do not exist in the dataset. The extreme class imbalance (793:1 ratio between good and defect3) and visually similar class pairs (defect3/defect9 at 0.963 cosine similarity, defect4/defect8 at 0.889) make this a challenging benchmark.
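With the 9-way 5-shot 10-query episodes listed under Training Configuration, each class contributes 15 images per episode, which is more than Defect 3 (9 samples) or Defect 4 (14 samples) contain. The source does not say how this is handled. The sketch below shows one possible approach, assuming the query set for a rare class is drawn with replacement from the images left after the support set is picked; `sample_episode` and its fallback are hypothetical, not the repository's sampler.

```python
import random

# Per-class sample counts from the dataset table
CLASS_COUNTS = {0: 7135, 1: 253, 2: 178, 3: 9, 4: 14, 5: 411, 8: 803, 9: 319, 10: 674}

def sample_episode(indices_by_class, k_shot=5, n_query=10, rng=None):
    """Draw one episode: k_shot support and n_query query indices per class."""
    rng = rng or random.Random()
    support, query = {}, {}
    for cls, indices in indices_by_class.items():
        if len(indices) <= k_shot:
            raise ValueError(f"class {cls} needs more than {k_shot} samples")
        shuffled = rng.sample(indices, len(indices))
        support[cls] = shuffled[:k_shot]
        remaining = shuffled[k_shot:]
        if len(remaining) >= n_query:
            query[cls] = remaining[:n_query]
        else:
            # Rare class: reuse the few held-out images (assumed fallback)
            query[cls] = [rng.choice(remaining) for _ in range(n_query)]
    return support, query

# Indices are local to each class here; a real dataset would map them to files
indices_by_class = {cls: list(range(n)) for cls, n in CLASS_COUNTS.items()}
support, query = sample_episode(indices_by_class, rng=random.Random(0))
print({cls: len(set(q)) for cls, q in query.items()})  # unique query images per class
```

Support and query stay disjoint within an episode, so the rare classes never have a query image that also defines their prototype.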

	### Training Configuration

	| Parameter | Value |
	|-----------|-------|
	| Training paradigm | Episodic meta-learning |
	| Episodes per epoch | 500 |
	| Episode structure | 9-way 5-shot 10-query |
	| Optimizer | AdamW |
	| Learning rate (head) | 3.0e-4 |
	| Learning rate (backbone) | 5.0e-6 |
	| LR schedule | Cosine annealing with 5-epoch warmup |
	| Weight decay | 1.0e-4 |
	| Label smoothing | 0.1 |
	| Gradient clipping | Max norm 1.0 |
	| Mixed precision | AMP (float16) |
	| Batch processing | Gradient checkpointing |
	| Early stopping | Patience 20 epochs |
	| Input resolution | 518x518 (DINOv2 native) |
	| Preprocessing | LongestMaxSize + PadIfNeeded (aspect-ratio preserving) |
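The aspect-ratio-preserving preprocessing can be made concrete with a little arithmetic. The sketch below computes the resize and padding geometry for the largest input size mentioned in this card; it assumes 7000 is the width and 5600 the height, centered padding, and round-to-nearest for the resized dimensions, so the library's exact pixel counts may differ by one.

```python
def resize_and_pad_geometry(width, height, target=518):
    """Scale the longest side to `target`, then pad the short side to a square."""
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_w, pad_h = target - new_w, target - new_h
    return {
        "resized": (new_w, new_h),
        "pad_left": pad_w // 2,
        "pad_right": pad_w - pad_w // 2,
        "pad_top": pad_h // 2,
        "pad_bottom": pad_h - pad_h // 2,
    }

geometry = resize_and_pad_geometry(7000, 5600)
print(geometry)
```

Padding instead of stretching keeps defect shapes undistorted, at the cost of spending part of the 518x518 input on border pixels.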

	### Training Hardware

	- GPU: NVIDIA RTX PRO 6000 Blackwell Workstation Edition (95.6 GB VRAM)
	- Actual VRAM usage: ~2 GB (gradient checkpointing)
	- Training time: ~17 minutes/epoch
	- Convergence: best validation accuracy reached at epoch 7; early stopping (patience 20) ended training at epoch 27
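The relationship between the best epoch and the stopping epoch follows from the patience setting. A minimal sketch, assuming 1-indexed epochs and a strict-improvement criterion; the validation curve is hypothetical apart from the 0.906 peak at epoch 7 reported in the checkpoint:

```python
def run_early_stopping(val_accuracies, patience=20):
    """Return (best_epoch, stop_epoch); stop_epoch is None if never triggered."""
    best_acc, best_epoch = float("-inf"), 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, None

# Hypothetical curve: peaks at epoch 7, then plateaus slightly lower
curve = [0.80, 0.84, 0.87, 0.88, 0.89, 0.90, 0.906] + [0.90] * 30
best_epoch, stop_epoch = run_early_stopping(curve)
print(best_epoch, stop_epoch)
```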

	## Performance

	### K-Shot Classification Accuracy

	| K (support images per class) | Accuracy |
	|------------------------------|----------|
	| K=1 | 99.5% |
	| K=3 | 99.7% |
	| K=5 | 99.7% |
	| K=10 | 99.7% |
	| K=20 | 99.8% |

	### Per-Class F1 Scores (K=20)

	| Class | F1 Score |
	|-------|----------|
	| Defect 1 (Scratch) | 1.000 |
	| Defect 2 (Particle) | 1.000 |
	| Defect 3 (Micro-crack) | 1.000 |
	| Defect 4 (Edge) | 1.000 |
	| Defect 5 (Pattern) | 0.994 |
	| Defect 8 (Roughness) | 1.000 |
	| Defect 9 (Deposition) | 1.000 |
	| Defect 10 (Etch residue) | 0.996 |

	- Balanced accuracy (K=20): 0.999
	- Macro F1 (K=20): 0.999
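The macro F1 figure can be checked against the per-class table. The sketch below assumes macro averaging runs over the eight defect classes listed (the table has no row for Good) and uses `decimal` so the rounding is exact:

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-class F1 scores at K=20, copied from the table
per_class_f1 = {
    "Defect 1": "1.000", "Defect 2": "1.000", "Defect 3": "1.000",
    "Defect 4": "1.000", "Defect 5": "0.994", "Defect 8": "1.000",
    "Defect 9": "1.000", "Defect 10": "0.996",
}

# Macro F1 is the unweighted mean, so rare classes count as much as common ones
scores = [Decimal(v) for v in per_class_f1.values()]
macro_f1 = sum(scores) / len(scores)
macro_f1_rounded = macro_f1.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
print(macro_f1, macro_f1_rounded)
```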

	### Good Image Detection

	The model includes a cosine similarity gap threshold for detecting non-defective ("good") wafer images:

	| Metric | Value |
	|--------|-------|
	| Good image accuracy | ~90% |
	| Defect image accuracy | ~97% |
	| Gap threshold | 0.20 |
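This card gives the threshold value (0.20) but not the rule it is applied to. Purely as an illustration, the sketch below assumes the "gap" is the margin between the highest and second-highest cosine similarity to the defect prototypes, and that a small margin means no defect stands out. The function name, the rule, and the similarity values are hypothetical; the repository may define the gap differently.

```python
def detect_good(similarities, gap_threshold=0.20):
    """Return (label, gap) under the assumed top-1 vs top-2 margin rule."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    (best_cls, best_sim), (_, runner_up_sim) = ranked[0], ranked[1]
    gap = best_sim - runner_up_sim
    if gap < gap_threshold:
        return "good", gap  # no defect prototype clearly dominates
    return best_cls, gap

# Hypothetical cosine similarities to three defect prototypes
clear_defect = {"defect1": 0.91, "defect2": 0.35, "defect5": 0.30}
ambiguous = {"defect1": 0.52, "defect2": 0.47, "defect5": 0.45}

label_a, gap_a = detect_good(clear_defect)
label_b, gap_b = detect_good(ambiguous)
print(label_a, round(gap_a, 2))
print(label_b, round(gap_b, 2))
```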

	## How to Use

	### Quick Start

	```python
	import torch
	import yaml
	from PIL import Image
	from problem_a.src.backbone import get_backbone
	from problem_a.src.protonet import PrototypicalNetwork, IncrementalPrototypeTracker
	from problem_a.src.augmentations import get_eval_transform

	# Load model
	with open('problem_a/configs/default.yaml') as f:
	    cfg = yaml.safe_load(f)

	backbone = get_backbone(cfg['model']['backbone'], cfg['model']['backbone_size'])
	model = PrototypicalNetwork(backbone, cfg['model']['proj_hidden'], cfg['model']['proj_dim'])

	# weights_only=False unpickles arbitrary objects; only load checkpoints you trust
	checkpoint = torch.load('best_model.pt', map_location='cpu', weights_only=False)
	model.load_state_dict(checkpoint['model_state_dict'])
	model.eval().cuda()

	transform = get_eval_transform(cfg['data']['img_size'])

	# Create tracker and add support images
	tracker = IncrementalPrototypeTracker(model, torch.device('cuda'))

	# Support set as (class_id, image_path) pairs, at least 1 per class.
	# The paths below are placeholders; replace them with your own reference images.
	support_images = [
	    (1, 'support/defect1_example.png'),
	    (2, 'support/defect2_example.png'),
	]
	for class_id, image_path in support_images:
	    img = Image.open(image_path).convert('L')
	    tensor = transform(img)
	    tracker.add_example(tensor, class_id)

	# Classify a query image
	query_img = Image.open('query.png').convert('L')
	query_tensor = transform(query_img).unsqueeze(0).cuda()

	with torch.no_grad():
	    log_probs = model.classify(query_tensor, tracker.prototypes)
	    probs = torch.exp(log_probs).squeeze(0)

	# Map the winning prototype index back to the original class ID
	label_map = tracker.label_map
	reverse_map = {v: k for k, v in label_map.items()}
	pred_idx = probs.argmax().item()
	predicted_class = reverse_map[pred_idx]
	confidence = probs[pred_idx].item()
	print(f'Predicted: class {predicted_class}, confidence: {confidence:.3f}')
	```

	### Download with huggingface_hub

	```python
	from huggingface_hub import hf_hub_download

	checkpoint_path = hf_hub_download(
	    repo_id="Makatia/semiconductor-defect-classifier",
	    filename="best_model.pt",
	)
	```

	## Model Specifications

	| Property | Value |
	|----------|-------|
	| Architecture | DINOv2 ViT-L/14 + Prototypical Network |
	| Total parameters | 306,142,209 |
	| Trainable parameters | 77,366,273 (25.3%) |
	| Backbone | DINOv2 ViT-L/14 (first 18 of 24 blocks frozen, last 6 fine-tuned) |
	| Embedding dimension | 512 (L2-normalized) |
	| Projection head | 1024 -> 768 -> 768 -> 512 |
	| Input size | 518x518 (aspect-ratio preserved with padding) |
	| Input channels | Grayscale (converted to 3-channel internally) |
	| Inference time | ~700 ms (GPU) / ~3 s (CPU) |
	| VRAM (inference) | ~2 GB |
	| Checkpoint size | 1.17 GB |
	| Framework | PyTorch 2.0+ |
	| Dependencies | timm >= 1.0, albumentations >= 1.3 |
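The trainable-parameter figure can be reproduced from the architecture. The breakdown below is one accounting that matches the reported number, not a statement of how the repository partitions its parameters. It assumes biases on every Linear layer, affine LayerNorm, two LayerScale vectors per transformer block, a trainable final backbone norm, and a single scalar temperature.

```python
def linear(n_in, n_out):
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out

def layer_norm(dim):
    """Scale and shift vectors of an affine LayerNorm."""
    return 2 * dim

EMBED, MLP_HIDDEN = 1024, 4096  # ViT-L width and MLP hidden size

# One ViT-L transformer block
block = (
    linear(EMBED, 3 * EMBED)      # fused QKV projection
    + linear(EMBED, EMBED)        # attention output projection
    + linear(EMBED, MLP_HIDDEN)   # MLP up-projection
    + linear(MLP_HIDDEN, EMBED)   # MLP down-projection
    + 2 * layer_norm(EMBED)       # pre-attention and pre-MLP norms
    + 2 * EMBED                   # two LayerScale vectors (assumed)
)

# Projection head: 1024 -> 768 -> 768 -> 512
head = (
    linear(1024, 768) + layer_norm(768)
    + linear(768, 768) + layer_norm(768)
    + linear(768, 512)
)

# Last 6 blocks + final backbone norm + head + scalar temperature (assumed split)
trainable = 6 * block + layer_norm(EMBED) + head + 1
TOTAL_REPORTED = 306_142_209

print(f"per block: {block:,}")
print(f"head:      {head:,}")
print(f"trainable: {trainable:,} ({100 * trainable / TOTAL_REPORTED:.1f}%)")
```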

	## Checkpoint Contents

	The `.pt` file contains:

	```python
	{
	    'epoch': 7,                 # Best epoch
	    'model_state_dict': {...},  # Full model weights
	    'best_val_acc': 0.906,      # Validation accuracy (episodic)
	    'config': {...},            # Training configuration
	}
	```

	## Intended Use

	- Primary use: Semiconductor wafer defect detection and classification in manufacturing quality control
	- Few-shot scenarios: When only 1-20 labeled examples per defect class are available
	- Research: Few-shot learning, meta-learning, and industrial defect detection benchmarks

	## Limitations

	- Trained specifically on Intel challenge semiconductor images; may need fine-tuning for other semiconductor processes
	- Good image detection (~90% accuracy) is less reliable than defect classification (97-100%)
	- Requires grayscale input images; color images should be converted before inference
	- Extremely rare classes (defect3: 9 samples, defect4: 14 samples) have lower representation in training

	## Source Code

	Full training pipeline, evaluation scripts, and PySide6/QML desktop application available at:
	[github.com/fidel-makatia/Semiconductor_Defect_Classification_model](https://github.com/fidel-makatia/Semiconductor_Defect_Classification_model)

	## Citation

	```bibtex
	@misc{makatia2026semiconductor,
	  title={Few-Shot Semiconductor Defect Classification with DINOv2 and Prototypical Networks},
	  author={Fidel Makatia},
	  year={2026},
	  howpublished={Intel Semiconductor Solutions Challenge 2026},
	}
	```

	## License

	MIT License