---
title: PCB Defect Detection (RT-DETRv4)
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.44.0
app_file: app.py
pinned: false
license: apache-2.0
---

# PCB Defect Detection: RT-DETRv4 X on DsPCBSD+

RT-DETRv4 X fine-tuned on the DsPCBSD+ dataset for 9-class copper-layer defect detection.

## Live Demo

Try it in your browser without any setup:

**[https://huggingface.co/spaces/mcthebest/PCB_RTDETR](https://huggingface.co/spaces/mcthebest/PCB_RTDETR)**

Upload your own PCB image or pick one of the built-in test images. The confidence threshold is adjustable in the sidebar.

## Dataset: DsPCBSD+

DsPCBSD+ is a 2024 open dataset for PCB copper-layer defect detection, captured by a professional AOI system (AGLE'OL AOI-100 V8, 16K camera, controlled LED lighting). It covers nine defect categories, annotated at instance level:

| Code | Defect | Description |
|---|---|---|
| SH | Short | Conductive bridge between two traces |
| SP | Spur | Anomalous copper spike from a trace |
| SC | Spurious Copper | Unwanted copper on board surface |
| OP | Open Circuit | Break in a conductive trace |
| MB | Mouse Bite | Edge notch or chip in the trace |
| HB | Hole Breakout | Damage to material around a drill hole |
| CS | Conductor Scratch | Scratch mark on a conductive trace |
| CFO | Conductor Foreign Object | Foreign particle on a trace |
| BMFO | Base Material Foreign Object | Foreign particle in substrate material |

> S. Lv et al., "A dataset for deep learning based detection of printed circuit board surface defect," *Scientific Data*, vol. 11, no. 1, p. 811, 2024. https://doi.org/10.1038/s41597-024-03656-8

## Model

### Architecture: RT-DETRv4 X

RT-DETRv4 builds on the RT-DETR lineage (v1, v2, v3) with ideas from D-FINE and DEIM. Its main addition is a semantic distillation framework that uses a frozen DINOv3 teacher during training only, so it adds no cost at inference.
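The semantic distillation idea can be illustrated as a cosine-similarity alignment loss between student and teacher features, added to the detection loss with a weight λ. This is a hypothetical sketch, not the RT-DETRv4 code: the function names, the plain-Python vectors, and the assumption that the student's F5 features have already been projected to the teacher's channel width and spatial grid are all illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)  # small epsilon avoids division by zero


def dsi_loss(student_feats, teacher_feats):
    """Hypothetical alignment loss: 1 - mean cosine similarity over locations.

    Both arguments are lists of per-location feature vectors. Assumption: the
    student features were already projected to match the teacher's shape.
    """
    sims = [cosine_similarity(s, t) for s, t in zip(student_feats, teacher_feats)]
    return 1.0 - sum(sims) / len(sims)


def total_loss(l_det, l_dsi, lam):
    """L_total = L_det + lambda * L_DSI (lambda is adjusted per epoch by GAM in RT-DETRv4)."""
    return l_det + lam * l_dsi
```

Perfectly aligned features give a DSI loss near 0 and opposite features give 2. Because the teacher only contributes a training loss term, it can be dropped at inference without changing the detector's forward pass.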
| Component | Detail |
|---|---|
| Backbone | HGNetv2-X, multi-scale features at stride 8/16/32 |
| Encoder | Efficient Hybrid Encoder: AIFI (global self-attention on S5) + CCFF (CNN cross-scale fusion) |
| Decoder | DFINETransformer, 6 layers, 300 queries, 32-bin probabilistic box regression per edge |
| Teacher (train only) | DINOv3 ViT-B/16 (frozen), trained on LVD-1689M (~1.7B images) |
| DSI module | Aligns F5 features with DINOv3 semantics via cosine similarity loss |
| GAM module | Adjusts DSI loss weight per epoch based on gradient norms |

Training loss:

```
L_total = L_det + λ · L_DSI
```

`L_det` includes the VFL, L1, GIoU, FGL, DDF, and MAL losses from D-FINE and DEIM, and `λ` is the DSI loss weight adjusted per epoch by the GAM module.

### Training Configuration

| Parameter | Value |
|---|---|
| Dataset | DsPCBSD+ (8,208 train / 2,051 val, 80/20 split) |
| Pretrained weights | RT-DETRv4-X COCO + DINOv3 ViT-B/16 (LVD-1689M) |
| Input resolution | 640 × 640 px |
| Epochs | 10 (72 recommended; limited by Colab T4) |
| Batch size | 8 |
| Optimizer | AdamW (lr encoder/decoder = 2e-4, lr backbone = 1e-5, weight decay = 1e-4) |
| LR scheduler | FlatCosine (warmup 500 iterations, flat 5 epochs, no-aug 2 epochs) |
| Mixed precision | AMP FP16/FP32 |
| Hardware | NVIDIA Tesla T4 (16 GB VRAM), Google Colab |

## Results on DsPCBSD+ Validation Set

### Qualitative Examples

Ground truth vs. predictions at conf ≥ 0.3. The model correctly localises all four defects in the first example (BMFO×2, MB, CFO) and all three Mouse Bite instances in the second, with one low-confidence extra detection.
![Qualitative example 1: GT vs prediction, 4 defects](images/1.png)

![Qualitative example 2: GT vs prediction, Mouse Bite detections](images/2.png)

### Overall COCO Metrics

| Metric | Value |
|---|---|
| mAP @ IoU=0.50 | 0.863 |
| mAP @ IoU=0.50:0.95 | 0.522 |
| mAP @ IoU=0.75 | 0.551 |
| AP small (<32² px) | 0.439 |
| AP medium (32² to 96² px) | 0.602 |
| AP large (>96² px) | 0.754 |
| AR @ maxDets=100 | 0.686 |

### Per-Class AP @ IoU=0.50:0.95

| Class | Full Name | AP |
|---|---|---|
| SH | Short Circuit | 0.597 |
| SP | Spur (Copper Spike) | 0.381 |
| SC | Spurious Copper | 0.511 |
| OP | Open Circuit | 0.555 |
| MB | Mouse Bite | 0.408 |
| HB | Hole Breakout | 0.807 |
| CS | Conductor Scratch | 0.504 |
| CFO | Conductor Foreign Object | 0.462 |
| BMFO | Base Material Foreign Object | 0.478 |
| Mean | | 0.522 |

![COCO detection metrics: AP/AR across IoU thresholds and object sizes, plus per-category AP](images/4_coco_metrics.png)

### Per-Class F1 @ Conf≥0.5, IoU≥0.5

| Class | Precision | Recall | F1 |
|---|---|---|---|
| SH | 0.91 | 0.86 | 0.88 |
| SP | 0.77 | 0.69 | 0.73 |
| SC | 0.89 | 0.77 | 0.83 |
| OP | 0.85 | 0.84 | 0.84 |
| MB | 0.87 | 0.69 | 0.77 |
| HB | 0.91 | 0.93 | 0.92 |
| CS | 0.82 | 0.65 | 0.73 |
| CFO | 0.86 | 0.56 | 0.68 |
| BMFO | 0.87 | 0.83 | 0.85 |
| Mean | | | 0.802 |

![Per-class Precision / Recall / F1 at Conf≥0.5, IoU≥0.5](images/3b_precision_recall_f1.png)

### Confusion Matrix @ Conf≥0.5, IoU≥0.5

Raw counts (left) and row-normalised rates (right). HB is the strongest class (0.93 recall). CFO (0.56 recall) and CS (0.65 recall) are the weakest, and most of their misses land in the Background column.
![Confusion matrix: counts and row-normalised](images/3a_confusion_matrix.png)

### Comparison with SOTA on DsPCBSD+

| Model | Backbone | mAP@50 | mAP@50:95 | Notes |
|---|---|---|---|---|
| YOLOv11-CGL | YOLOv11n | 84.5% | 51.6% | 300 epochs, lightweight |
| PCB-AM | YOLOv8s | 85.7% | N/A | Attention-guided modules |
| PCB-FS | YOLOv8 | 86.2% | 52.4% | Frequency-spatial features |
| **RT-DETRv4 X (ours)** | **HGNetv2-X + DINOv3** | **86.3%** | **52.25%** | **10 epochs, T4 only** |

RT-DETRv4 X has the highest mAP@50 of the compared methods (86.3%) and comes within 0.15 points of PCB-FS on mAP@50:95 (52.25% vs. 52.4%), after only 10 epochs on a single T4. PCB-FS, for comparison, used 100+ epochs with more resources.

## Running Locally

Clone the Space and install dependencies:

```bash
git clone https://huggingface.co/spaces/mcthebest/PCB_RTDETR
cd PCB_RTDETR
pip install -r requirements.txt
```

Download the checkpoint:

```python
from huggingface_hub import hf_hub_download

# Fetch last.pth from the model repo. local_dir writes it to the path the app
# is assumed to read (outputs/rtv4_hgnetv2_x_pcb/last.pth) instead of the HF cache.
ckpt_path = hf_hub_download(
    repo_id="mcthebest/PCB_RTDETR",
    repo_type="model",
    filename="last.pth",
    local_dir="outputs/rtv4_hgnetv2_x_pcb",
)
print(ckpt_path)
```

Or place it manually at `outputs/rtv4_hgnetv2_x_pcb/last.pth`. Then run:

```bash
streamlit run app.py
```

## How Inference Works

The image is resized to 640×640 and passed through RT-DETRv4 X. The decoder outputs 300 candidate queries, which the postprocessor filters by confidence threshold (default 0.30). The surviving detections are returned as `(labels, boxes, scores)` and drawn as labeled bounding boxes. DINOv3 is not loaded at inference.

## Known Limitations

- Small defects (under 32² px) are the hardest size bucket (AP small = 0.439), and Spur (SP), typically a small defect, has the lowest per-class AP (0.381).
- Only 10 epochs were run, versus the recommended 72. The learning curve had not plateaued, so further training is expected to improve mAP@50:95 meaningfully.
- The dataset was collected under controlled AOI conditions. Images from different lighting or camera setups may need domain adaptation.
- AR@100 = 0.686 is recall averaged over IoU thresholds 0.50 to 0.95, so it should not be read as the fraction of defects found. At Conf ≥ 0.5, IoU ≥ 0.5, per-class recall still ranges from only 0.56 (CFO) to 0.93 (HB), so a substantial share of defects is missed. The model is not production-ready for zero-miss QC pipelines.
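AR@100 is easy to misread as a detection rate. The simplified sketch below shows why averaging recall over the COCO IoU thresholds 0.50, 0.55, ..., 0.95 gives a lower number than recall at IoU 0.5. It assumes each ground-truth box already has one matched prediction (IoU 0.0 when unmatched) and ignores pycocotools details such as per-threshold re-matching, category averaging, and the maxDets cap; the function names are hypothetical.

```python
def recall_at(ious, thr):
    """Fraction of ground-truth boxes whose matched IoU is at least thr."""
    return sum(1 for iou in ious if iou >= thr) / len(ious)


def average_recall(ious):
    """Mean recall over the ten COCO IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [t / 100 for t in range(50, 100, 5)]
    return sum(recall_at(ious, t) for t in thresholds) / len(thresholds)


# Four ground-truth defects; the last one has no matching prediction (IoU 0.0).
ious = [0.97, 0.82, 0.62, 0.0]
print(recall_at(ious, 0.5))  # 0.75
print(average_recall(ious))  # 0.5
```

Here three of four defects are found at IoU 0.5, yet the averaged recall is only 0.5 because loosely localised boxes stop counting at the stricter thresholds.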
## References

- [RT-DETRv4](https://arxiv.org/abs/2510.25257), Liao et al., 2025
- [D-FINE](https://arxiv.org/abs/2410.13842), Peng et al., ICLR 2025
- [DEIM](https://arxiv.org/abs/2412.04234), Huang et al., CVPR 2025
- [DINOv3](https://arxiv.org/abs/2508.10104), Siméoni et al., 2025
- [RT-DETRv4 GitHub](https://github.com/RT-DETRs/RT-DETRv4)
- [DsPCBSD+ dataset](https://doi.org/10.6084/m9.figshare.24970329)