nvidia/Cosmos-AnomalyGen-PCB-2B

+---
+license: other
+license_name: nvidia-open-model-license
+license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
+library_name: pytorch
+pipeline_tag: image-to-image
+base_model: nvidia/Cosmos-Predict2-2B-Text2Image
+tags:
+  - anomaly-detection
+  - synthetic-data
+  - pcb
+  - inpainting
+  - cosmos
+  - anomalygen
+---
+# Model Overview
+### Description:
+Cosmos AnomalyGen — PCB (UC1) generates synthetic printed-circuit-board anomaly images by inpainting a user-supplied binary mask onto a clean reference PCB image, conditioned on one of three trained `<texture>+<anomaly_type>` pairs (`IC+bridge`, `passive_component+excess_solder`, `passive_component+missing`). The release ships only the few-shot-finetuned modules — a set of anomaly-token embeddings and a 2-layer MLP adapter — which plug into the frozen Cosmos-Predict2 2B Text-to-Image diffusion backbone (also using a frozen NV-DINOv2 mask encoder and a frozen T5 text encoder) at inference time. Cosmos AnomalyGen — UC1 v1.0.0 was developed by NVIDIA as part of the Cosmos AnomalyGen pipeline. This model is ready for commercial use.<br>
+### License/Terms of Use:
+Use of the AnomalyGen finetuned modules in this release is governed by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).<br>
+Inference also requires the following components, which are **not** redistributed in this release and remain governed by their own terms:
+- Cosmos-Predict2-2B-Text2Image — [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)
+- NV-DINOv2 classification model — distributed via NVIDIA NGC under the NVIDIA TAO license
+- google-t5/t5-large text encoder — Apache 2.0
+- The Anomaly Diffusion pipeline concept (adopted as the framework) — MIT License<br>
+### Deployment Geography:
+Global<br>
+### Use Case:
+Intended for industrial visual-inspection teams responsible for PCB QA who have only a small number of real anomaly examples (≤62 per defect type). The model produces large-scale synthetic anomaly datasets (clean PCB + binary mask → realistic bridge / excess_solder / missing-component image) for training downstream defect-detection or segmentation models, including downstream TAO Toolkit consumers via the DAFT v3.0 export path. Unlike UC2 and UC3, UC1 spans two PCB texture categories (`IC` and `passive_component`), so a single checkpoint can cover defects whose appearance depends on the board region (IC area vs. passive-component area) in which they occur.<br>
+### Release Date:
+GitHub 06/02/2026 via https://github.com/NVIDIA/paidf-anomalygen<br>
+## Reference(s):
+- Anomaly Diffusion (AAAI 2024) — paper: https://arxiv.org/abs/2312.05767, code: https://github.com/sjtuplayer/anomalydiffusion
+- Cosmos-Predict2 — https://github.com/nvidia-cosmos/cosmos-predict2
+- NV-DINOv2 classification model — https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/models/nv_dinov2_classification_model<br>
+## Model Architecture:
+**Architecture Type:** Transformer (diffusion DiT backbone with learnable conditioning modules)<br>
+**Network Architecture:**
+- `anomaly_embedding` *(trainable, included in this release)*: token embeddings (256 tokens per `<texture>+<anomaly_type>` pair) — three pairs trained for UC1: `IC+bridge`, `passive_component+excess_solder`, `passive_component+missing`.
+- `adapter` *(trainable, included in this release)*: 2-layer MLP with GELU activations (input / output hidden size = 1024), projecting the mask encoder output into the diffusion DiT conditioning space.
+- `mask_encoder` *(frozen, not redistributed in this release)*: NV-DINOv2 (ViT-L) backbone with adaptive pool (kernel = 7); weights are loaded from the separately downloaded NV-DINOv2 classification checkpoint at inference time.
+- `text_encoder` *(frozen, not redistributed in this release)*: google-t5/t5-large.
+- These modules condition the **frozen** Cosmos-Predict2 2B T2I DiT denoiser at inference time.<br>
+**This model was developed based on Cosmos-Predict2-2B-Text2Image.**<br>
+**Number of model parameters:** A few million trainable parameters in the released modules (`anomaly_embedding` + MLP `adapter`), distributed as the `model/iter_000014000.pt` checkpoint file. The frozen Cosmos-Predict2 2B base contributes ~2.0×10^9 (2 billion) parameters that are used at inference time but **not** redistributed in this release.<br>
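The "few million" figure can be sanity-checked with a back-of-the-envelope count. The sketch below assumes an embedding width of 1024 for the anomaly tokens and two 1024-to-1024 linear layers with biases in the adapter. Neither the embedding width nor the exact layer shapes are stated in this card, so treat the result as an order-of-magnitude estimate only.

```python
# Rough trainable-parameter estimate for the released modules.
# ASSUMPTIONS (not confirmed by this card): each anomaly token is a
# 1024-dim vector, and the adapter is two Linear(1024, 1024) layers with biases.
NUM_PAIRS = 3           # IC+bridge, passive_component+excess_solder, passive_component+missing
TOKENS_PER_PAIR = 256   # stated in the architecture description
EMBED_DIM = 1024        # assumed embedding width
HIDDEN = 1024           # adapter input/output hidden size stated in the card


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


embedding_params = NUM_PAIRS * TOKENS_PER_PAIR * EMBED_DIM
adapter_params = 2 * linear_params(HIDDEN, HIDDEN)
total_params = embedding_params + adapter_params

print(f"anomaly_embedding: {embedding_params:,}")
print(f"adapter:           {adapter_params:,}")
print(f"total:             {total_params:,}")
```

Under these assumptions the total comes to roughly 2.9 million parameters, which is consistent with the "few million" stated above.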
+## Input(s):
+**Input Type(s):** Image, Image (binary mask), Text<br>
+**Input Format(s):**
+- Image: PNG / JPG, RGB
+- Mask: PNG / JPG, single-channel binary (0 = background, 255 = anomaly region; binarized at threshold 127)
+- Text: anomaly-type string in the form `<texture>+<anomaly_type>` (one of `IC+bridge`, `passive_component+excess_solder`, `passive_component+missing`)<br>
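
The mask convention above (0 = background, 255 = anomaly region, binarized at threshold 127) can be reproduced before submitting a mask. This is a minimal NumPy sketch, not code from the release: the helper name `binarize_mask` is hypothetical, and it assumes that pixels strictly greater than the threshold count as anomaly, which the card does not specify.

```python
import numpy as np


def binarize_mask(mask: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Map a single-channel uint8 mask to the values {0, 255}.

    ASSUMPTION: pixels strictly greater than `threshold` are anomaly pixels.
    """
    if mask.ndim != 2:
        raise ValueError(f"expected a single-channel 2D mask, got shape {mask.shape}")
    return np.where(mask > threshold, 255, 0).astype(np.uint8)


# Lossy formats such as JPG can leave intermediate gray values in a mask.
noisy = np.array([[0, 3, 126, 127], [128, 200, 252, 255]], dtype=np.uint8)
binary = binarize_mask(noisy)
print(binary)
```

Saving masks as PNG avoids the compression noise that makes this step necessary in the first place.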
+**Input Parameters:**
+- Image: Two-Dimensional (2D)
+- Mask: Two-Dimensional (2D)
+- Text: One-Dimensional (1D)<br>
+**Other Properties Related to Input:** The clean input image and its paired mask must have the same dimensions; the model was trained at 512×512 and inference is run at the same resolution. `anomaly_type` must exactly match one of the three pairs trained for this UC1 checkpoint: `scripts/anomaly_gen/sdg-inference/validate_jsonl.py` rejects any unsupported defect string by checking it against this checkpoint's `ag_config.yaml → dataloader_train.dataset.anomaly_types`. Because UC1 spans two textures, the chosen texture (`IC` vs. `passive_component`) must match the board region from which the clean reference image was cropped; otherwise the generated defect may look misplaced. The optional Automatic Mask Placement (AMP) tool can constrain mask placement to legal ROIs (e.g., only on IC pads, only on passive-component pads).<br>
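The input constraints above can be checked cheaply before launching a generation run. The following is an illustrative sketch only: `check_request` and its arguments are hypothetical and do not reproduce the behavior of `validate_jsonl.py`, which reads the allowed types from `ag_config.yaml` rather than from a hard-coded set.

```python
# Hard-coded here for illustration; the release reads this list from ag_config.yaml.
SUPPORTED_ANOMALY_TYPES = frozenset({
    "IC+bridge",
    "passive_component+excess_solder",
    "passive_component+missing",
})


def check_request(anomaly_type, image_size, mask_size):
    """Return a list of human-readable problems; an empty list means OK.

    Sizes are (width, height) tuples. Hypothetical helper, not part of the release.
    """
    problems = []
    # The match is exact: case and the `<texture>+<anomaly_type>` form both matter.
    if anomaly_type not in SUPPORTED_ANOMALY_TYPES:
        problems.append(f"unsupported anomaly_type: {anomaly_type!r}")
    if image_size != mask_size:
        problems.append(f"image size {image_size} != mask size {mask_size}")
    return problems


print(check_request("IC+bridge", (512, 512), (512, 512)))
print(check_request("IC+missing", (512, 512), (256, 256)))
```

Returning a list of problems instead of raising on the first one lets a batch job report every invalid record in a single pass.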
+## Output(s):
+**Output Type(s):** Image<br>
+**Output Format(s):** PNG, RGB<br>
+**Output Parameters:** Two-Dimensional (2D)<br>
+**Other Properties Related to Output:** 512×512 RGB synthetic anomaly image. Anomaly content is generated inside the user-supplied mask region; in the default `crop_and_paste=True` flow, the inpainted patch is pasted back onto the clean reference image so that non-masked pixels remain identical to the input. Poisson blending can optionally be enabled. Generation metadata (per-sample guidance, crop_ratio, seed, etc.) is written to `SDG_result.csv` alongside the images.<br>
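The property that non-masked pixels stay identical to the input can be illustrated with a simple mask-based composite. This is a simplified sketch of the idea, not the release's `crop_and_paste` implementation (which also crops around the mask before inpainting); the function name `paste_back` is hypothetical.

```python
import numpy as np


def paste_back(clean: np.ndarray, generated: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep `generated` pixels where mask == 255 and `clean` pixels elsewhere.

    clean, generated: (H, W, 3) uint8 RGB; mask: (H, W) uint8 with values in {0, 255}.
    """
    if clean.shape != generated.shape or clean.shape[:2] != mask.shape:
        raise ValueError("clean, generated, and mask must share the same height and width")
    inside = (mask == 255)[..., None]  # add a channel axis so it broadcasts over RGB
    return np.where(inside, generated, clean)


# Tiny synthetic example: a 2x2 anomaly region in the middle of a 4x4 image.
clean = np.full((4, 4, 3), 10, dtype=np.uint8)
generated = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255

out = paste_back(clean, generated, mask)
print(out[..., 0])
```

A hard composite like this leaves a sharp seam at the mask boundary, which is the kind of artifact the optional Poisson blending is meant to soften.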
+Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.<br>
+## Software Integration:
+**Runtime Engine(s):**
+- PyTorch (via the Cosmos-Predict2 2B T2I pipeline)
+- Cosmos AnomalyGen scripts (`scripts.anomaly_gen.synthetic_dataset_generation`, torchrun-based)
+- NVIDIA TAO Toolkit — interop via DAFT v3.0 export (`scripts.anomaly_gen.convert_to_daft_format`)<br>
+**Supported Hardware Microarchitecture Compatibility:**
+- NVIDIA Ampere (A100)
+- NVIDIA Hopper (H100)
+- NVIDIA RTX 6000<br>
+**Supported Operating System(s):**
+- Linux<br>
+The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.<br>
+## Model Version(s):
+v1.0.0 — `uc1-pcb-2b-512-iter14000` (trained 14,000 iterations; released artifact is the `model/iter_000014000.pt` checkpoint file containing finetuned modules only).<br>
+## Training, Testing, and Evaluation Datasets:
+### Dataset Overview
+- Total Number of Datasets: 1 (Cosmos AnomalyGen PCB reference dataset, pre-packaged on NVIDIA NGC — no local preparation required)
+- Total Size: 86 anomaly RGB images + 86 paired binary masks + clean PCB reference images (clean images are used at inference time, not as supervised training targets)
+- Per `<texture>+<anomaly_type>` pair: `IC+bridge` = 8 anomaly images, `passive_component+excess_solder` = 16 anomaly images, `passive_component+missing` = 62 anomaly images
+- Dataset partition: 100% Training (few-shot fine-tuning regime; no held-out validation or test split)<br>
+## Public Datasets
+- `nv-metropolis-dev/metropolis-sdg/cosmos-anomalygen-pcb-dataset:1.0` — NVIDIA NGC — anomaly images, paired binary masks, clean reference images, and `defect_spec.jsonl`.
+- `nv-metropolis-dev/metropolis-sdg/cosmos-anomalygen-pcb-assets:1.0` — NVIDIA NGC — supporting assets for the UC1 pipeline.<br>
+## Training Dataset:
+### Data Modality:
+- Image<br>
+### Training Data Size:
+- Less than a Million Images (86 anomaly images + 86 paired masks across three `<texture>+<anomaly_type>` pairs)
+**Data Collection Method by dataset:**
+- Manually-Collected (camera-captured PCB images)
+**Labeling Method by dataset:**
+- Manually-Labeled (per-defect binary masks produced for the NGC dataset)
+**Properties:** 86 RGB images of printed circuit boards spanning two textures (`IC`, `passive_component`) and three defect categories: `IC+bridge` (8 images), `passive_component+excess_solder` (16 images), and `passive_component+missing` (62 images). Each image is paired with a binary mask indicating defect pixels. Clean (no-defect) PCB images from the same NGC dataset are used as inpainting references at inference time. The class imbalance across the three pairs (e.g., 62 `missing` images vs. 8 `bridge` images) is intentional and reflects the data that was available; the diffusion fine-tuning is robust to per-class sample counts as low as 8 images (the `IC+bridge` pair). No personal data, copyrighted human-subject content, or human subjects are present in the dataset.<br>
+### Testing Dataset:
+Not Applicable — the model is qualitatively evaluated via the `log_image` validation callback (inpainted samples logged every N training steps) rather than against a held-out test split.<br>
+### Evaluation Dataset:
+Not Applicable for this release.<br>
+## Inference:
+**Acceleration Engine:** PyTorch (native, FP32). Multi-GPU rank-sharded inference is supported via `torchrun --nproc_per_node=<N>` with the `predict2_anomaly_gen_fsdp_2b` experiment.<br>
+**Test Hardware:**
+- NVIDIA A100
+- NVIDIA H100
+- NVIDIA RTX 6000<br>
+## Ethical Considerations:
+NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.<br>
+Please make sure you have proper rights and permissions for all input image content used as clean references or anomaly examples. PCB images are inanimate objects, but users should still verify that any incidentally captured personally identifiable content (e.g., visible serial numbers, hand-written part identifiers) is handled in accordance with applicable privacy laws prior to use.<br>
+Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment.<br>
+For more detailed information on ethical considerations for this model, please see the Bias, Explainability, Privacy, and Safety & Security subcards alongside this overview in `modelcard/UC1/`.<br>
+Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).<br>