Add model card
README.md
CHANGED
|
@@ -1,17 +1,18 @@
|
|
| 1 |
---
|
| 2 |
license: other
|
| 3 |
-
license_name: nvidia-open-model-license
|
| 4 |
-
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
|
| 5 |
-
library_name: pytorch
|
| 6 |
-
pipeline_tag: image-to-image
|
| 7 |
-
base_model: nvidia/Cosmos-Predict2-2B-Text2Image
|
| 8 |
tags:
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
---
|
| 16 |
|
| 17 |
# Model Overview
|
|
@@ -55,14 +56,14 @@ Github 06/02/2026 via https://github.com/NVIDIA/paidf-anomalygen<br>
|
|
| 55 |
|
| 56 |
**This model was developed based on Cosmos-Predict2-2B-Text2Image.**<br>
|
| 57 |
|
| 58 |
-
**Number of model parameters:** Approximately
|
| 59 |
|
| 60 |
## Input(s):
|
| 61 |
-
**Input Type(s):** Image,
|
| 62 |
|
| 63 |
**Input Format(s):**
|
| 64 |
-
- Image: PNG / JPG, RGB
|
| 65 |
-
- Mask: PNG / JPG, single-channel binary (0 = background, 255 = anomaly region; binarized at threshold 127)
|
| 66 |
- Text: anomaly-type string in the form `<texture>+<anomaly_type>` (one of `IC+bridge`, `passive_component+excess_solder`, `passive_component+missing`)<br>
|
| 67 |
|
| 68 |
**Input Parameters:**
|
|
@@ -76,7 +77,7 @@ Github 06/02/2026 via https://github.com/NVIDIA/paidf-anomalygen<br>
|
|
| 76 |
|
| 77 |
**Output Type(s):** Image<br>
|
| 78 |
|
| 79 |
-
**Output Format(s):** PNG, RGB<br>
|
| 80 |
|
| 81 |
**Output Parameters:** Two-Dimensional (2D)<br>
|
| 82 |
|
|
@@ -124,18 +125,18 @@ v1.0.0 - `uc1-pcb-2b-512-iter14000` (trained 14,000 iterations; released artif
|
|
| 124 |
- Less than a Million Images (86 anomaly images + 86 paired masks across three `<texture>+<anomaly_type>` pairs)
|
| 125 |
|
| 126 |
**Data Collection Method by dataset:**
|
| 127 |
-
-
|
| 128 |
|
| 129 |
**Labeling Method by dataset:**
|
| 130 |
-
-
|
| 131 |
|
| 132 |
-
**Properties:** 86 RGB images of printed circuit boards spanning two textures (`IC`, `passive_component`) and three defect categories: `IC+bridge` (8 images), `passive_component+excess_solder` (16 images), and `passive_component+missing` (62 images), each paired with a binary mask
|
| 133 |
|
| 134 |
### Testing Dataset:
|
| 135 |
Not Applicable: the model is evaluated qualitatively via the `log_image` validation callback (inpainted samples logged every N training steps) rather than against a held-out test split.<br>
|
| 136 |
|
| 137 |
### Evaluation Dataset:
|
| 138 |
-
Not Applicable for
|
| 139 |
|
| 140 |
## Inference:
|
| 141 |
**Acceleration Engine:** PyTorch (native, FP32). Multi-GPU rank-sharded inference is supported via `torchrun --nproc_per_node=<N>` with the `predict2_anomaly_gen_fsdp_2b` experiment.<br>
|
|
@@ -148,10 +149,61 @@ Not Applicable for this release.<br>
|
|
| 148 |
## Ethical Considerations:
|
| 149 |
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.<br>
|
| 150 |
|
| 151 |
-
Please make sure you have proper rights and permissions for all input image
|
| 152 |
|
| 153 |
Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment.<br>
|
| 154 |
|
| 155 |
-
For more detailed information on ethical considerations for this model, please see the Bias, Explainability, Privacy, and Safety & Security subcards
|
| 156 |
|
| 157 |
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).<br>
|
|
| 1 |
---
|
| 2 |
license: other
|
| 3 |
tags:
|
| 4 |
+
- nvidia
|
| 5 |
+
- cosmos
|
| 6 |
+
- cosmos-predict2
|
| 7 |
+
- image-to-image
|
| 8 |
+
- diffusion
|
| 9 |
+
- inpainting
|
| 10 |
+
- anomaly-generation
|
| 11 |
+
- synthetic-data-generation
|
| 12 |
+
- pcb-inspection
|
| 13 |
+
- few-shot
|
| 14 |
+
- fine-tuned
|
| 15 |
+
- pytorch
|
| 16 |
---
|
| 17 |
|
| 18 |
# Model Overview
|
|
|
|
| 56 |
|
| 57 |
**This model was developed based on Cosmos-Predict2-2B-Text2Image.**<br>
|
| 58 |
|
| 59 |
+
**Number of model parameters:** Approximately 2.9×10^6 (2.9 million) trainable parameters in the released modules: `anomaly_embedding` ≈ 0.79M (256 tokens × 1024 hidden × 3 `<texture>+<anomaly_type>` pairs) plus the 2-layer MLP `adapter` ≈ 2.1M (1024→1024 with GELU). The trainable modules are distributed as the `model/iter_000014000.pt` checkpoint file. The frozen Cosmos-Predict2 2B base contributes ~2.0×10^9 (2 billion) parameters used at inference time but **not** redistributed in this release.<br>
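The trainable-parameter total quoted above can be reproduced with a few lines of arithmetic. This is a sketch under one assumption the card does not state: both adapter layers are taken to be 1024→1024 linear layers with bias vectors (GELU itself has no parameters). The constant and variable names are illustrative only.

```python
# Back-of-the-envelope check of the trainable parameter count quoted above.
# ASSUMPTION: each of the two adapter layers is a 1024 -> 1024 linear layer
# with a bias vector; the card only says "2-layer MLP, 1024->1024 with GELU".

NUM_TOKENS = 256  # learned tokens per <texture>+<anomaly_type> pair
HIDDEN = 1024     # hidden width
NUM_PAIRS = 3     # IC+bridge, passive_component+excess_solder, passive_component+missing

# One 256 x 1024 embedding table per trained pair.
embedding_params = NUM_TOKENS * HIDDEN * NUM_PAIRS

# Two linear layers: weight matrix (HIDDEN x HIDDEN) plus bias (HIDDEN) each.
adapter_params = 2 * (HIDDEN * HIDDEN + HIDDEN)

total_params = embedding_params + adapter_params

print(f"anomaly_embedding: {embedding_params:,}")
print(f"adapter:           {adapter_params:,}")
print(f"total:             {total_params:,} (~{total_params / 1e6:.1f}M)")
```

Under that assumption the two modules come to roughly 0.79M and 2.1M parameters, which matches the figures in the card to the stated precision.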
|
| 60 |
|
| 61 |
## Input(s):
|
| 62 |
+
**Input Type(s):** Image, Binary Mask, Text<br>
|
| 63 |
|
| 64 |
**Input Format(s):**
|
| 65 |
+
- Image: PNG / JPG, Red, Green, Blue (RGB)
|
| 66 |
+
- Binary Mask: PNG / JPG, single-channel binary (0 = background, 255 = anomaly region; binarized at threshold 127)
|
| 67 |
- Text: anomaly-type string in the form `<texture>+<anomaly_type>` (one of `IC+bridge`, `passive_component+excess_solder`, `passive_component+missing`)<br>
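The text and mask formats listed above can be checked before inference with a small amount of validation code. The following is a minimal sketch, not the repository's API: `parse_anomaly_prompt` and `binarize_mask` are hypothetical helper names, the mask is represented as nested lists of 8-bit values for simplicity, and "binarized at threshold 127" is read here as "values strictly greater than 127 become 255", which is an assumption.

```python
# Sketch of input validation for the text and mask inputs described above.
# ASSUMPTIONS (hypothetical helpers, not the repository's API):
#   - parse_anomaly_prompt and binarize_mask are illustrative names;
#   - "binarized at threshold 127" is read as: value > 127 -> 255, else 0.
from __future__ import annotations

SUPPORTED_PAIRS = {
    "IC+bridge",
    "passive_component+excess_solder",
    "passive_component+missing",
}


def parse_anomaly_prompt(prompt: str) -> tuple[str, str]:
    """Split '<texture>+<anomaly_type>' and reject pairs the model was not trained on."""
    if prompt not in SUPPORTED_PAIRS:
        raise ValueError(
            f"unsupported pair {prompt!r}; expected one of {sorted(SUPPORTED_PAIRS)}"
        )
    texture, anomaly_type = prompt.split("+", 1)
    return texture, anomaly_type


def binarize_mask(mask: list[list[int]], threshold: int = 127) -> list[list[int]]:
    """Map a single-channel 8-bit mask to {0, 255}: 0 = background, 255 = anomaly region."""
    return [[255 if value > threshold else 0 for value in row] for row in mask]


if __name__ == "__main__":
    print(parse_anomaly_prompt("passive_component+missing"))
    print(binarize_mask([[0, 127, 128], [200, 64, 255]]))
```

Rejecting unknown pairs up front avoids silently conditioning on a `<texture>+<anomaly_type>` combination that has no learned embedding.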
|
| 68 |
|
| 69 |
**Input Parameters:**
|
|
|
|
| 77 |
|
| 78 |
**Output Type(s):** Image<br>
|
| 79 |
|
| 80 |
+
**Output Format(s):** PNG; Red, Green, Blue (RGB)<br>
|
| 81 |
|
| 82 |
**Output Parameters:** Two-Dimensional (2D)<br>
|
| 83 |
|
|
|
|
| 125 |
- Less than a Million Images (86 anomaly images + 86 paired masks across three `<texture>+<anomaly_type>` pairs)
|
| 126 |
|
| 127 |
**Data Collection Method by dataset:**
|
| 128 |
+
- Hybrid: Human, Automatic/Sensors
|
| 129 |
|
| 130 |
**Labeling Method by dataset:**
|
| 131 |
+
- Human
|
| 132 |
|
| 133 |
+
**Properties:** 86 RGB images of printed circuit boards spanning two textures (`IC`, `passive_component`) and three defect categories: `IC+bridge` (8 images), `passive_component+excess_solder` (16 images), and `passive_component+missing` (62 images), each paired with a per-defect binary mask. Clean (no-defect) PCB images from the same NGC dataset are used as inpainting references at inference time. The class imbalance across the three pairs (e.g., 62× `missing` vs. 8× `bridge`) is intentional and reflects what was available; the diffusion fine-tuning is robust to per-class sample counts as low as a handful. No personal data, copyrighted human-subject content, or human subjects are present in the dataset.<br>
|
| 134 |
|
| 135 |
### Testing Dataset:
|
| 136 |
Not Applicable: the model is evaluated qualitatively via the `log_image` validation callback (inpainted samples logged every N training steps) rather than against a held-out test split.<br>
|
| 137 |
|
| 138 |
### Evaluation Dataset:
|
| 139 |
+
Not Applicable: there is no held-out evaluation dataset. The model is evaluated qualitatively during training via the `log_image` validation callback, plus FID and nearest-neighbor metrics (`nn_score`, `mnn_score`) logged at each validation step. See the [Explainability](./explainability.md) subcard for performance metric details.<br>
|
| 140 |
|
| 141 |
## Inference:
|
| 142 |
**Acceleration Engine:** PyTorch (native, FP32). Multi-GPU rank-sharded inference is supported via `torchrun --nproc_per_node=<N>` with the `predict2_anomaly_gen_fsdp_2b` experiment.<br>
|
|
|
|
| 149 |
## Ethical Considerations:
|
| 150 |
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.<br>
|
| 151 |
|
| 152 |
+
Please make sure you have proper rights and permissions for all input image and video content.<br>
|
| 153 |
|
| 154 |
Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment.<br>
|
| 155 |
|
| 156 |
+
For more detailed information on ethical considerations for this model, please see the [Bias](./bias.md), [Explainability](./explainability.md), [Privacy](./privacy.md), and [Safety & Security](./safety.md) subcards.<br>
|
| 157 |
|
| 158 |
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).<br>
|
| 159 |
+
|
| 160 |
+
|
| 161 |
+
## Bias
|
| 162 |
+
|
| 163 |
+
Field | Response
|
| 164 |
+
:---------------------------------------------------------------------------------------------------|:---------------
|
| 165 |
+
Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | Not Applicable: the model operates on images of printed circuit boards (inanimate objects) and does not target or classify human subjects.
|
| 166 |
+
Measures taken to mitigate against unwanted bias: | Not Applicable: the model is a synthetic anomaly image generator for industrial defect inspection and does not produce classifications or predictions on protected attributes.
|
| 167 |
+
Bias Metric (If Measured): | Not Measured.
|
| 168 |
+
|
| 169 |
+
|
| 170 |
+
## Explainability
|
| 171 |
+
|
| 172 |
+
Field | Response
|
| 173 |
+
:------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------
|
| 174 |
+
Intended Task/Domain: | Industrial visual inspection: synthetic anomaly image generation (image-to-image inpainting) for printed-circuit-board defect detection.
|
| 175 |
+
Model Type: | Diffusion model: Transformer-based DiT denoiser (Cosmos-Predict2 2B, frozen) conditioned by learned anomaly-token embeddings, a frozen NV-DINOv2 mask encoder, and a trained 2-layer MLP adapter that projects mask-encoder features into the DiT conditioning space.
|
| 176 |
+
Intended Users: | ML engineers and computer-vision practitioners building defect-detection or segmentation models for printed-circuit-board QA.
|
| 177 |
+
Output: | Image (single 512×512 RGB anomaly image per inference call; multiple per testcase line when `num_generated_images > 1`).
|
| 178 |
+
Describe how the model works: | A user-supplied binary mask indicating where the defect should appear is encoded by a frozen NV-DINOv2 backbone and projected through a trained 2-layer MLP adapter into the diffusion DiT conditioning space. The selected `<texture>+<anomaly_type>` token (`IC+bridge`, `passive_component+excess_solder`, or `passive_component+missing`) retrieves a learned 256-token embedding which further conditions the frozen Cosmos-Predict2 2B T2I DiT denoiser. The denoiser then produces an inpainting-style sample where pixels inside the mask are generated as a defect that matches the trained type, while pixels outside the mask are preserved from the clean reference image (with optional crop-and-paste / Poisson blending). The two-texture design (`IC` vs. `passive_component`) lets a single checkpoint cover defects that appear differently depending on board region.
|
| 179 |
+
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable
|
| 180 |
+
Technical Limitations & Mitigation: | Trained on a small number of examples per defect pair (8, 16, and 62 anomaly images for `IC+bridge`, `passive_component+excess_solder`, and `passive_component+missing` respectively). Out-of-distribution board layouts, mask placements that deviate substantially from the trained distribution, mismatched texture choice (e.g., using `IC+bridge` on a passive-component region of the board), or any defect not in the three trained pairs may yield low-fidelity outputs. The `IC+bridge` pair has the fewest training examples and is the most sensitive to mask-placement quality. Mitigation: (a) use the Automatic Mask Placement (AMP) tool, which automatically restricts the user-supplied mask to legal regions of interest for each `<texture>+<anomaly_type>` pair (e.g., only on IC pads for `IC+bridge`, only on passive-component pads for `passive_component+*`), preventing the model from generating defects in implausible board locations; (b) run `scripts/anomaly_gen/filter.py` on the generated dataset; the filter scores each output with a Generative Image Quality Assessment (G-IQA) model and discards samples whose quality/realism score falls below a configurable threshold, so unrealistic synthetic images do not pollute the downstream training set; (c) validate any downstream detector trained on synthetic samples against real defect imagery before deployment.
|
| 181 |
+
Verified to have met prescribed NVIDIA quality standards: | Yes.
|
| 182 |
+
Performance Metrics: | FID (logged during training validation), nearest-neighbor metrics (`nn_score`, `mnn_score`), and visual inspection of `log_image` callback outputs.
|
| 183 |
+
Potential Known Risks: | Generated synthetic defects may not perfectly cover all real-world defect variations. Downstream detection models trained primarily on these synthetic samples should be validated against real defect data before deployment in production QA lines.<br><br>This model can generate synthetic images and may produce content that is offensive, unsafe, misleading, indecent, or unsuitable for a target deployment. Users should implement robust safety guardrails, including content filtering, abuse monitoring, and access controls, to reduce the risk of harmful outputs. Users are responsible for ensuring that their use of the model complies with all applicable laws and regulations, and for regularly reviewing and updating their guardrails as risks evolve.
|
| 184 |
+
Licensing: | [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)
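The "Describe how the model works" row above states that pixels inside the mask are generated while pixels outside it are preserved from the clean reference image. A minimal sketch of that idea as a hard paste is shown below. It is an illustration only: `composite_with_mask` is a hypothetical helper, images are modelled as row-major lists of `(R, G, B)` tuples, the mask is assumed to be already binarized to `{0, 255}`, and the released pipeline may use crop-and-paste or Poisson blending instead.

```python
# Illustrative hard "paste" compositing: generated pixels inside the mask,
# clean-reference pixels outside it.
# ASSUMPTIONS: composite_with_mask is a hypothetical helper; images are
# row-major lists of (R, G, B) tuples; the mask is already binarized to {0, 255}.
# The released pipeline may instead use crop-and-paste or Poisson blending.
from __future__ import annotations

Pixel = tuple  # (R, G, B) with 8-bit integer channels


def composite_with_mask(
    reference: list[list[Pixel]],
    generated: list[list[Pixel]],
    mask: list[list[int]],
) -> list[list[Pixel]]:
    """Return an image that takes `generated` where mask == 255 and `reference` elsewhere."""
    if not (len(reference) == len(generated) == len(mask)):
        raise ValueError("reference, generated and mask must have the same height")
    output = []
    for ref_row, gen_row, mask_row in zip(reference, generated, mask):
        if not (len(ref_row) == len(gen_row) == len(mask_row)):
            raise ValueError("reference, generated and mask must have the same width")
        output.append(
            [gen if m == 255 else ref for ref, gen, m in zip(ref_row, gen_row, mask_row)]
        )
    return output


if __name__ == "__main__":
    clean = [[(10, 10, 10), (20, 20, 20)], [(30, 30, 30), (40, 40, 40)]]
    defect = [[(200, 0, 0), (201, 0, 0)], [(202, 0, 0), (203, 0, 0)]]
    region = [[0, 255], [0, 0]]
    print(composite_with_mask(clean, defect, region))
```

A hard paste keeps the clean board untouched outside the defect region, at the cost of a visible seam at the mask boundary; that seam is what blending methods such as Poisson blending are meant to reduce.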
|
| 185 |
+
|
| 186 |
+
|
| 187 |
+
## Privacy
|
| 188 |
+
|
| 189 |
+
Field | Response
|
| 190 |
+
:----------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------
|
| 191 |
+
Generatable or reverse engineerable personal data? | No
|
| 192 |
+
Personal data used to create this model? | No
|
| 193 |
+
Was consent obtained for any personal data used? | Not Applicable
|
| 194 |
+
How often is dataset reviewed? | Before Release
|
| 195 |
+
Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? | No
|
| 196 |
+
Is there provenance for all datasets used in training? | Yes
|
| 197 |
+
Does data labeling (annotation, metadata) comply with privacy laws? | Yes
|
| 198 |
+
Is data compliant with data subject requests for data correction or removal, if such a request was made? | Not Applicable
|
| 199 |
+
Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/
|
| 200 |
+
|
| 201 |
+
|
| 202 |
+
## Safety
|
| 203 |
+
|
| 204 |
+
Field | Response
|
| 205 |
+
:---------------------------------------------------|:----------------------------------
|
| 206 |
+
Model Application Field(s): | Industrial/Machinery and Robotics, specifically synthetic data generation for manufacturing QA / visual inspection of printed circuit boards.
|
| 207 |
+
Describe the life critical impact (if present). | Not Applicable.
|
| 208 |
+
Use Case Restrictions: | Abide by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
|
| 209 |
+
Model and dataset restrictions: | The principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions on dataset access are enforced during training, and dataset license constraints are adhered to.
|