Committed by jianhey
Commit 56ad2ce · verified · 1 Parent(s): 21bd12f

Add model card

Files changed (1)
  1. README.md +74 -22
README.md CHANGED
@@ -1,17 +1,18 @@
  ---
  license: other
- license_name: nvidia-open-model-license
- license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
- library_name: pytorch
- pipeline_tag: image-to-image
- base_model: nvidia/Cosmos-Predict2-2B-Text2Image
  tags:
- - anomaly-detection
- - synthetic-data
- - pcb
- - inpainting
- - cosmos
- - anomalygen
  ---

  # Model Overview
@@ -55,14 +56,14 @@ Github 06/02/2026 via https://github.com/NVIDIA/paidf-anomalygen<br>

  **This model was developed based on Cosmos-Predict2-2B-Text2Image.**<br>

- **Number of model parameters:** Approximately a few million trainable parameters in the released modules (anomaly_embedding + MLP adapter), distributed as the `model/iter_000014000.pt` checkpoint file. The frozen Cosmos-Predict2 2B base contributes ~2.0×10^9 (2 billion) parameters used at inference time but **not** redistributed in this release.<br>

  ## Input(s):
- **Input Type(s):** Image, Image (binary mask), Text<br>

  **Input Format(s):**
- - Image: PNG / JPG, RGB
- - Mask: PNG / JPG, single-channel binary (0 = background, 255 = anomaly region; binarized at threshold 127)
  - Text: anomaly-type string in the form `<texture>+<anomaly_type>` (one of `IC+bridge`, `passive_component+excess_solder`, `passive_component+missing`)<br>

  **Input Parameters:**
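The input contract listed above (a single-channel mask binarized at threshold 127, plus a `<texture>+<anomaly_type>` string limited to three trained pairs) can be checked before calling the model. The following is a minimal sketch under stated assumptions, not the repository's loader: the helper names `parse_anomaly_type` and `binarize_mask` are hypothetical, pixels strictly above 127 are assumed to count as anomaly, and an accidentally RGB mask is assumed to be safe to collapse to its first channel.

```python
import numpy as np

# The three <texture>+<anomaly_type> pairs listed in the model card.
SUPPORTED_ANOMALY_TYPES = {
    "IC+bridge",
    "passive_component+excess_solder",
    "passive_component+missing",
}


def parse_anomaly_type(text: str) -> tuple[str, str]:
    """Split a '<texture>+<anomaly_type>' string after checking it is a trained pair."""
    text = text.strip()
    if text not in SUPPORTED_ANOMALY_TYPES:
        raise ValueError(f"unsupported anomaly type: {text!r}")
    texture, anomaly_type = text.split("+", 1)
    return texture, anomaly_type


def binarize_mask(mask: np.ndarray, threshold: int = 127) -> np.ndarray:
    """Map a grayscale mask to {0, 255}; pixels above `threshold` become anomaly (255)."""
    if mask.ndim == 3:
        # Assumption: an RGB mask is collapsed by keeping only its first channel.
        mask = mask[..., 0]
    return np.where(mask > threshold, 255, 0).astype(np.uint8)
```

For a PNG on disk, `np.array(Image.open(path).convert("L"))` (Pillow) yields the 2-D array that `binarize_mask` expects.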
@@ -76,7 +77,7 @@ Github 06/02/2026 via https://github.com/NVIDIA/paidf-anomalygen<br>

  **Output Type(s):** Image<br>

- **Output Format(s):** PNG, RGB<br>

  **Output Parameters:** Two-Dimensional (2D)<br>

@@ -124,18 +125,18 @@ v1.0.0: `uc1-pcb-2b-512-iter14000` (trained 14,000 iterations; released artif
  - Less than a Million Images (86 anomaly images + 86 paired masks across three `<texture>+<anomaly_type>` pairs)

  **Data Collection Method by dataset:**
- - Manually-Collected (camera-captured PCB images)

  **Labeling Method by dataset:**
- - Manually-Labeled (per-defect binary masks produced for the NGC dataset)

- **Properties:** 86 RGB images of printed circuit boards spanning two textures (`IC`, `passive_component`) and three defect categories: `IC+bridge` (8 images), `passive_component+excess_solder` (16 images), and `passive_component+missing` (62 images); each is paired with a binary mask indicating defect pixels. Clean (no-defect) PCB images from the same NGC dataset are used as inpainting references at inference time. The class imbalance across the three pairs (e.g., 62× `missing` vs. 8× `bridge`) is intentional and reflects what was available; the diffusion fine-tuning is robust to per-class sample counts as low as a handful. No personal data, copyrighted human-subject content, or human subjects are present in the dataset.<br>

  ### Testing Dataset:
  Not Applicable: the model is qualitatively evaluated via the `log_image` validation callback (inpainted samples logged every N training steps) rather than against a held-out test split.<br>

  ### Evaluation Dataset:
- Not Applicable for this release.<br>

  ## Inference:
  **Acceleration Engine:** PyTorch (native, FP32). Multi-GPU rank-sharded inference is supported via `torchrun --nproc_per_node=<N>` with the `predict2_anomaly_gen_fsdp_2b` experiment.<br>
@@ -148,10 +149,61 @@ Not Applicable for this release.<br>
  ## Ethical Considerations:
  NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.<br>

- Please make sure you have proper rights and permissions for all input image content used as clean references or anomaly examples. PCB images are inanimate objects, but users should still verify that any incidentally captured personally identifiable content (e.g., visible serial numbers, hand-written part identifiers) is handled in accordance with applicable privacy laws prior to use.<br>

  Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment.<br>

- For more detailed information on ethical considerations for this model, please see the Bias, Explainability, Privacy, and Safety & Security subcards alongside this overview in `modelcard/UC1/`.<br>

  Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).<br>

@@ -1,17 +1,18 @@
  ---
  license: other
  tags:
+ - nvidia
+ - cosmos
+ - cosmos-predict2
+ - image-to-image
+ - diffusion
+ - inpainting
+ - anomaly-generation
+ - synthetic-data-generation
+ - pcb-inspection
+ - few-shot
+ - fine-tuned
+ - pytorch
  ---

  # Model Overview
@@ -55,14 +56,14 @@ Github 06/02/2026 via https://github.com/NVIDIA/paidf-anomalygen<br>

  **This model was developed based on Cosmos-Predict2-2B-Text2Image.**<br>

+ **Number of model parameters:** Approximately 2.9×10^6 (2.9 million) trainable parameters in the released modules: `anomaly_embedding` ≈ 0.79M (256 tokens × 1024 hidden × 3 `<texture>+<anomaly_type>` pairs) plus the 2-layer MLP `adapter` ≈ 2.1M (two 1024→1024 linear layers with GELU). The trainable modules are distributed as the `model/iter_000014000.pt` checkpoint file. The frozen Cosmos-Predict2 2B base contributes ~2.0×10^9 (2 billion) parameters used at inference time but **not** redistributed in this release.<br>

  ## Input(s):
+ **Input Type(s):** Image, Binary Mask, Text<br>

  **Input Format(s):**
+ - Image: PNG / JPG, Red, Green, Blue (RGB)
+ - Binary Mask: PNG / JPG, single-channel binary (0 = background, 255 = anomaly region; binarized at threshold 127)
  - Text: anomaly-type string in the form `<texture>+<anomaly_type>` (one of `IC+bridge`, `passive_component+excess_solder`, `passive_component+missing`)<br>

  **Input Parameters:**
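The parameter figures in the updated card follow from simple arithmetic, reproduced below. This sketch assumes the adapter's two linear layers carry bias terms (the card does not say); without biases the adapter has 2,097,152 weights, which still rounds to 2.1M. The function names are illustrative only.

```python
def embedding_params(num_pairs: int, tokens_per_pair: int, hidden: int) -> int:
    # One learned (tokens_per_pair x hidden) table per <texture>+<anomaly_type> pair.
    return num_pairs * tokens_per_pair * hidden


def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    # Weight matrix plus an optional bias vector.
    return d_in * d_out + (d_out if bias else 0)


def mlp_adapter_params(hidden: int, bias: bool = True) -> int:
    # Two hidden -> hidden linear layers; the GELU between them has no parameters.
    return 2 * linear_params(hidden, hidden, bias)


emb = embedding_params(num_pairs=3, tokens_per_pair=256, hidden=1024)
mlp = mlp_adapter_params(hidden=1024)
total = emb + mlp
print(f"anomaly_embedding: {emb:,}")    # 786,432
print(f"adapter:           {mlp:,}")    # 2,099,200
print(f"total:             {total:,}")  # 2,885,632 (about 2.9M)
```

The 2 billion frozen base parameters dominate inference memory; the released trainable modules are roughly a thousandth of that.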
 
@@ -76,7 +77,7 @@ Github 06/02/2026 via https://github.com/NVIDIA/paidf-anomalygen<br>

  **Output Type(s):** Image<br>

+ **Output Format(s):** PNG; Red, Green, Blue (RGB)<br>

  **Output Parameters:** Two-Dimensional (2D)<br>
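The Explainability subcard states that pixels inside the mask are generated as a defect while pixels outside it are preserved from the clean reference image, optionally via crop-and-paste or Poisson blending. A minimal sketch of the hard crop-and-paste variant follows, assuming both images are same-sized uint8 RGB arrays and the mask uses the 0/255 convention from the input section. The function name is hypothetical, and Poisson blending (for example OpenCV's `cv2.seamlessClone`) is not shown.

```python
import numpy as np


def composite_with_reference(
    generated: np.ndarray, clean: np.ndarray, mask: np.ndarray
) -> np.ndarray:
    """Keep generated pixels inside the mask and clean-reference pixels outside it.

    generated, clean: HxWx3 uint8 RGB images of the same size (e.g. 512x512).
    mask: HxW uint8 binary mask, 255 = anomaly region, 0 = background.
    """
    if generated.shape != clean.shape:
        raise ValueError("generated and clean images must have the same shape")
    if mask.shape != generated.shape[:2]:
        raise ValueError("mask must match the image height and width")
    inside = (mask > 0)[..., None]  # HxWx1 boolean, broadcasts over the RGB channels
    return np.where(inside, generated, clean).astype(np.uint8)
```

A hard paste can leave a visible seam at the mask boundary, which is the usual motivation for a Poisson-blending option.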
 
 
@@ -124,18 +125,18 @@ v1.0.0: `uc1-pcb-2b-512-iter14000` (trained 14,000 iterations; released artif
  - Less than a Million Images (86 anomaly images + 86 paired masks across three `<texture>+<anomaly_type>` pairs)

  **Data Collection Method by dataset:**
+ - Hybrid: Human, Automatic/Sensors

  **Labeling Method by dataset:**
+ - Human

+ **Properties:** 86 RGB images of printed circuit boards spanning two textures (`IC`, `passive_component`) and three defect categories: `IC+bridge` (8 images), `passive_component+excess_solder` (16 images), and `passive_component+missing` (62 images); each is paired with a per-defect binary mask. Clean (no-defect) PCB images from the same NGC dataset are used as inpainting references at inference time. The class imbalance across the three pairs (e.g., 62× `missing` vs. 8× `bridge`) is intentional and reflects what was available; the diffusion fine-tuning is robust to small per-class sample counts, down to the 8 `IC+bridge` images used here. No personal data, copyrighted human-subject content, or human subjects are present in the dataset.<br>

  ### Testing Dataset:
  Not Applicable: the model is qualitatively evaluated via the `log_image` validation callback (inpainted samples logged every N training steps) rather than against a held-out test split.<br>

  ### Evaluation Dataset:
+ Not Applicable: no held-out evaluation dataset. The model is evaluated qualitatively during training via the `log_image` validation callback, plus FID and nearest-neighbor metrics (`nn_score`, `mnn_score`) logged at each validation step. See the [Explainability](./explainability.md) subcard for performance metric details.<br>

  ## Inference:
  **Acceleration Engine:** PyTorch (native, FP32). Multi-GPU rank-sharded inference is supported via `torchrun --nproc_per_node=<N>` with the `predict2_anomaly_gen_fsdp_2b` experiment.<br>
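The card does not describe how work is divided across ranks under `torchrun`. One common scheme is a strided split of the input testcase lines by rank; the sketch below assumes that scheme and reads the `RANK` and `WORLD_SIZE` environment variables that `torchrun` sets for each worker. The helper names and the testcase list are hypothetical, not taken from the `predict2_anomaly_gen_fsdp_2b` experiment.

```python
import os
from typing import Sequence, TypeVar

T = TypeVar("T")


def shard_for_rank(items: Sequence[T], rank: int, world_size: int) -> list[T]:
    """Strided split: rank r handles items r, r + world_size, r + 2 * world_size, ..."""
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world size {world_size}")
    return list(items[rank::world_size])


def current_rank_and_world_size() -> tuple[int, int]:
    # torchrun exports RANK and WORLD_SIZE to every worker; fall back to one process.
    return int(os.environ.get("RANK", "0")), int(os.environ.get("WORLD_SIZE", "1"))


if __name__ == "__main__":
    testcases = [f"testcase_{i}" for i in range(10)]  # hypothetical testcase lines
    rank, world_size = current_rank_and_world_size()
    for line in shard_for_rank(testcases, rank, world_size):
        print(f"[rank {rank}] {line}")
```

A strided split keeps per-rank workloads within one item of each other and needs no coordination between processes, so each rank can write its outputs independently.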
 
@@ -148,10 +149,61 @@ Not Applicable for this release.<br>
  ## Ethical Considerations:
  NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.<br>

+ Please make sure you have proper rights and permissions for all input image content, including clean reference images, anomaly examples, and masks.<br>

  Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment.<br>

+ For more detailed information on ethical considerations for this model, please see the [Bias](./bias.md), [Explainability](./explainability.md), [Privacy](./privacy.md), and [Safety & Security](./safety.md) subcards.<br>

  Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).<br>
+
+
+ ## Bias
+
+ Field | Response
+ :---------------------------------------------------------------------------------------------------|:---------------
+ Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | Not Applicable: the model operates on images of printed circuit boards (inanimate objects) and does not target or classify human subjects.
+ Measures taken to mitigate against unwanted bias: | Not Applicable: the model is a synthetic anomaly image generator for industrial defect inspection and does not produce classifications or predictions on protected attributes.
+ Bias Metric (If Measured): | Not Measured.
+
+
+ ## Explainability
+
+ Field | Response
+ :------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------
+ Intended Task/Domain: | Industrial visual inspection: synthetic anomaly image generation (image-to-image inpainting) for printed-circuit-board defect detection.
+ Model Type: | Diffusion model: Transformer-based DiT denoiser (Cosmos-Predict2 2B, frozen) conditioned by learned anomaly-token embeddings, a frozen NV-DINOv2 mask encoder, and a trained 2-layer MLP adapter that projects mask-encoder features into the DiT conditioning space.
+ Intended Users: | ML engineers and computer-vision practitioners building defect-detection or segmentation models for printed-circuit-board QA.
+ Output: | Image (single 512×512 RGB anomaly image per inference call; multiple per testcase line when `num_generated_images > 1`).
+ Describe how the model works: | A user-supplied binary mask indicating where the defect should appear is encoded by a frozen NV-DINOv2 backbone and projected through a trained 2-layer MLP adapter into the diffusion DiT conditioning space. The selected `<texture>+<anomaly_type>` token (`IC+bridge`, `passive_component+excess_solder`, or `passive_component+missing`) retrieves a learned 256-token embedding which further conditions the frozen Cosmos-Predict2 2B T2I DiT denoiser. The denoiser then produces an inpainting-style sample where pixels inside the mask are generated as a defect that matches the trained type, while pixels outside the mask are preserved from the clean reference image (with optional crop-and-paste / Poisson blending). The two-texture design (`IC` vs. `passive_component`) lets a single checkpoint cover defects that appear differently depending on board region.
+ Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable
+ Technical Limitations & Mitigation: | Trained on a small number of examples per defect pair (8, 16, and 62 anomaly images for `IC+bridge`, `passive_component+excess_solder`, and `passive_component+missing` respectively). Out-of-distribution board layouts, mask placements that deviate substantially from the trained distribution, mismatched texture choice (e.g., using `IC+bridge` on a passive-component region of the board), or any defect not in the three trained pairs may yield low-fidelity outputs. The `IC+bridge` pair has the fewest training examples and is the most sensitive to mask-placement quality. Mitigation: (a) use the Automatic Mask Placement (AMP) tool, which automatically restricts the user-supplied mask to legal regions of interest for each `<texture>+<anomaly_type>` pair (e.g., only on IC pads for `IC+bridge`, only on passive-component pads for `passive_component+*`), preventing the model from generating defects in implausible board locations; (b) run `scripts/anomaly_gen/filter.py` on the generated dataset, which scores each output with a Generative Image Quality Assessment (G-IQA) model and discards samples whose quality/realism score falls below a configurable threshold, so unrealistic synthetic images do not pollute the downstream training set; (c) validate any downstream detector trained on synthetic samples against real defect imagery before deployment.
+ Verified to have met prescribed NVIDIA quality standards: | Yes.
+ Performance Metrics: | FID (logged during training validation), nearest-neighbor metrics (`nn_score`, `mnn_score`), and visual inspection of `log_image` callback outputs.
+ Potential Known Risks: | Generated synthetic defects may not perfectly cover all real-world defect variations. Downstream detection models trained primarily on these synthetic samples should be validated against real defect data before deployment in production QA lines.<br><br>This model can generate synthetic images and may produce content that is offensive, unsafe, misleading, indecent, or unsuitable for a target deployment. Users should implement robust safety guardrails (including content filtering, abuse monitoring, and access controls) to reduce the risk of harmful outputs. Users are responsible for ensuring that their use of the model complies with all applicable laws and regulations, and for regularly reviewing and updating their guardrails as risks evolve.
+ Licensing: | [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)
+
+
+ ## Privacy
+
+ Field | Response
+ :----------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------
+ Generatable or reverse engineerable personal data? | No
+ Personal data used to create this model? | No
+ Was consent obtained for any personal data used? | Not Applicable
+ How often is dataset reviewed? | Before Release
+ Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? | No
+ Is there provenance for all datasets used in training? | Yes
+ Does data labeling (annotation, metadata) comply with privacy laws? | Yes
+ Is data compliant with data subject requests for data correction or removal, if such a request was made? | Not Applicable
+ Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/
+
+
+ ## Safety
+
+ Field | Response
+ :---------------------------------------------------|:----------------------------------
+ Model Application Field(s): | Industrial/Machinery and Robotics (specifically, synthetic data generation for manufacturing QA / visual inspection of printed circuit boards).
+ Describe the life critical impact (if present). | Not Applicable.
+ Use Case Restrictions: | Abide by the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
+ Model and dataset restrictions: | The principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Access restrictions are enforced on datasets during training, and dataset license constraints are adhered to.
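The two mitigations named under Technical Limitations, restricting masks to legal regions of interest (AMP) and discarding low-scoring outputs (`scripts/anomaly_gen/filter.py`), can be sketched as two small functions. This is an illustration under assumptions, not the tools' code: the ROI array (for example, pad locations for the chosen pair) and the `score_fn` standing in for the G-IQA model are hypothetical inputs, and samples scoring exactly at the threshold are assumed to be kept.

```python
from typing import Callable, Iterable

import numpy as np


def restrict_mask_to_roi(mask: np.ndarray, roi: np.ndarray) -> np.ndarray:
    """Keep only mask pixels that also lie inside the legal region of interest.

    mask, roi: HxW arrays where any nonzero value means "selected".
    Returns a uint8 mask in the 0/255 convention used for model input.
    """
    if mask.shape != roi.shape:
        raise ValueError("mask and roi must have the same shape")
    return np.where((mask > 0) & (roi > 0), 255, 0).astype(np.uint8)


def filter_by_quality(
    samples: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float,
) -> list[str]:
    """Drop samples whose quality score falls below `threshold` (ties are kept)."""
    return [s for s in samples if score_fn(s) >= threshold]
```

If `restrict_mask_to_roi` returns an all-zero mask (`not restricted.any()`), the requested defect has no legal location on that board, and the testcase could be skipped rather than generated.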