Files changed (1)
  1. README.md +84 -92
README.md CHANGED
@@ -1,115 +1,107 @@
1
  # ArtiFixer Overview
2
-
3
- ## Description:
4
- ArtiFixer is a few-step causal auto-regressive model that enhances and extends 3D reconstruction. The related source code provides implementations for training, evaluation, and inference, supporting various stages including bidirectional training, diffusion forcing, and self forcing DMD distillation.
5
- ArtiFixer was developed by NVIDIA (Spatial Intelligence Lab) and based on a Wan2.1's 14B model.
6
- _This model is for research and development only._
7
-
8
-
9
  ### License/Terms of Use:
10
- GOVERNING DOWNLOAD TERMS: Use of the model is governed by the [NVIDIA License](https://developer.download.nvidia.com/licenses/NVIDIA-OneWay-Noncommercial-License-22Mar2022.pdf?t=eyJscyI6ImdzZW8iLCJsc2QiOiJodHRwczovL3d3dy5nb29nbGUuY29tLyIsIm5jaWQiOiJzby15b3V0LTg3MTcwMS12dDQ4In0=).
 
11
  ADDITIONAL INFORMATION: The Wan 2.1 14B base model is governed by the [Apache License, Version 2.0.](https://www.apache.org/licenses/LICENSE-2.0)
12
- ### Deployment Geography:
 
 
13
  Global
14
-
15
- ### Use Case:
16
- Developers and researchers working on 3D reconstruction, diffusion models, and auto-regressive techniques for enhancing and extending 3D reconstruction capabilities.
17
-
18
-
19
  ### Release Date:
20
- **Other:** Hugging Face: 06/01/2026 via https://research.nvidia.com/labs/sil/projects/artifixer/
21
-
22
-
23
  ## Reference(s):
24
- [ArtiFixer: Enhancing and Extending 3D Reconstruction with Auto-Regressive Diffusion Models](https://research.nvidia.com/labs/sil/projects/artifixer/assets/paper.pdf)
25
-
26
- ## Model Architecture:
27
- **Architecture Type:** Transformer
28
- **Network Architecture:** ArtifixerTransformerBlock
 
 
29
  **This model was developed based on Wan-AI/Wan2.1-T2V-14B-Diffusers.**
30
- **Number of model parameters:** 14B (1.4*10^10)
31
-
32
-
33
-
34
- ## Input:
35
- **Input Type(s):** Image, text
36
- **Input Format(s):** RGB (Red, Green, Blue), opacity maps, camera ray maps and text
37
- **Input Parameters:** Rendered RGB and opacity maps from underlying 3D representationm, camera ray maps and text prompt
38
- **Other Properties Related to Input:** Model refines and extends renderings from imperfect 3D reconstruction, requiring camera intrinsics and camera poses.
39
-
40
-
41
- ## Output:
42
  **Output Type(s):** Image
43
  **Output Format:** RGB (Red, Green, Blue)
44
- **Output Parameters:** Three-Dimensional (3D)
45
- **Other Properties Related to Output:** Generates enhanced images via few-step causal auto-regressive diffusion model.
46
-
47
-
48
- Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
49
-
50
  ## Software Integration:
51
- **Runtime Engine(s):**
52
- * TensorRT-LLM
53
- * vLLM
54
- * Transformers
55
-
56
- **Supported Hardware Microarchitecture Compatibility:**
57
- * NVIDIA Blackwell
58
- * NVIDIA Hopper
59
-
60
- **Supported Operating System(s):** Linux
61
-
62
- The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
63
-
64
-
65
- ## Model Version(s):
66
  ArtiFixer v1.0
67
-
68
- ArtiFixer integrates with PyTorch and requires CUDA environments. It uses Dockerfiles for CUDA 12 and 13, supporting both x86_64 and aarch64 architectures. The model can be run using torchrun with multi-GPU setups and requires specific dependencies like flash-attn, accelerate, diffusers, and transformers.
69
-
70
- ## Training, Testing, and Evaluation Datasets:
71
-
72
-
73
- ## Training Dataset:
74
-
75
- **Data Modality:**
76
- * Text
77
- * Image
78
-
79
-
80
  **Image Training Data Size:** Less than a Million Images
81
  **Text Training Data Size:** Less than a Billion Tokens
82
  **Data Collection Method by dataset:** Hybrid: Automated/Synthetic
83
- **Labeling Method by dataset:** Human
84
- **Properties (Quantity, Dataset Descriptions, Sensor(s)):** Multimodal dataset combining 3D reconstruction data from DL3DV10K with text captions. Includes sparse 3D point clouds, camera parameters, and RGB images. Data is collected from various scenes and processed using COLMAP for reconstruction. The dataset supports training for 3D reconstruction tasks with both real and generated data.
85
-
86
  ### Testing Dataset:
87
-
88
  **Data Collection Method by dataset:** Automated
89
  **Labeling Method by dataset:** Automated
90
- **Properties (Quantity, Dataset Descriptions, Sensor(s)):** We evaluate our model on 4 benchmarks (DL3DV, Nerfbusters, M360, Tandt). We follow standard procedures for sparse reconstruction evaluation: only use a subset of frames ie. 3, 6 or 9 to evaluate on the remainaing held out frames. We follow the protocol proposed by [Cat3D](https://cat3d.github.io/) for M360 evaluation, [Difix3D](https://research.nvidia.com/labs/toronto-ai/difix3d/) for Nerfbusters and DL3DV evaluation, [ReconX](https://liuff19.github.io/ReconX/) for TandT and finally propose our own evaluation set for DL3DV.
91
-
92
  ### Evaluation Dataset:
93
- | Benchmark | Metric | Score |
94
- | --- | --- | --- |
95
- | M360 3view reconstruction | PSNR | 17.51 |
96
- | M360 6view reconstruction | PSNR | 18.95 |
97
- additional evaluations in https://research.nvidia.com/labs/sil/projects/artifixer/.
98
-
 
 
 
 
 
99
  **Data Collection Method by dataset:** Hybrid: Automated/Human
100
  **Labeling Method by dataset:** Hybrid: Automated/Human
101
- **Properties (Quantity, Dataset Descriptions, Sensor(s)):** Evaluated on diverse 3D reconstruction benchmarks including DL3DV and Nerfbusters, assessing performance on held-out validation frames, full source trajectories, and prepared trajectories. Metrics include reconstruction quality, multi-view consistency and inference speed.
102
-
103
-
104
-
105
  ## Inference:
106
- **Acceleration Engine:** TensorRT-LLM, vLLM
107
- **Test Hardware:**
108
- * NVIDIA A100 80GB
109
- * NVIDIA H100
110
-
111
-
112
  ## Ethical Considerations:
113
- NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
114
-
115
  Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).
 
1
  # ArtiFixer Overview
2
+
3
+ ## Description:
4
+
5
+ ArtiFixer is a few-step causal auto-regressive model that enhances and extends 3D reconstruction. The related source code provides implementations for training, evaluation, and inference, supporting various stages including bidirectional training, diffusion forcing, and Self-Forcing-style DMD distillation.
6
+ ArtiFixer was developed by NVIDIA (Spatial Intelligence Lab) and based on Wan2.1's 14B model.
7
+ _This model is for research and development only._
8
+
9
  ### License/Terms of Use:
10
+
11
+ GOVERNING DOWNLOAD TERMS: Use of the model is governed by the [NVIDIA License](https://developer.download.nvidia.com/licenses/NVIDIA-OneWay-Noncommercial-License-22Mar2022.pdf?t=eyJscyI6ImdzZW8iLCJsc2QiOiJodHRwczovL3d3dy5nb29nbGUuY29tLyIsIm5jaWQiOiJzby15b3V0LTg3MTcwMS12dDQ4In0=).
12
  ADDITIONAL INFORMATION: The Wan 2.1 14B base model is governed by the [Apache License, Version 2.0.](https://www.apache.org/licenses/LICENSE-2.0)
13
+
14
+ ### Deployment Geography:
15
+
16
  Global
17
+
18
+ ### Use Case:
19
+
20
+ Developers and researchers working on 3D reconstruction, diffusion models, and auto-regressive techniques for enhancing and extending 3D reconstruction capabilities.
21
+
22
  ### Release Date:
23
+
24
+ **Other:** Hugging Face: 06/04/2026 via https://research.nvidia.com/labs/sil/projects/artifixer/
25
+
26
  ## Reference(s):
27
+
28
+ [ArtiFixer: Enhancing and Extending 3D Reconstruction with Auto-Regressive Diffusion Models](https://research.nvidia.com/labs/sil/projects/artifixer/assets/paper.pdf)
29
+
30
+ ## Model Architecture:
31
+
32
+ **Architecture Type:** Transformer
33
+ **Network Architecture:** ArtifixerTransformer (built on Wan2.1's WanTransformer3DModel)
34
  **This model was developed based on Wan-AI/Wan2.1-T2V-14B-Diffusers.**
35
+ **Number of model parameters:** ~16.9B trainable (16,910,955,584)
36
+
37
+ ## Input:
38
+
39
+ **Input Type(s):** Image, text
40
+ **Input Format(s):** RGB (Red, Green, Blue), opacity maps, camera ray maps, and text
41
+ **Input Parameters:** Rendered RGB and opacity maps from the underlying 3D representation, camera ray maps, and text prompt
42
+ **Other Properties Related to Input:** Model refines and extends renderings from imperfect 3D reconstruction, requiring camera intrinsics and camera poses.
43
+
44
+ ## Output:
45
+
 
46
  **Output Type(s):** Image
47
  **Output Format:** RGB (Red, Green, Blue)
48
+ **Output Parameters:** Two-Dimensional (2D) image frames
49
+ **Other Properties Related to Output:** Generates enhanced images via a few-step causal auto-regressive diffusion model.
50
+
51
+ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
52
+
 
53
  ## Software Integration:
54
+
55
+ **Runtime Engine(s):** PyTorch, Hugging Face Diffusers, Hugging Face Transformers, FlashAttention (FA3 on Hopper, FA4 on Blackwell)
56
+ **Supported Hardware Microarchitecture Compatibility:** NVIDIA Ampere, NVIDIA Hopper, NVIDIA Blackwell
57
+ **Supported Operating System(s):** Linux
58
+
59
+ The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
60
+
61
+ ## Model Version(s):
62
+
 
 
 
 
 
 
63
  ArtiFixer v1.0
64
+ ArtiFixer integrates with PyTorch and requires CUDA environments. It uses Dockerfiles for CUDA 12 and 13, supporting both x86_64 and aarch64 architectures. The model can be run using torchrun (or accelerate) with multi-GPU setups and requires dependencies such as flash-attn, accelerate, diffusers, and transformers.
65
+
66
+ ## Training, Testing, and Evaluation Datasets:
67
+
68
+ ### Training Dataset:
69
+
70
+ **Data Modality:** Text, Image
 
 
 
 
 
 
71
  **Image Training Data Size:** Less than a Million Images
72
  **Text Training Data Size:** Less than a Billion Tokens
73
  **Data Collection Method by dataset:** Hybrid: Automated/Synthetic
74
+ **Labeling Method by dataset:** Automated
75
+ **Properties (Quantity, Dataset Descriptions, Sensor(s)):** Multimodal dataset combining 3D reconstruction data from DL3DV-10K (the DL3DV-ALL-960P release) with text captions. Includes sparse 3D point clouds, camera parameters, and RGB images. Camera poses are estimated with COLMAP, reconstructions are produced with 3DGUT (MCMC densification, via the 3DGRUT library), text captions are generated with a vision-language model (Qwen3-VL-30B-A3B-Instruct), and metric scale is estimated with MoGe. The dataset supports training for 3D reconstruction tasks with both real and generated data.
76
+
77
  ### Testing Dataset:
78
+
79
  **Data Collection Method by dataset:** Automated
80
  **Labeling Method by dataset:** Automated
81
+ **Properties (Quantity, Dataset Descriptions, Sensor(s)):** We evaluate our model on 4 benchmarks (DL3DV, Nerfbusters, M360, TandT). We follow standard procedures for sparse reconstruction evaluation: only use a subset of frames, i.e., 3, 6 or 9, to evaluate on the remaining held-out frames. We follow the protocol proposed by [Cat3D](https://cat3d.github.io/) for M360 evaluation, [Difix3D+](https://research.nvidia.com/labs/toronto-ai/difix3d/) for Nerfbusters and DL3DV evaluation, [ReconX](https://liuff19.github.io/ReconX/) for TandT, and finally propose our own evaluation set for DL3DV.
82
+
83
  ### Evaluation Dataset:
84
+
85
+ Artifact removal on the Nerfbusters and DL3DV benchmarks (Difix3D+ protocol; NB = Nerfbusters):
86
+
87
+ | Method | NB PSNR↑ | NB SSIM↑ | NB LPIPS↓ | NB FID↓ | DL3DV PSNR↑ | DL3DV SSIM↑ | DL3DV LPIPS↓ | DL3DV FID↓ |
88
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- |
89
+ | ArtiFixer | 19.83 | 0.701 | 0.254 | 37.78 | 19.73 | 0.672 | 0.231 | 20.85 |
90
+ | ArtiFixer 3D | 20.24 | 0.729 | 0.267 | 39.67 | 20.14 | 0.705 | 0.256 | 24.27 |
91
+ | ArtiFixer 3D+ | 20.12 | 0.713 | 0.264 | 41.17 | 20.06 | 0.686 | 0.242 | 22.61 |
92
+
93
+ Additional results — Mip-NeRF 360 sparse-view (3/6/9-view), DL3DV novel-content generation, and Tanks & Temples (supplement) — are reported in the [paper](https://research.nvidia.com/labs/sil/projects/artifixer/assets/paper.pdf).
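
For readers checking the PSNR columns and the 3/6/9-view setup described above, here is a minimal sketch of a sparse-view split and the conventional PSNR formula for pixel values scaled to [0, 1]. It is not the project's evaluation code: the names `sparse_view_split` and `psnr` are hypothetical, and picking evenly spaced input views is an assumption made for illustration (the cited protocols define the actual splits).

```python
import math


def sparse_view_split(num_frames, num_input_views):
    """Pick evenly spaced input views; the remaining frames are held out.

    Even spacing is an illustrative assumption, not the benchmark protocol.
    """
    if not 0 < num_input_views <= num_frames:
        raise ValueError("num_input_views must be in [1, num_frames]")
    if num_input_views == 1:
        inputs = [0]
    else:
        # Spread the input views from the first frame to the last frame.
        step = (num_frames - 1) / (num_input_views - 1)
        inputs = [round(i * step) for i in range(num_input_views)]
    held_out = [i for i in range(num_frames) if i not in inputs]
    return inputs, held_out


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

With this sketch, `sparse_view_split(9, 3)` selects frames 0, 4, and 8 as inputs and leaves the other six frames for evaluation. A benchmark score would then be the mean of `psnr` over the held-out frames.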
94
+
95
  **Data Collection Method by dataset:** Hybrid: Automated/Human
96
  **Labeling Method by dataset:** Hybrid: Automated/Human
97
+ **Properties (Quantity, Dataset Descriptions, Sensor(s)):** Evaluated on diverse 3D reconstruction benchmarks including DL3DV and Nerfbusters, assessing performance on held-out validation frames, full source trajectories, and prepared trajectories. Metrics include reconstruction quality, multi-view consistency, and inference speed.
98
+
 
 
99
  ## Inference:
100
+
101
+ **Acceleration Engine:** FlashAttention (FA3/FA4); PyTorch SDPA (cuDNN) fallback on Ampere
102
+ **Test Hardware:** NVIDIA A100 80GB, NVIDIA H100, NVIDIA GB200 (Blackwell)
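
The backend pairing listed above (FA3 on Hopper, FA4 on Blackwell, PyTorch SDPA fallback on Ampere) can be pictured as a dispatch on the GPU's CUDA compute capability. The following is a minimal sketch under stated assumptions, not the repository's actual selection logic: `pick_attention_backend` and the returned labels are hypothetical, and keying the check on major versions 9 (Hopper, e.g. H100) and 10 (data-center Blackwell, e.g. GB200) is an assumption.

```python
def pick_attention_backend(cc_major):
    """Map a CUDA compute-capability major version to an attention backend label.

    Assumed mapping (illustrative only):
      9  -> Hopper    -> FlashAttention 3
      10 -> Blackwell -> FlashAttention 4
      anything else (e.g. 8 for Ampere) -> PyTorch SDPA fallback
    """
    if cc_major == 9:
        return "flash-attn-3"
    if cc_major == 10:
        return "flash-attn-4"
    return "sdpa"
```

In a PyTorch environment, the major version could be read with `torch.cuda.get_device_capability()`, which returns a `(major, minor)` tuple for the current device. An A100 reports major version 8 and would take the SDPA branch here.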
103
+
 
 
104
  ## Ethical Considerations:
105
+
106
+ NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
107
  Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).