mancub nvlm committed on
Commit 3fca35c · verified · 0 parent(s)

Duplicate from nvidia/NitroGen


Co-authored-by: nvlm <nvlm@users.noreply.huggingface.co>

Files changed (8)
  1. .gitattributes +35 -0
  2. BIAS.md +4 -0
  3. EXPLAINABILITY.md +13 -0
  4. LICENSE +37 -0
  5. PRIVACY.md +11 -0
  6. README.md +161 -0
  7. SAFETY.md +6 -0
  8. ng.pt +3 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
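These `.gitattributes` entries tell git which files to route through Git LFS. The matching logic can be sketched in a few lines of Python with the standard-library `fnmatch` module; the `LFS_PATTERNS` subset and the `tracked_by_lfs` helper name are illustrative, not part of any tooling in this repository:

```python
from fnmatch import fnmatch

# Subset of the LFS glob patterns listed in the .gitattributes hunk above.
LFS_PATTERNS = ["*.7z", "*.bin", "*.ckpt", "*.pt", "*.pth",
                "*.safetensors", "*.zip", "*tfevents*"]

def tracked_by_lfs(filename: str) -> bool:
    """Return True if the filename matches any LFS-tracked glob pattern."""
    return any(fnmatch(filename, pattern) for pattern in LFS_PATTERNS)

print(tracked_by_lfs("ng.pt"))      # True: *.pt is LFS-tracked
print(tracked_by_lfs("README.md"))  # False: markdown stays in plain git
```

Note that the `saved_model/**/*` entry needs path-aware matching that `fnmatch` only approximates, so it is omitted from the subset above.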
BIAS.md ADDED
@@ -0,0 +1,4 @@
+ Field | Response
+ :---|:---
+ Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | Not Applicable
+ Measures taken to mitigate against unwanted bias: | Not Applicable
EXPLAINABILITY.md ADDED
@@ -0,0 +1,13 @@
+ Field | Response
+ :---|:---
+ Intended Task/Domain: | Vision-to-action model designed to play video games directly from raw frames
+ Model Type: | Transformer
+ Intended Users: | Researchers, game developers, open source community, gamers. Potential applications include next-generation game AI, automated testing for video games, and generally advancing research in embodied AI.
+ Output: | Gamepad actions
+ Describe how the model works: | Image inputs are encoded with a vision transformer. A separate diffusion transformer, conditioned on the image embeddings, then denoises an action tensor.
+ Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable
+ Technical Limitations & Mitigation: | This model performs well on games played with a gamepad. It may not perform well on games played with a keyboard or mouse.
+ Verified to have met prescribed NVIDIA quality standards: | Yes
+ Performance Metrics: | Task success rate
+ Potential Known Risks: | The model may occasionally lose at certain games.
+ Licensing: | Governing Terms: [NVIDIA License](https://developer.download.nvidia.com/licenses/NVIDIA-OneWay-Noncommercial-License-22Mar2022.pdf). Additional Information: [Apache License](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) for [https://huggingface.co/google/siglip2-base-patch16-224]().
LICENSE ADDED
@@ -0,0 +1,37 @@
+ NVIDIA License
+
+ 1. Definitions
+
+ “Licensor” means any person or entity that distributes its Work.
+ “Work” means (a) the original work of authorship made available under this license, which may include software, documentation, or other files, and (b) any additions to or derivative works thereof that are made available under this license.
+ The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the meaning as provided under U.S. copyright law; provided, however, that for the purposes of this license, derivative works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work.
+ Works are “made available” under this license by including in or with the Work either (a) a copyright notice referencing the applicability of this license to the Work, or (b) a copy of this license.
+
+ 2. License Grant
+
+ 2.1 Copyright Grant. Subject to the terms and conditions of this license, each Licensor grants to you a perpetual, worldwide, non-exclusive, royalty-free, copyright license to use, reproduce, prepare derivative works of, publicly display, publicly perform, sublicense and distribute its Work and any resulting derivative works in any form.
+
+ 3. Limitations
+
+ 3.1 Redistribution. You may reproduce or distribute the Work only if (a) you do so under this license, (b) you include a complete copy of this license with your distribution, and (c) you retain without modification any copyright, patent, trademark, or attribution notices that are present in the Work.
+
+ 3.2 Derivative Works. You may specify that additional or different terms apply to the use, reproduction, and distribution of your derivative works of the Work (“Your Terms”) only if (a) Your Terms provide that the use limitation in Section 3.3 applies to your derivative works; (b) you comply with Other Licenses, and (c) you identify the specific derivative works that are subject to Your Terms and Other Licenses, as applicable. Notwithstanding Your Terms, this license (including the redistribution requirements in Section 3.1) will continue to apply to the Work itself.
+
+ 3.3 Use Limitation. The Work and any derivative works thereof only may be used or intended for use non-commercially. As used herein, “non-commercially” means for non-commercial research purposes only, and excludes any military, surveillance, service of nuclear technology or biometric processing purposes.
+
+ 3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor (including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then your rights under this license from such Licensor (including the grant in Section 2.1) will terminate immediately.
+
+ 3.5 Trademarks. This license does not grant any rights to use any Licensor’s or its affiliates’ names, logos, or trademarks, except as necessary to reproduce the notices described in this license.
+
+ 3.6 Termination. If you violate any term of this license, then your rights under this license (including the grant in Section 2.1) will terminate immediately.
+
+ 3.7 Components Under Other Licenses. The Work may include or be distributed with components provided with separate legal notices or terms that accompany the components, such as open source software licenses and other license terms, including but not limited to the Meta OPT-IML 175B License Agreement (“Other Licenses”). The components are subject to the applicable Other Licenses, including any proprietary notices, disclaimers, requirements and extended use rights; except that this Agreement will prevail regarding the use of third-party software, unless a third-party software license requires its license terms to prevail.
+
+ 4. Disclaimer of Warranty.
+
+ THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WARRANTIES OR CONDITIONS OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT. YOU BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE.
+
+ 5. Limitation of Liability.
+
+ EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
PRIVACY.md ADDED
@@ -0,0 +1,11 @@
+ Field | Response
+ :---|:---
+ Generatable or reverse engineerable personal data? | No
+ Personal data used to create this model? | No
+ Was consent obtained for any personal data used? | Not Applicable
+ How often is dataset reviewed? | During dataset creation and model training
+ Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model? | No
+ Is there provenance for all datasets used in training? | Yes
+ Does data labeling (annotation, metadata) comply with privacy laws? | Yes
+ Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, not possible with externally-sourced data.
+ Applicable Privacy Policy | https://www.nvidia.com/en-us/about-nvidia/privacy-policy/
README.md ADDED
@@ -0,0 +1,161 @@
+ ---
+ datasets:
+ - nvidia/NitroGen
+ tags:
+ - behavior
+ - cloning
+ - gaming
+ - agent
+ ---
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/67d8509cb6b70254852d734d/u3VY6_KoT6tEs86YPehU2.gif" width="100%" />
+
+ <div align="center">
+ <p style="font-size: 1.2em;">
+ <a href="https://nitrogen.minedojo.org/"><strong>Website</strong></a> |
+ <a href="https://huggingface.co/nvidia/NitroGen"><strong>Model</strong></a> |
+ <a href="https://huggingface.co/datasets/nvidia/NitroGen"><strong>Dataset</strong></a> |
+ <a href="https://nitrogen.minedojo.org/assets/documents/nitrogen.pdf"><strong>Paper</strong></a>
+ </p>
+ </div>
+
+ # Model Overview
+
+ ### Description:
+
+ NitroGen is a unified vision-to-action model designed to play video games directly from raw frames. It takes video game footage as input and outputs gamepad actions. Unlike models trained with rewards or task objectives, NitroGen is trained purely through large-scale imitation learning on videos of human gameplay. NitroGen works best on games designed for gamepad controls (e.g., action, platformer, and racing games) and is less effective on games that rely heavily on mouse and keyboard (e.g., RTS, MOBA).
+
+ The goal of the NitroGen project is to explore whether large-scale training on diverse human gameplay leads to emergent, general-purpose embodied abilities, similar to how scaling has unlocked emergent behaviors in large language models.
+
+ Potential applications include next-generation game AI, automated QA for video games, and advancing research in general embodied AI.
+
+ NitroGen 1 was developed by NVIDIA and is the first model of the series. This model is for research and development only.
+
+ ### License/Terms of Use:
+
+ Governing Terms: [NVIDIA License](https://developer.download.nvidia.com/licenses/NVIDIA-OneWay-Noncommercial-License-22Mar2022.pdf).
+
+ Additional Information: [Apache License](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) for [https://huggingface.co/google/siglip2-base-patch16-224]().
+
+ ### Deployment Geography:
+ Global <br>
+
+ ### Use Case: <br>
+ Researchers, engineers, open source community, companies, and gamers. Potential applications include next-generation game AI, automated testing for video games, and generally advancing research in embodied AI.<br>
+
+ ### Release Date: <br>
+ GitHub 12/19/2025 via []() <br>
+ Hugging Face 12/19/2025 via [https://huggingface.co/nvidia/NitroGen](https://huggingface.co/nvidia/NitroGen) <br>
+
+ ## References:
+ [VPT](https://arxiv.org/abs/2206.11795), a Minecraft agent trained from internet videos.
+ [SIMA](https://arxiv.org/abs/2404.10179), a multi-game agent trained to follow text instructions.
+ [GR00T N1](https://arxiv.org/abs/2503.14734), an open foundation model for generalist humanoid robots.
+ <br>
+
+ ## Model Architecture:
+ **Architecture Type:** Vision Transformer, Diffusion Transformer <br>
+
+ **Network Architecture:**
+ - RGB frames are processed through a pre-trained vision transformer (SigLip2).
+ - A diffusion transformer (DiT) then generates actions, conditioned on the SigLip2 output.
+ <br>
+
+ **This model was developed based on** SigLip2 <br>
+
+ **Number of model parameters:** $4.93 \times 10^8$ <br>
+
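The two-stage pipeline above can be sketched at the shape level. The sketch below is a toy stand-in, not the released model code: the encoder and denoising functions are hypothetical placeholders, chosen only to make the tensor shapes concrete (a 256×256 RGB frame in, a 21×16 action tensor iteratively refined under image conditioning):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_frame(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the SigLip2 encoder: 256x256x3 frame -> 256 patch embeddings.

    Averages 16x16 pixel patches; the real encoder is a vision transformer.
    """
    assert frame.shape == (256, 256, 3)
    patches = frame.reshape(16, 16, 16, 16, 3).mean(axis=(2, 3, 4))  # 16x16 patch grid
    return patches.reshape(256, 1)  # (num_patches, toy embed_dim=1)

def denoise_step(actions: np.ndarray, cond: np.ndarray, t: float) -> np.ndarray:
    """Stand-in for one DiT denoising step conditioned on the image embeddings."""
    drift = np.tanh(cond.mean()) * np.ones_like(actions)
    return actions + t * (drift - actions)  # move toward a conditioned estimate

frame = rng.random((256, 256, 3))          # one RGB game frame
cond = encode_frame(frame)                 # image conditioning
actions = rng.standard_normal((21, 16))    # 21 action dims x 16 time steps, noise
for t in np.linspace(0.1, 1.0, 10):        # a few denoising iterations
    actions = denoise_step(actions, cond, t)

print(actions.shape)  # (21, 16)
```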
+ ## Input(s): <br>
+ **Input Type(s):** Image <br>
+
+ **Input Format(s):** Red, Green, Blue (RGB) <br>
+
+ **Input Parameters:** Two-Dimensional (2D) <br>
+
+ **Other Properties Related to Input:** 256x256 images
+
+ ## Output(s)
+
+ **Output Type(s):** Actions for gamepad/game controllers <br>
+
+ **Output Format(s):** Tabular <br>
+
+ **Output Parameters:** 2D: one action dimension and one temporal dimension <br>
+
+ **Other Properties Related to Output:** The output has shape 21x16: per time step, two 2D continuous-valued vectors (one per joystick) and 17 binary button values.
+
+
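The 21×16 action tensor can be unpacked into controller commands. A minimal sketch follows; note the row ordering (joystick dimensions first, then buttons) is an assumption for illustration, since the card does not document the layout:

```python
import numpy as np

def decode_actions(actions: np.ndarray):
    """Split a 21x16 action tensor into joystick and button commands.

    Assumes (not documented in the card) that the first 4 rows are the two
    2D joystick vectors and the remaining 17 rows are per-button values.
    """
    assert actions.shape == (21, 16)
    left_stick = actions[0:2, :]      # (2, 16): x/y of left joystick per step
    right_stick = actions[2:4, :]     # (2, 16): x/y of right joystick per step
    buttons = actions[4:21, :] > 0.5  # (17, 16): binarize button values
    return left_stick, right_stick, buttons

acts = np.zeros((21, 16))
acts[4, :] = 1.0  # hypothetical: hold the first button for all 16 steps
left, right, btns = decode_actions(acts)
print(left.shape, right.shape, btns.shape)  # (2, 16) (2, 16) (17, 16)
```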
+ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions. <br>
+
+ ## Software Integration:
+ **Runtime Engine(s):**
+ No runtime engine was used.
+
+ **Supported Hardware Microarchitecture Compatibility:** <br>
+ * NVIDIA Blackwell <br>
+ * NVIDIA Hopper <br>
+
+ **Preferred/Supported Operating System(s):**
+ * Linux <br>
+ * Windows <br>
+
+ The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment. <br>
+
+ ## Model Version(s):
+ V1 <br>
+
+ ## Training, Testing, and Evaluation Datasets:
+
+ ### Training Dataset:
+
+ **Data Modality** <br>
+ * Image <br>
+ * Video <br>
+
+ **Image Training Data Size** <br>
+ * More than 1 Billion Images <br>
+
+ **Video Training Data Size** <br>
+ * 10,000 to 1 Million Hours <br>
+
+ **Data Collection Method by dataset** <br>
+ * Automated <br>
+
+ **Labeling Method by dataset** <br>
+ * Synthetic <br>
+
+ **Properties:** 40,000 publicly available videos, labeled with frame-wise actions <br>
+
+ ### Testing Dataset:
+
+ **Data Collection Method by dataset** <br>
+ * Automated <br>
+
+ **Labeling Method by dataset** <br>
+ * Synthetic <br>
+
+ **Properties:** 40,000 publicly available videos, labeled with frame-wise actions <br>
+
+ ### Evaluation Dataset:
+
+ **Data Collection Method by dataset** <br>
+ * Automated <br>
+
+ **Labeling Method by dataset** <br>
+ * Synthetic <br>
+
+ **Properties:** 40,000 publicly available videos, labeled with frame-wise actions <br>
+
+
+ # Inference:
+ **Acceleration Engine:** None <br>
+ **Test Hardware:** H100 <br>
+
+ ## Ethical Considerations:
+ NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. <br>
+
+ For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.
+
+ Please report model quality, risk, security vulnerabilities or NVIDIA AI concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
SAFETY.md ADDED
@@ -0,0 +1,6 @@
+ Field | Response
+ :---|:---
+ Model Application Field(s): | Media & Entertainment
+ Describe the life critical impact (if present). | Not Applicable
+ Use Case Restrictions: | Abide by [NVIDIA License](https://developer.download.nvidia.com/licenses/NVIDIA-OneWay-Noncommercial-License-22Mar2022.pdf). Additional Information: [Apache License](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) for [https://huggingface.co/google/siglip2-base-patch16-224]().
+ Model and dataset restrictions: | The Principle of Least Privilege (PoLP) is applied to limit access for dataset generation and model development. Access restrictions are enforced on the dataset during training, and dataset license constraints are adhered to.
ng.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a266f5fb9c7dbdcdf97216558d2d82075a9a994b824cda69afa9fd3280260a81
+ size 1974723762
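As stored in git, `ng.pt` is a Git LFS pointer file, not the ~1.97 GB checkpoint itself (that arrives after `git lfs pull`). The three pointer fields above can be parsed with a few lines of Python; this is a sketch, not part of the official LFS tooling:

```python
# Parse a Git LFS pointer file such as the ng.pt entry above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a266f5fb9c7dbdcdf97216558d2d82075a9a994b824cda69afa9fd3280260a81
size 1974723762
"""

def parse_lfs_pointer(text: str) -> dict:
    """Return the version, oid, and size fields of an LFS pointer as a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])  # size of the real file, in bytes
    return fields

ptr = parse_lfs_pointer(POINTER_TEXT)
print(ptr["size"] / 1e9)  # ~1.97 GB
```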