Field | Response
:---|:---
Intended Task/Domain: | Vision-to-action model designed to play video games directly from raw frames
Model Type: | Transformer
Intended Users: | Researchers, game developers, the open-source community, and gamers. Potential applications include next-generation game AI, automated video-game testing, and advancing research in embodied AI more broadly.
Output: | Gamepad actions
Describe how the model works: | Image inputs are encoded with a vision transformer. A separate diffusion transformer, conditioned on the image embeddings, then denoises an action tensor.
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable
Technical Limitations & Mitigation: | The model performs well on games played with a gamepad. It may not perform well on games played with a keyboard or mouse.
Verified to have met prescribed NVIDIA quality standards: | Yes
Performance Metrics: | Task success rate
Potential Known Risks: | The model may occasionally lose at certain games.
Licensing: | Governing Terms: [NVIDIA License](https://developer.download.nvidia.com/licenses/NVIDIA-OneWay-Noncommercial-License-22Mar2022.pdf). Additional Information: [Apache License](https://huggingface.co/datasets/choosealicense/licenses/blob/main/markdown/apache-2.0.md) for [google/siglip2-base-patch16-224](https://huggingface.co/google/siglip2-base-patch16-224).
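
The "how the model works" row above can be illustrated with a minimal sketch. This is a hypothetical NumPy stand-in, not the actual implementation: the real model uses a vision transformer (e.g. SiglIP-style) and a diffusion transformer, whereas here single linear projections, the action dimensionality (`ACT_DIM`), the step count, and the fixed step size are all illustrative assumptions. It only shows the data flow: encode raw frames into embeddings, then iteratively denoise an action tensor conditioned on those embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM, ACT_DIM, STEPS = 64, 12, 8  # dimensions and step count are assumptions

class VisionEncoder:
    """Stand-in for the vision transformer: a single linear projection
    of the flattened frame instead of patchify + attention blocks."""
    def __init__(self, frame_pixels):
        self.W = rng.normal(scale=0.01, size=(frame_pixels, EMB_DIM))

    def __call__(self, frames):
        # frames: (batch, H, W, C) raw pixels -> (batch, EMB_DIM) embeddings
        return frames.reshape(frames.shape[0], -1) @ self.W

class ActionDenoiser:
    """Stand-in for the diffusion transformer: predicts the noise in the
    action tensor, conditioned additively on the image embeddings."""
    def __init__(self):
        self.Wa = rng.normal(scale=0.1, size=(ACT_DIM, ACT_DIM))
        self.Wc = rng.normal(scale=0.1, size=(EMB_DIM, ACT_DIM))

    def predict_noise(self, noisy_actions, cond):
        return np.tanh(noisy_actions @ self.Wa + cond @ self.Wc)

def sample_actions(encoder, denoiser, frames):
    cond = encoder(frames)
    # Start from pure Gaussian noise and denoise step by step
    # (a real sampler would follow a proper diffusion noise schedule).
    actions = rng.normal(size=(frames.shape[0], ACT_DIM))
    for _ in range(STEPS):
        actions = actions - 0.5 * denoiser.predict_noise(actions, cond)
    return actions

frames = rng.random((2, 16, 16, 3))  # two raw game frames
encoder = VisionEncoder(16 * 16 * 3)
actions = sample_actions(encoder, ActionDenoiser(), frames)
print(actions.shape)  # one ACT_DIM action vector per input frame
```

In the real model each action vector would be decoded into gamepad inputs (stick axes and button presses); the sketch stops at the denoised tensor.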