NitroGen / EXPLAINABILITY.md
| Field | Response |
|---|---|
| Intended Task/Domain: | Vision-to-action model designed to play video games directly from raw frames |
| Model Type: | Transformer |
| Intended Users: | Researchers, game developers, open source community, gamers. Potential applications include next-generation game AI, automating testing for video games, and generally advancing research in embodied AI. |
| Output: | Gamepad actions |
| Describe how the model works: | Image inputs are encoded with a vision transformer. A separate diffusion transformer is conditioned on the image embeddings and denoises an action tensor. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable |
| Technical Limitations & Mitigation: | This model performs well on games played with a gamepad. It may not perform well on games played with a keyboard or mouse. |
| Verified to have met prescribed NVIDIA quality standards: | Yes |
| Performance Metrics: | Task success rate |
| Potential Known Risks: | The model may occasionally lose at certain games. |
| Licensing: | Governing Terms: NVIDIA License. Additional Information: Apache License for https://huggingface.co/google/siglip2-base-patch16-224. |
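
The "Describe how the model works" entry above can be illustrated with a toy sketch of the data flow: a frame is encoded into an embedding, and an action tensor that starts as pure noise is refined step by step while conditioned on that embedding. Everything below is an assumption made for illustration and is not taken from the NitroGen code. `encode_frame` and `predict_clean_action` are hypothetical stand-ins for the vision transformer and the diffusion transformer, the dimensions are arbitrary, and the update rule is a simple Euler step under a linear noise schedule. The card does not say which sampler or schedule the model uses.

```python
import math
import random

# All sizes are hypothetical; the model card does not state them.
EMBED_DIM = 8    # size of the image embedding
ACTION_DIM = 4   # gamepad channels per timestep
HORIZON = 3      # timesteps in one action tensor
NUM_STEPS = 10   # denoising iterations


def encode_frame(frame):
    """Stand-in for the vision transformer: mean-pool pixels into an embedding."""
    chunk = max(1, len(frame) // EMBED_DIM)
    return [sum(frame[i * chunk:(i + 1) * chunk]) / chunk for i in range(EMBED_DIM)]


def predict_clean_action(noisy_action, embedding, t):
    """Stand-in for the diffusion transformer.

    A real network would use the noisy action and the noise level t as well.
    This toy version only uses the conditioning embedding, so the result is
    easy to check by hand.
    """
    target = [math.tanh(embedding[j % EMBED_DIM]) for j in range(ACTION_DIM)]
    return [list(target) for _ in range(HORIZON)]


def sample_action(frame, rng):
    """Denoise a random action tensor, conditioned on the frame embedding."""
    embedding = encode_frame(frame)
    # Start from Gaussian noise with shape HORIZON x ACTION_DIM.
    action = [[rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)] for _ in range(HORIZON)]
    for k in range(NUM_STEPS):
        t = 1.0 - k / NUM_STEPS          # noise level, from 1 (pure noise) toward 0
        pred = predict_clean_action(action, embedding, t)
        remaining = NUM_STEPS - k
        # Euler step toward the predicted clean action (assumed linear schedule).
        action = [
            [a + (p - a) / remaining for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(action, pred)
        ]
    return action


if __name__ == "__main__":
    rng = random.Random(0)
    frame = [rng.random() for _ in range(64)]  # fake flattened frame
    action = sample_action(frame, rng)
    print(len(action), "timesteps x", len(action[0]), "channels")
```

In this sketch the conditioning only enters through `predict_clean_action`, which mirrors the separation the card describes between the image encoder and the action denoiser. Mapping the resulting tensor to concrete gamepad buttons and sticks is left out, since the card does not specify that layout.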