---
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
tags:
- robotics
- video-generation
- diffusion
- action-conditioned
- dreamdojo
- cosmos-predict2.5
library_name: diffusers
pipeline_tag: video-to-video
---
# DreamDojo-AgiBot-2B-Diffusers
This checkpoint is fine-tuned on AgiBot robot data and is part of the [DreamDojo](https://github.com/NVIDIA/DreamDojo) model family.
| Property | Value |
|---|---|
| **Size** | 2B |
| **Stage** | Post-training |
| **Architecture** | DiT (Diffusion Transformer) with AdaLN-LoRA |
| **Base** | Cosmos Predict 2.5 |
## Checkpoint Structure
```
DreamDojo-AgiBot-2B-Diffusers/
├── transformer/        # DiT backbone (sharded safetensors)
├── crossattn_adapter/  # Text-to-DiT projection (100352 → 1024)
├── vae/                # AutoencoderKLWan (standard diffusers)
├── lam/                # Latent Action Model (710M params)
├── text_encoder/       # Cosmos-Reason1-7B
├── scheduler/          # FlowMatchEulerDiscreteScheduler
├── action_processor/   # DreamDojo-specific config
└── config.json
```
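After downloading the checkpoint, it can help to confirm that every component listed above is present before loading anything. The snippet below is a minimal sketch that uses only the Python standard library; the helper name `missing_components` and the constant `EXPECTED_DIRS` are illustrative and not part of DreamDojo or diffusers.

```python
from pathlib import Path

# Component folders copied from the tree above; config.json sits at the root.
EXPECTED_DIRS = (
    "transformer",
    "crossattn_adapter",
    "vae",
    "lam",
    "text_encoder",
    "scheduler",
    "action_processor",
)


def missing_components(root):
    """Return the expected entries that are absent under `root`.

    Hypothetical helper for illustration: it only checks that the
    folders and config.json exist, not that their contents are valid.
    """
    root = Path(root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    if not (root / "config.json").is_file():
        missing.append("config.json")
    return missing
```

Calling `missing_components("path/to/DreamDojo-AgiBot-2B-Diffusers")` on a complete local copy should return an empty list; any names it returns point to an incomplete download.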
## Architecture
| Hyperparameter | 2B |
|--|------|
| Model channels | 2048 |
| Transformer blocks | 28 |
| Attention heads | 16 |
| Patch size (spatial / temporal) | 2 / 1 |
| Action dim | 384 (unified) |
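The numbers in this table imply a few derived quantities. The sketch below works them out under two assumptions that this card does not confirm: that the model channels are split evenly across attention heads (the usual multi-head layout), and that the DiT patchifies latents into non-overlapping patches. The latent shape in the usage note is a made-up example, not a DreamDojo default.

```python
# Values copied from the architecture table above.
MODEL_CHANNELS = 2048
NUM_HEADS = 16
PATCH_SPATIAL = 2
PATCH_TEMPORAL = 1


def head_dim(channels=MODEL_CHANNELS, heads=NUM_HEADS):
    """Per-head width, assuming channels are split evenly across heads."""
    if channels % heads != 0:
        raise ValueError("channels must be divisible by heads")
    return channels // heads


def num_tokens(latent_frames, latent_height, latent_width):
    """Token count for one latent video, assuming non-overlapping patches."""
    if (latent_frames % PATCH_TEMPORAL
            or latent_height % PATCH_SPATIAL
            or latent_width % PATCH_SPATIAL):
        raise ValueError("latent shape must be divisible by the patch size")
    return (
        (latent_frames // PATCH_TEMPORAL)
        * (latent_height // PATCH_SPATIAL)
        * (latent_width // PATCH_SPATIAL)
    )
```

Under these assumptions, `head_dim()` gives 128, and a hypothetical latent of 4 frames at 30 × 40 would yield `num_tokens(4, 30, 40) == 1200` tokens.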
## Citation
```bibtex
@article{dreamdojo2025,
title={DreamDojo: Advancing Real-World Robot Policies Through Generated Interactive Environments},
author={NVIDIA},
year={2025}
}
```
## License
Please refer to the [NVIDIA DreamDojo](https://github.com/NVIDIA/DreamDojo) repository for license terms.