---
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
tags:
  - robotics
  - video-generation
  - diffusion
  - action-conditioned
  - dreamdojo
  - cosmos-predict2.5
library_name: diffusers
pipeline_tag: video-to-video
---

# DreamDojo-AgiBot-2B-Diffusers

A 2B-parameter, action-conditioned video diffusion model, post-trained from Cosmos Predict 2.5 on AgiBot robot data. It is part of the [DreamDojo](https://github.com/NVIDIA/DreamDojo) model family.

| Property | Value |
|---|---|
| **Size** | 2B |
| **Stage** | Post-training |
| **Architecture** | DiT (Diffusion Transformer) with AdaLN-LoRA |
| **Base** | Cosmos Predict 2.5 |

## Checkpoint Structure

```
DreamDojo-AgiBot-2B-Diffusers/
├── transformer/            # DiT backbone (sharded safetensors)
├── crossattn_adapter/      # Text-to-DiT projection (100352 → 1024)
├── vae/                    # AutoencoderKLWan (standard diffusers)
├── lam/                    # Latent Action Model (710M params)
├── text_encoder/           # Cosmos-Reason1-7B
├── scheduler/              # FlowMatchEulerDiscreteScheduler
├── action_processor/       # DreamDojo-specific config
└── config.json
```
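
As a quick sanity check, the layout above can be verified against a local copy of the checkpoint before attempting to load anything. This is a minimal sketch, not part of the DreamDojo tooling: `missing_components` and `load_standard_parts` are hypothetical helper names, and the second function assumes that the `vae/` and `scheduler/` folders load with the stock diffusers classes named in the listing.

```python
from pathlib import Path

# Entries taken from the directory listing in this model card.
EXPECTED_DIRS = (
    "transformer",
    "crossattn_adapter",
    "vae",
    "lam",
    "text_encoder",
    "scheduler",
    "action_processor",
)
EXPECTED_FILES = ("config.json",)


def missing_components(root):
    """Return the expected entries that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing


def load_standard_parts(root):
    """Load the two components the listing attributes to stock diffusers classes.

    Assumption: the saved configs are compatible with the installed diffusers
    version. The other folders are DreamDojo-specific and are not handled here.
    """
    # Imported lazily so the layout check above works without diffusers installed.
    from diffusers import AutoencoderKLWan, FlowMatchEulerDiscreteScheduler

    vae = AutoencoderKLWan.from_pretrained(root, subfolder="vae")
    scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(root, subfolder="scheduler")
    return vae, scheduler
```

Calling `missing_components("path/to/DreamDojo-AgiBot-2B-Diffusers")` on a complete download should return an empty list; any names it does return point to an incomplete or interrupted download.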

## Architecture

| Parameter | 2B |
|--|------|
| Model channels | 2048 |
| Transformer blocks | 28 |
| Attention heads | 16 |
| Patch size (spatial / temporal) | 2 / 1 |
| Action dim | 384 (unified) |
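
A few derived quantities follow from the table by simple arithmetic. The sketch below is illustrative only: `DiTConfig` is a hypothetical container (not a class from diffusers or DreamDojo), it assumes the model channels are split evenly across attention heads, and the latent grid size in the usage note is an arbitrary example rather than the model's actual resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiTConfig:
    """Hypothetical holder for the values in the architecture table."""

    model_channels: int = 2048
    num_blocks: int = 28
    num_heads: int = 16
    patch_spatial: int = 2
    patch_temporal: int = 1
    action_dim: int = 384

    @property
    def head_dim(self):
        # Assumes an even split of channels across heads.
        if self.model_channels % self.num_heads != 0:
            raise ValueError("model_channels must be divisible by num_heads")
        return self.model_channels // self.num_heads

    def num_tokens(self, latent_frames, latent_height, latent_width):
        """Token count after patchifying a latent video of the given shape."""
        if latent_frames % self.patch_temporal != 0:
            raise ValueError("latent_frames must be divisible by patch_temporal")
        if latent_height % self.patch_spatial or latent_width % self.patch_spatial:
            raise ValueError("latent height/width must be divisible by patch_spatial")
        return (
            (latent_frames // self.patch_temporal)
            * (latent_height // self.patch_spatial)
            * (latent_width // self.patch_spatial)
        )


cfg = DiTConfig()
print(cfg.head_dim)               # 2048 / 16 = 128
print(cfg.num_tokens(8, 32, 32))  # 8 * 16 * 16 = 2048 tokens for this example grid
```

With a spatial patch size of 2 and a temporal patch size of 1, each latent frame contributes a quarter as many tokens as it has latent pixels, and the frame count is left unchanged.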

## Citation

```bibtex
@article{dreamdojo2025,
  title={DreamDojo: Advancing Real-World Robot Policies Through Generated Interactive Environments},
  author={NVIDIA},
  year={2025}
}
```

## License

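This model card's metadata lists the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Please refer to the [NVIDIA DreamDojo](https://github.com/NVIDIA/DreamDojo) repository for the full license terms.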