---
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
tags:
- robotics
- video-generation
- diffusion
- action-conditioned
- dreamdojo
- cosmos-predict2.5
library_name: diffusers
pipeline_tag: video-to-video
---

# DreamDojo-AgiBot-2B-Diffusers

Fine-tuned on AgiBot robot data. Part of the [DreamDojo](https://github.com/NVIDIA/DreamDojo) model family.

| | |
|---|---|
| **Size** | 2B |
| **Stage** | Post-training |
| **Architecture** | DiT (Diffusion Transformer) with AdaLN-LoRA |
| **Base** | Cosmos Predict 2.5 |

## Checkpoint Structure

```
DreamDojo-AgiBot-2B-Diffusers/
├── transformer/          # DiT backbone (sharded safetensors)
├── crossattn_adapter/    # Text-to-DiT projection (100352 → 1024)
├── vae/                  # AutoencoderKLWan (standard diffusers)
├── lam/                  # Latent Action Model (710M params)
├── text_encoder/         # Cosmos-Reason1-7B
├── scheduler/            # FlowMatchEulerDiscreteScheduler
├── action_processor/     # DreamDojo-specific config
└── config.json
```

## Architecture

| | 2B |
|--|------|
| Model channels | 2048 |
| Transformer blocks | 28 |
| Attention heads | 16 |
| Patch size (spatial / temporal) | 2 / 1 |
| Action dim | 384 (unified) |

## Citation

```bibtex
@article{dreamdojo2025,
  title={DreamDojo: Advancing Real-World Robot Policies Through Generated Interactive Environments},
  author={NVIDIA},
  year={2025}
}
```

## License

Please refer to the [NVIDIA DreamDojo](https://github.com/NVIDIA/DreamDojo) repository for license terms.
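
## Checking a Local Download

The layout listed under Checkpoint Structure can be verified after downloading with a short sketch like the one below. It uses only the Python standard library and the folder names from that listing. The helper name `missing_components` and the constants `EXPECTED_DIRS` and `EXPECTED_FILES` are hypothetical, introduced here for illustration; they are not part of DreamDojo or diffusers.

```python
from pathlib import Path

# Component folders taken from the Checkpoint Structure listing.
EXPECTED_DIRS = (
    "transformer",
    "crossattn_adapter",
    "vae",
    "lam",
    "text_encoder",
    "scheduler",
    "action_processor",
)
# Top-level files taken from the same listing.
EXPECTED_FILES = ("config.json",)


def missing_components(checkpoint_root):
    """Return the expected folders/files that are absent under checkpoint_root.

    Hypothetical helper: it checks only that the names exist, not that the
    weights or configs inside them are valid.
    """
    root = Path(checkpoint_root)
    missing = [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing
```

Calling `missing_components("DreamDojo-AgiBot-2B-Diffusers")` on a complete local copy should return an empty list; any names it returns point to components that were not downloaded.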