---
license: other
license_name: nvidia-open-model-license
license_link: >-
  https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
tags:
  - robotics
  - video-generation
  - diffusion
  - action-conditioned
  - dreamdojo
  - cosmos-predict2.5
library_name: diffusers
pipeline_tag: video-to-video
---

DreamDojo-AgiBot-2B-Diffusers

Fine-tuned on AgiBot robot data. Part of the DreamDojo model family.

| Property | Value |
|---|---|
| Size | 2B |
| Stage | Post-training |
| Architecture | DiT (Diffusion Transformer) with AdaLN-LoRA |
| Base | Cosmos Predict 2.5 |
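The overview lists the backbone as a DiT with AdaLN-LoRA. In adaptive LayerNorm (AdaLN), a conditioning vector such as the timestep embedding is projected to per-channel shift, scale, and gate values that modulate the normalized activations. The LoRA variant replaces the dense projection with a low-rank factorization to save parameters. The pure-Python sketch below illustrates that idea on tiny vectors. It is a generic illustration, not DreamDojo or Cosmos code: the class name `AdaLNLoRA`, the rank, and the zero initialization of the up-projection are assumptions, and the real model's wiring (for example, how low-rank terms are shared across blocks) may differ.

```python
import math
import random


def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def matvec(w, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


class AdaLNLoRA:
    """Hypothetical AdaLN layer with a low-rank modulation projection.

    Plain AdaLN maps a conditioning vector c (dim D) to shift/scale/gate
    (3*D values) with a dense D x 3D matrix. Here that matrix is factored
    into down (r x D) followed by up (3D x r), with r much smaller than D.
    """

    def __init__(self, dim, rank, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Small random init for the down-projection (D -> r).
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rank)]
        # Zero init for the up-projection (r -> 3D): the layer starts out
        # as a plain, unmodulated LayerNorm with a zero gate.
        self.up = [[0.0] * rank for _ in range(3 * dim)]

    def modulation(self, cond):
        hidden = matvec(self.down, cond)  # D -> r
        mod = matvec(self.up, hidden)     # r -> 3D
        d = self.dim
        return mod[:d], mod[d:2 * d], mod[2 * d:]  # shift, scale, gate

    def __call__(self, x, cond):
        """Return the modulated activations and the residual gate."""
        shift, scale, gate = self.modulation(cond)
        normed = layer_norm(x)
        out = [h * (1.0 + s) + b for h, s, b in zip(normed, scale, shift)]
        return out, gate
```

In a DiT block, the returned gate would typically scale the attention or MLP output before the residual add (`x + gate * f(out)`, elementwise). With the zero-initialized up-projection used in this sketch, every block starts out as an identity residual path.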

Checkpoint Structure

```
DreamDojo-AgiBot-2B-Diffusers/
├── transformer/            # DiT backbone (sharded safetensors)
├── crossattn_adapter/      # Text-to-DiT projection (100352 → 1024)
├── vae/                    # AutoencoderKLWan (standard diffusers)
├── lam/                    # Latent Action Model (710M params)
├── text_encoder/           # Cosmos-Reason1-7B
├── scheduler/              # FlowMatchEulerDiscreteScheduler
├── action_processor/       # DreamDojo-specific config
└── config.json
```
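The `scheduler/` folder holds a `FlowMatchEulerDiscreteScheduler` configuration. The core update a flow-matching Euler sampler applies is `x <- x + (sigma_next - sigma) * v`, where `v` is the velocity predicted by the transformer. The sketch below shows that update in plain Python, assuming the rectified-flow convention `x_sigma = (1 - sigma) * x0 + sigma * noise`, so the target velocity is `noise - x0`. The helper names are hypothetical, the transformer is replaced by an oracle velocity, and the sigmas are evenly spaced; the real sigma schedule (including any timestep shift) comes from the settings stored in `scheduler/`.

```python
def euler_flow_step(sample, velocity, sigma, sigma_next):
    """One explicit Euler step: x <- x + (sigma_next - sigma) * v."""
    return [x + (sigma_next - sigma) * v for x, v in zip(sample, velocity)]


def linear_sigmas(num_steps):
    """Evenly spaced noise levels from 1.0 (pure noise) down to 0.0 (clean).

    No timestep shift is applied here, unlike a configured scheduler.
    """
    return [1.0 - i / num_steps for i in range(num_steps + 1)]


def sample_with_euler(x_init, velocity_fn, sigmas):
    """Integrate the flow ODE from sigmas[0] down to sigmas[-1]."""
    x = list(x_init)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x = euler_flow_step(x, velocity_fn(x, sigma), sigma, sigma_next)
    return x


def make_oracle_velocity(target):
    """Hypothetical stand-in for the transformer.

    Returns the exact velocity that carries
    x_sigma = (1 - sigma) * x0 + sigma * noise back to x0 = target.
    On that straight path, (x - x0) / sigma == noise - x0 for any sigma > 0.
    """
    def velocity(x, sigma):
        return [(xi - ti) / sigma for xi, ti in zip(x, target)]
    return velocity


# Usage: start from pure noise and integrate down to sigma = 0.
target = [1.0, -2.0, 0.5]
noise = [0.3, 0.7, -1.1]
result = sample_with_euler(noise, make_oracle_velocity(target), linear_sigmas(10))
# result matches target up to floating-point rounding
```

Because the oracle velocity is constant along the straight noise-to-data path, Euler integration recovers `x0` up to rounding for any step count. A learned velocity is only approximately constant, which is why the number of sampling steps still matters in practice.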

Architecture

| Parameter | 2B |
|---|---|
| Model channels | 2048 |
| Transformer blocks | 28 |
| Attention heads | 16 |
| Patch size (spatial / temporal) | 2 / 1 |
| Action dim | 384 (unified) |
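Some derived quantities follow directly from the table: each attention head has 2048 / 16 = 128 channels. The DiT sequence length also depends on the VAE's compression factors, which this card does not state. The sketch below assumes Wan-style factors for the `AutoencoderKLWan` VAE (4x causal temporal and 8x spatial downsampling) and then applies the 2 / 1 patch size from the table. The frame count and resolution in the example are hypothetical, and `action_dim` is carried along only for completeness; how the 384-dim unified action vector is injected is not described here.

```python
from dataclasses import dataclass

# ASSUMPTION (not stated on this card): the Wan-style VAE compresses time 4x
# (causally, so T frames -> 1 + (T - 1) // 4 latent frames) and space 8x.
VAE_TEMPORAL_DOWN = 4
VAE_SPATIAL_DOWN = 8


@dataclass(frozen=True)
class DiTConfig:
    """Hyperparameters copied from the architecture table (2B model)."""
    model_channels: int = 2048
    num_blocks: int = 28
    num_heads: int = 16
    patch_spatial: int = 2
    patch_temporal: int = 1
    action_dim: int = 384

    @property
    def head_dim(self) -> int:
        if self.model_channels % self.num_heads:
            raise ValueError("model_channels must be divisible by num_heads")
        return self.model_channels // self.num_heads


def latent_shape(num_frames: int, height: int, width: int) -> tuple:
    """Latent (T, H, W) grid under the assumed VAE compression factors."""
    if (num_frames - 1) % VAE_TEMPORAL_DOWN:
        raise ValueError("num_frames must be of the form 1 + 4k")
    if height % VAE_SPATIAL_DOWN or width % VAE_SPATIAL_DOWN:
        raise ValueError("height and width must be multiples of 8")
    return (
        1 + (num_frames - 1) // VAE_TEMPORAL_DOWN,
        height // VAE_SPATIAL_DOWN,
        width // VAE_SPATIAL_DOWN,
    )


def num_tokens(cfg: DiTConfig, num_frames: int, height: int, width: int) -> int:
    """Sequence length the DiT sees after patchifying the latent grid."""
    t, h, w = latent_shape(num_frames, height, width)
    if t % cfg.patch_temporal or h % cfg.patch_spatial or w % cfg.patch_spatial:
        raise ValueError("latent grid is not divisible by the patch size")
    return (t // cfg.patch_temporal) * (h // cfg.patch_spatial) * (w // cfg.patch_spatial)


cfg = DiTConfig()
print(cfg.head_dim)                   # 128
print(num_tokens(cfg, 13, 480, 640))  # 4800 (hypothetical 13-frame 480x640 clip)
```

Token count grows linearly with latent frames and quadratically with resolution, so under these assumptions doubling both height and width quadruples the attention sequence length.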

Citation

```bibtex
@article{dreamdojo2025,
  title={DreamDojo: Advancing Real-World Robot Policies Through Generated Interactive Environments},
  author={NVIDIA},
  year={2025}
}
```

License

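This model is released under the NVIDIA Open Model License (https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Please refer to the NVIDIA DreamDojo repository for the full license terms.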