---
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
tags:
- robotics
- video-generation
- diffusion
- action-conditioned
- dreamdojo
- cosmos-predict2.5
library_name: diffusers
pipeline_tag: video-to-video
---
# DreamDojo-AgiBot-2B-Diffusers
This checkpoint is fine-tuned on AgiBot robot data and is part of the [DreamDojo](https://github.com/NVIDIA/DreamDojo) model family.
| Property | Value |
|---|---|
| **Size** | 2B |
| **Stage** | Post-training |
| **Architecture** | DiT (Diffusion Transformer) with AdaLN-LoRA |
| **Base** | Cosmos Predict 2.5 |
## Checkpoint Structure

```
DreamDojo-AgiBot-2B-Diffusers/
├── transformer/          # DiT backbone (sharded safetensors)
├── crossattn_adapter/    # Text-to-DiT projection (100352 → 1024)
├── vae/                  # AutoencoderKLWan (standard diffusers)
├── lam/                  # Latent Action Model (710M params)
├── text_encoder/         # Cosmos-Reason1-7B
├── scheduler/            # FlowMatchEulerDiscreteScheduler
├── action_processor/     # DreamDojo-specific config
└── config.json
```
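To sanity-check a local download against the layout above, a minimal sketch like the following reports which expected components are missing. It assumes the repository has already been downloaded to a local directory (for example with `huggingface_hub.snapshot_download`). The helper name `missing_components` is hypothetical, and the expected-name list is copied from the tree above rather than from any official DreamDojo manifest.

```python
from __future__ import annotations

from pathlib import Path

# Entries copied from the checkpoint tree in this model card.
# ASSUMPTION: this is not an official manifest; adjust if the repo layout changes.
EXPECTED_DIRS = (
    "transformer",
    "crossattn_adapter",
    "vae",
    "lam",
    "text_encoder",
    "scheduler",
    "action_processor",
)
EXPECTED_FILES = ("config.json",)


def missing_components(root: str | Path) -> list[str]:
    """Return expected subfolders/files absent under `root` (hypothetical helper)."""
    root = Path(root)
    # Directories are reported with a trailing slash to match the tree listing.
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    import sys

    path = sys.argv[1] if len(sys.argv) > 1 else "DreamDojo-AgiBot-2B-Diffusers"
    gaps = missing_components(path)
    print("complete" if not gaps else "missing: " + ", ".join(gaps))
```

Checking for directories rather than individual weight files keeps the check independent of how the transformer weights happen to be sharded.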
## Architecture

| Parameter | 2B |
|---|---|
| Model channels | 2048 |
| Transformer blocks | 28 |
| Attention heads | 16 |
| Patch size (spatial / temporal) | 2 / 1 |
| Action dim | 384 (unified) |
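The patch sizes and head count above determine the per-head width and how many tokens the DiT attends over. The back-of-the-envelope sketch below uses only the model channels (2048), head count (16), and patch sizes (2 spatial, 1 temporal) from the table. The VAE compression factors (4× in time with the first frame kept separately, 8× in space) are an assumption typical of Wan-style VAEs and are not stated in this card, so check `vae/config.json` before relying on them. The 93-frame 480×640 clip is a hypothetical example.

```python
from __future__ import annotations

# Values taken from the architecture table above.
PATCH_SPATIAL = 2
PATCH_TEMPORAL = 1
MODEL_CHANNELS = 2048
NUM_HEADS = 16


def latent_shape(frames: int, height: int, width: int,
                 vae_temporal: int = 4, vae_spatial: int = 8) -> tuple[int, int, int]:
    """Latent (T, H, W) for a pixel-space clip.

    ASSUMPTION: a causal Wan-style VAE that keeps the first frame and
    compresses the remaining frames by `vae_temporal`; verify in vae/config.json.
    """
    if (frames - 1) % vae_temporal or height % vae_spatial or width % vae_spatial:
        raise ValueError("clip shape is not divisible by the assumed VAE factors")
    return (frames - 1) // vae_temporal + 1, height // vae_spatial, width // vae_spatial


def dit_tokens(latent_t: int, latent_h: int, latent_w: int) -> int:
    """Number of patch tokens after spatial/temporal patchification."""
    if latent_t % PATCH_TEMPORAL or latent_h % PATCH_SPATIAL or latent_w % PATCH_SPATIAL:
        raise ValueError("latent shape is not divisible by the patch size")
    return ((latent_t // PATCH_TEMPORAL)
            * (latent_h // PATCH_SPATIAL)
            * (latent_w // PATCH_SPATIAL))


head_dim = MODEL_CHANNELS // NUM_HEADS  # 2048 / 16 = 128 channels per head

t, h, w = latent_shape(frames=93, height=480, width=640)  # hypothetical clip size
print(head_dim, (t, h, w), dit_tokens(t, h, w))  # 128 (24, 60, 80) 28800
```

Because self-attention cost grows quadratically with token count, this arithmetic gives a quick estimate of how clip length and resolution affect memory. The `vae_temporal` and `vae_spatial` defaults are the parts to verify against the shipped config.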
## Citation

```bibtex
@article{dreamdojo2025,
  title={DreamDojo: Advancing Real-World Robot Policies Through Generated Interactive Environments},
  author={NVIDIA},
  year={2025}
}
```
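## License

This model is released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Please refer to the [NVIDIA DreamDojo](https://github.com/NVIDIA/DreamDojo) repository for the full license terms.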