---
license: other
license_name: nvidia-open-model-license
license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
tags:
- robotics
- video-generation
- diffusion
- action-conditioned
- dreamdojo
- cosmos-predict2.5
library_name: diffusers
pipeline_tag: video-to-video
---
# DreamDojo-AgiBot-2B-Diffusers
Fine-tuned on AgiBot robot data. Part of the [DreamDojo](https://github.com/NVIDIA/DreamDojo) model family.
| Property | Value |
|---|---|
| **Size** | 2B |
| **Stage** | Post-training |
| **Architecture** | DiT (Diffusion Transformer) with AdaLN-LoRA |
| **Base** | Cosmos Predict 2.5 |
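As we understand it, AdaLN-LoRA is adaptive layer normalization in which the projection from the conditioning embedding (e.g. the timestep) to the per-channel shift, scale, and gate vectors is factored into a low-rank pair of matrices, in the spirit of LoRA. The sketch below is a generic illustration of that idea, not DreamDojo's code. It assumes three modulation vectors per sublayer, ignores biases, and uses a hypothetical rank of 256; the helpers `adaln_param_counts` and `modulate` are illustrative names.

```python
import math


def adaln_param_counts(dim: int, rank: int, n_mod: int = 3) -> tuple[int, int]:
    """Weight counts for a dense vs. low-rank adaLN modulation projection.

    dense:    embedding (dim) -> n_mod * dim          = dim * n_mod * dim weights
    low-rank: embedding (dim) -> rank -> n_mod * dim  = dim * rank + rank * n_mod * dim
    Biases are ignored for simplicity (assumption).
    """
    dense = dim * n_mod * dim
    low_rank = dim * rank + rank * n_mod * dim
    return dense, low_rank


def modulate(x: list[float], shift: list[float], scale: list[float], eps: float = 1e-6) -> list[float]:
    """Affine-free LayerNorm over one token's channels, then per-channel scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    return [n * (1.0 + s) + b for n, s, b in zip(normed, scale, shift)]


if __name__ == "__main__":
    # Model channels = 2048 per the Architecture table; rank 256 is a hypothetical choice.
    dense, low_rank = adaln_param_counts(dim=2048, rank=256)
    print(dense, low_rank)  # 12582912 2097152
```

Under these assumptions the low-rank factorization needs exactly one sixth of the dense weights for each modulation projection, which adds up across 28 transformer blocks.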
## Checkpoint Structure
```
DreamDojo-AgiBot-2B-Diffusers/
├── transformer/          # DiT backbone (sharded safetensors)
├── crossattn_adapter/    # Text-to-DiT projection (100352 → 1024)
├── vae/                  # AutoencoderKLWan (standard diffusers)
├── lam/                  # Latent Action Model (710M params)
├── text_encoder/         # Cosmos-Reason1-7B
├── scheduler/            # FlowMatchEulerDiscreteScheduler
├── action_processor/     # DreamDojo-specific config
└── config.json
```
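After downloading the repository (for example with `huggingface_hub.snapshot_download(repo_id="Physis-AI/DreamDojo-AgiBot-2B-Diffusers")`, which returns the local folder path), a quick check that every component in the tree above is present can save a confusing load error later. The helper `missing_components` is hypothetical and written only against the folder names listed above; it checks that the directories and `config.json` exist, not what they contain.

```python
from __future__ import annotations

from pathlib import Path

# Folder names copied from the checkpoint tree above.
EXPECTED_SUBFOLDERS = (
    "transformer",
    "crossattn_adapter",
    "vae",
    "lam",
    "text_encoder",
    "scheduler",
    "action_processor",
)


def missing_components(root: str | Path) -> list[str]:
    """Return the expected entries that are absent from a local checkpoint folder."""
    root = Path(root)
    missing = [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]
    if not (root / "config.json").is_file():
        missing.append("config.json")
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python check_layout.py /path/to/DreamDojo-AgiBot-2B-Diffusers
    problems = missing_components(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("OK" if not problems else f"Missing: {', '.join(problems)}")
```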
## Architecture
| Parameter | 2B |
|---|---|
| Model channels | 2048 |
| Transformer blocks | 28 |
| Attention heads | 16 |
| Patch size (spatial / temporal) | 2 / 1 |
| Action dim | 384 (unified) |
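The table values can be turned into rough head-size and sequence-length estimates. The sketch below assumes the `AutoencoderKLWan` VAE compresses 4× in time (with the first frame encoded on its own) and 8× in space, which matches the Wan 2.1 VAE defaults but is not stated on this card. The 480×640, 13-frame clip is purely hypothetical, and `dit_token_count` is an illustrative name.

```python
MODEL_CHANNELS = 2048  # from the Architecture table
NUM_HEADS = 16
PATCH_SPATIAL = 2
PATCH_TEMPORAL = 1


def dit_token_count(num_frames: int, height: int, width: int,
                    t_compress: int = 4, s_compress: int = 8) -> int:
    """Number of DiT tokens for one clip, under the assumed VAE compression factors."""
    if (num_frames - 1) % t_compress:
        raise ValueError("num_frames should be 1 + a multiple of t_compress")
    if height % (s_compress * PATCH_SPATIAL) or width % (s_compress * PATCH_SPATIAL):
        raise ValueError("height and width should be divisible by s_compress * PATCH_SPATIAL")
    latent_t = (num_frames - 1) // t_compress + 1  # first frame is encoded on its own
    latent_h = height // s_compress
    latent_w = width // s_compress
    return (latent_t // PATCH_TEMPORAL) * (latent_h // PATCH_SPATIAL) * (latent_w // PATCH_SPATIAL)


head_dim = MODEL_CHANNELS // NUM_HEADS  # 2048 / 16 = 128 channels per head

if __name__ == "__main__":
    tokens = dit_token_count(num_frames=13, height=480, width=640)  # hypothetical clip
    print(head_dim, tokens)  # 128 4800
```

Token count scales linearly with each latent dimension, so doubling both height and width quadruples the sequence length the transformer has to attend over.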
## Citation
```bibtex
@article{dreamdojo2025,
title={DreamDojo: Advancing Real-World Robot Policies Through Generated Interactive Environments},
author={NVIDIA},
year={2025}
}
```
## License
Please refer to the [NVIDIA DreamDojo](https://github.com/NVIDIA/DreamDojo) repository for license terms.