	---
	license: other
	license_name: nvidia-open-model-license
	license_link: https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/
	tags:
	- robotics
	- video-generation
	- diffusion
	- action-conditioned
	- dreamdojo
	- cosmos-predict2.5
	library_name: diffusers
	pipeline_tag: video-to-video
	---

	# DreamDojo-AgiBot-2B-Diffusers

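	DreamDojo-AgiBot-2B-Diffusers is a 2B-parameter, action-conditioned video generation model. It was post-trained (fine-tuned) on AgiBot robot data from the Cosmos Predict 2.5 base and is packaged in Diffusers format. It is part of the [DreamDojo](https://github.com/NVIDIA/DreamDojo) model family.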

	| Property | Value |
	|---|---|
	| Size | 2B |
	| Stage | Post-training |
	| Architecture | DiT (Diffusion Transformer) with AdaLN-LoRA |
	| Base | Cosmos Predict 2.5 |

	## Checkpoint Structure

	```
	DreamDojo-AgiBot-2B-Diffusers/
	├── transformer/        # DiT backbone (sharded safetensors)
	├── crossattn_adapter/  # Text-to-DiT projection (100352 → 1024)
	├── vae/                # AutoencoderKLWan (standard diffusers)
	├── lam/                # Latent Action Model (710M params)
	├── text_encoder/       # Cosmos-Reason1-7B
	├── scheduler/          # FlowMatchEulerDiscreteScheduler
	├── action_processor/   # DreamDojo-specific config
	└── config.json
	```
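
The layout above can be checked on a local copy before any weights are loaded. The following is a minimal sketch, not part of the DreamDojo codebase: the helper names `missing_components` and `load_standard_components` are hypothetical, and the loader assumes that `vae/` and `scheduler/` hold ordinary diffusers configs, as the tree comments suggest. The DreamDojo-specific parts (transformer, LAM, adapter, action processor) are deliberately left out.

```python
from pathlib import Path

# Component subfolders listed in the checkpoint tree above.
EXPECTED_SUBFOLDERS = (
    "transformer",
    "crossattn_adapter",
    "vae",
    "lam",
    "text_encoder",
    "scheduler",
    "action_processor",
)


def missing_components(checkpoint_dir):
    """Return the expected entries that are absent from a local checkpoint copy."""
    root = Path(checkpoint_dir)
    missing = [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]
    if not (root / "config.json").is_file():
        missing.append("config.json")
    return missing


def load_standard_components(checkpoint_dir):
    """Load the two components the tree describes as stock diffusers classes.

    Assumption: vae/ and scheduler/ carry ordinary diffusers configs.
    The DreamDojo-specific components are not handled here.
    """
    # Imported lazily so the layout check works without diffusers installed.
    from diffusers import AutoencoderKLWan, FlowMatchEulerDiscreteScheduler

    vae = AutoencoderKLWan.from_pretrained(checkpoint_dir, subfolder="vae")
    scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(
        checkpoint_dir, subfolder="scheduler"
    )
    return vae, scheduler
```

A typical use would be to download the repository first (for example with `huggingface_hub.snapshot_download`), call `missing_components` on the resulting directory, and only then attempt `load_standard_components`. Running the full action-conditioned pipeline likely requires the code in the DreamDojo repository.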

	## Architecture

	| Parameter | 2B |
	|---|---|
	| Model channels | 2048 |
	| Transformer blocks | 28 |
	| Attention heads | 16 |
	| Patch size (spatial / temporal) | 2 / 1 |
	| Action dim | 384 (unified) |
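
Two quantities follow from the table by simple arithmetic: the per-head width and the number of tokens the transformer sees for a given latent video. The sketch below is illustrative only. The class name `DiTConfig2B` is hypothetical, the even split of channels across heads is an assumption (the usual multi-head convention), and the latent grid in the example is a made-up size, not a documented resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiTConfig2B:
    # Values copied from the architecture table.
    model_channels: int = 2048
    num_blocks: int = 28
    num_heads: int = 16
    patch_spatial: int = 2
    patch_temporal: int = 1
    action_dim: int = 384

    @property
    def head_dim(self):
        # Assumes channels are split evenly across attention heads.
        return self.model_channels // self.num_heads

    def num_tokens(self, latent_frames, latent_height, latent_width):
        """Token count after patchifying a latent video of the given size."""
        if (
            latent_frames % self.patch_temporal
            or latent_height % self.patch_spatial
            or latent_width % self.patch_spatial
        ):
            raise ValueError("latent size must be divisible by the patch size")
        return (
            (latent_frames // self.patch_temporal)
            * (latent_height // self.patch_spatial)
            * (latent_width // self.patch_spatial)
        )


cfg = DiTConfig2B()
print(cfg.head_dim)                # 2048 / 16 = 128
print(cfg.num_tokens(4, 60, 104))  # 4 * 30 * 52 = 6240 (hypothetical latent grid)
```

With a temporal patch size of 1, every latent frame keeps its own set of tokens, so sequence length grows linearly with the number of latent frames.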

	## Citation

	```bibtex
	@article{dreamdojo2025,
	  title={DreamDojo: Advancing Real-World Robot Policies Through Generated Interactive Environments},
	  author={NVIDIA},
	  year={2025}
	}
	```

	## License

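	This model is released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/), as declared in the model card metadata. Please refer to the [NVIDIA DreamDojo](https://github.com/NVIDIA/DreamDojo) repository for the full license terms.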