---
license: apache-2.0
pipeline_tag: text-to-video
---
# HyDRA: Out of Sight but Not Out of Mind

This repository contains the weights for HyDRA (Hybrid Memory for Dynamic Video World Models), as presented in the paper *Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models*.
HyDRA is a novel memory architecture that enables video world models to simultaneously act as precise archivists for static backgrounds and vigilant trackers for dynamic subjects. This ensures visual and motion continuity even when subjects temporarily move out of the camera's field of view.
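Conceptually, the hybrid design pairs a persistent store for static background content with a bounded, continuously updated buffer for moving subjects. The sketch below is purely illustrative of that idea — the class and method names are invented for exposition and are not part of the HyDRA codebase:

```python
from collections import deque


class HybridMemorySketch:
    """Toy two-branch memory: a persistent archive for static background
    entries plus a bounded queue of recent dynamic-subject states."""

    def __init__(self, dynamic_capacity=8):
        self.static_memory = {}  # scene region -> features (kept indefinitely)
        self.dynamic_memory = deque(maxlen=dynamic_capacity)  # (subject, features)

    def write(self, key, features, is_static):
        if is_static:
            # Background is archived once and never evicted.
            self.static_memory[key] = features
        else:
            # Subjects keep being tracked even off-screen, but old states expire.
            self.dynamic_memory.append((key, features))

    def read(self, key):
        # Prefer the most recent dynamic state; fall back to the static archive.
        for k, feats in reversed(self.dynamic_memory):
            if k == key:
                return feats
        return self.static_memory.get(key)


mem = HybridMemorySketch()
mem.write("room_wall", [0.1, 0.2], is_static=True)
mem.write("person_1", [0.9, 0.8], is_static=False)
print(mem.read("person_1"))
```

The split mirrors the paper's framing: background reads stay stable over long horizons, while subject reads always reflect the latest tracked state.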
## Installation
To set up the environment, clone the repository and install the dependencies:
```bash
git clone https://github.com/H-EmbodVis/HyDRA.git
cd HyDRA
conda create -n hydra python=3.10 -y
conda activate hydra
pip install -r requirements.txt
```
## Inference
HyDRA is built upon the Wan2.1 (1.3B) T2V model. Ensure you have downloaded the required weights into the `./ckpts` directory as described in the official repository.
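A quick sanity check before running inference is to confirm the base-model weights are where the scripts expect them. The helper below is a small convenience sketch (not part of the repository); the checked path matches the one used in the training command later in this README:

```python
from pathlib import Path


def check_weights(root="./ckpts"):
    """Return the list of required weight files that are missing."""
    required = [
        Path(root) / "Wan2.1-T2V-1.3B" / "diffusion_pytorch_model.safetensors",
    ]
    return [p for p in required if not p.is_file()]


missing = check_weights()
if missing:
    print("Missing weights:", ", ".join(str(p) for p in missing))
```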
You can run inference on example data using the following command:
```bash
python infer_hydra.py
```
## Training
The model can be trained on custom datasets using the provided training script:
```bash
python train_hydra.py \
    --dit_path ./ckpts/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors \
    --use_gradient_checkpointing \
    --hydra
```
## Citation
If you find this work useful, please consider citing:
```bibtex
@article{chen2026out,
  title   = {Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models},
  author  = {Chen, Kaijin and Liang, Dingkang and Zhou, Xin and Ding, Yikang and Liu, Xiaoqiang and Wan, Pengfei and Bai, Xiang},
  journal = {arXiv preprint arXiv:2603.25716},
  year    = {2026}
}
```