Add model card for HyDRA
#1
by nielsr HF Staff - opened
README.md
ADDED
@@ -0,0 +1,58 @@
---
license: apache-2.0
pipeline_tag: text-to-video
---

# HyDRA: Out of Sight but Not Out of Mind

This repository contains the weights for **HyDRA** (Hybrid Memory for Dynamic Video World Models), as presented in the paper [Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models](https://arxiv.org/abs/2603.25716).

HyDRA is a novel memory architecture that enables video world models to act simultaneously as precise archivists for static backgrounds and vigilant trackers for dynamic subjects. This ensures visual and motion continuity even when subjects temporarily move out of the camera's field of view.
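
As an illustrative toy only (not the actual HyDRA implementation, whose details are in the paper and repository), the hybrid idea can be sketched as two cooperating stores: a persistent archive for static scene content, and a rolling buffer that retains recent states of dynamic subjects so they can be recalled consistently after re-entering the frame. All names below are hypothetical.

```python
from collections import deque


class HybridMemory:
    """Toy sketch of a hybrid memory: static archive + dynamic subject tracker."""

    def __init__(self, tracker_capacity=8):
        self.static_archive = {}                       # region_id -> appearance features
        self.tracker = deque(maxlen=tracker_capacity)  # rolling (subject_id, state) pairs

    def archive_background(self, region_id, features):
        # Static content is written once and kept indefinitely.
        self.static_archive.setdefault(region_id, features)

    def track_subject(self, subject_id, state):
        # Dynamic subjects get a bounded window of recent states.
        self.tracker.append((subject_id, state))

    def recall(self, subject_id):
        # Return the most recent tracked state, even if the subject left the frame.
        for sid, state in reversed(self.tracker):
            if sid == subject_id:
                return state
        return None


mem = HybridMemory()
mem.archive_background("room", [0.1, 0.2])
mem.track_subject("cat", {"pos": (3, 4)})
mem.track_subject("cat", {"pos": (5, 6)})
print(mem.recall("cat"))  # → {'pos': (5, 6)}
```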

[**Project Page**](https://kj-chen666.github.io/Hybrid-Memory-in-Video-World-Models/) | [**GitHub**](https://github.com/H-EmbodVis/HyDRA)

## Installation

To set up the environment, clone the repository and install the dependencies:

```bash
git clone https://github.com/H-EmbodVis/HyDRA.git
cd HyDRA
conda create -n hydra python=3.10 -y
conda activate hydra
pip install -r requirements.txt
```

## Inference

HyDRA is built upon the **Wan2.1 (1.3B) T2V** model. Ensure you have downloaded the required weights into the `./ckpts` directory as described in the official repository.
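
For example, the Wan2.1 base weights can typically be fetched from the Hugging Face Hub. The repo id and target directory below are assumptions; verify them against the official HyDRA instructions before downloading.

```shell
# Assumed Hub repo id and checkpoint layout — check the official repository.
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./ckpts/Wan2.1-T2V-1.3B
```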

You can run inference on example data using the following command:

```bash
python infer_hydra.py
```

## Training

The model can be trained on custom datasets using the provided training script:

```bash
python train_hydra.py \
    --dit_path ./ckpts/Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors \
    --use_gradient_checkpointing \
    --hydra
```

## Citation

If you find this work useful, please consider citing:

```bibtex
@article{chen2026out,
  title   = {Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models},
  author  = {Chen, Kaijin and Liang, Dingkang and Zhou, Xin and Ding, Yikang and Liu, Xiaoqiang and Wan, Pengfei and Bai, Xiang},
  journal = {arXiv preprint arXiv:2603.25716},
  year    = {2026}
}
```