---
language:
- en
pipeline_tag: text-to-video
tags:
- video-generation
- world-model
- pytorch
- dit
library_name: pytorch
---

# HyDRA: Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

This is the official Hugging Face model repository for **HyDRA** (Hybrid Memory for Dynamic Video World Models).

- **GitHub Repository:** [H-EmbodVis/HyDRA](https://github.com/H-EmbodVis/HyDRA)
- **Project Page:** [Hybrid-Memory-in-Video-World-Models](https://kj-chen666.github.io/Hybrid-Memory-in-Video-World-Models/)

## Overview

Recent video world models excel at simulating static environments, but they share a critical blind spot: the physical world is dynamic. When moving subjects leave the camera's field of view and later re-emerge, current models often lose track of them.

To bridge this gap, we introduce **Hybrid Memory**, a novel paradigm that requires models to act simultaneously as precise archivists for static backgrounds and as vigilant trackers for dynamic subjects. **HyDRA** is a specialized memory architecture that compresses context into memory tokens and retrieves them with a spatiotemporal relevance-driven mechanism.
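
To make the idea of "compress, then retrieve by relevance" concrete, here is a toy sketch in plain Python. It is not HyDRA's implementation: the average-pooling compression, the `MemoryToken` fields, the scoring formula, and the `spatial_weight` and `temporal_weight` parameters are all assumptions chosen for illustration. The actual architecture is defined in the paper and the GitHub repository.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryToken:
    """Hypothetical memory token: one pooled summary per chunk of frames."""
    feature: list[float]  # pooled feature vector for the chunk
    cam_pos: list[float]  # mean camera position over the chunk
    time: float           # mean frame index over the chunk


def _mean(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def compress(features, cam_positions, chunk_size):
    """Average-pool every `chunk_size` frames into one memory token."""
    tokens = []
    for start in range(0, len(features), chunk_size):
        end = min(start + chunk_size, len(features))
        tokens.append(MemoryToken(
            feature=_mean(features[start:end]),
            cam_pos=_mean(cam_positions[start:end]),
            time=(start + end - 1) / 2,
        ))
    return tokens


def _cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(tokens, query_feature, query_cam_pos, query_time, k,
             spatial_weight=0.1, temporal_weight=0.01):
    """Return the k tokens with the highest (assumed) relevance score.

    The score rewards feature similarity and penalizes camera distance
    and temporal distance; the weights are illustrative, not tuned.
    """
    def score(tok):
        return (_cosine(tok.feature, query_feature)
                - spatial_weight * math.dist(tok.cam_pos, query_cam_pos)
                - temporal_weight * abs(tok.time - query_time))
    return sorted(tokens, key=score, reverse=True)[:k]


# Four frames with 2-D features and 2-D camera positions, pooled in pairs.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
cams = [[0.0, 0.0], [0.0, 0.0], [5.0, 0.0], [5.0, 0.0]]
tokens = compress(features, cams, chunk_size=2)

# A much later query that looks like the first chunk, from the same viewpoint.
best = retrieve(tokens, [1.0, 0.0], [0.0, 0.0], query_time=10, k=1)
```

In this toy setup the first chunk is retrieved even though it is the older one, because feature similarity and camera proximity outweigh the small temporal penalty. That mirrors, in a very simplified way, the goal of recalling a subject that has been out of view.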

## Task & Capabilities
- **Task:** Text-to-Video Generation / Video World Modeling
- **Input:** Text prompts, camera poses, and initial video latents.
- **Output:** High-fidelity video sequences that maintain both the identity and the motion continuity of dynamic subjects, even across intervals when they are out of view.

## Usage

To use these weights, please refer to our GitHub repository: [H-EmbodVis/HyDRA](https://github.com/H-EmbodVis/HyDRA)
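
Before running the code from the GitHub repository, the weights need to be on local disk. A minimal sketch of one way to fetch them, assuming the `huggingface_hub` package is installed (`pip install huggingface_hub`): the helper name `download_weights`, the default `local_dir`, and the `downloader` parameter are illustrative choices, and the repository id is left as a placeholder because this page does not state it.

```python
from pathlib import Path


def download_weights(repo_id, local_dir="checkpoints/HyDRA", downloader=None):
    """Download a full model repository snapshot and return its local path.

    `downloader` is an optional stand-in with the same keyword arguments,
    which is useful for dry runs; by default the real Hub client is used.
    """
    if downloader is None:
        # Imported lazily so the helper can be defined without the package.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    path = downloader(repo_id=repo_id, local_dir=local_dir)
    return Path(path)


# Example call (placeholder id; replace it with this model repository's id):
# weights_dir = download_weights("<org>/<model-name>")
```

The expected checkpoint layout and the inference entry points are defined by the GitHub repository, so point its scripts at the returned directory as its instructions describe.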

## Citation
If you find our work useful, please consider citing:

```bibtex
@article{chen2026out,
  title   = {Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models},
  author  = {Chen, Kaijin and Liang, Dingkang and Zhou, Xin and Ding, Yikang and Liu, Xiaoqiang and Wan, Pengfei and Bai, Xiang},
  journal = {arXiv preprint arXiv:2603.25716},
  year    = {2026}
}
```