---
base_model: Wan-AI/Wan2.2-TI2V-5B
license: apache-2.0
pipeline_tag: image-to-video
---

# PhyRAG Wan2.2 TI2V-5B

> Authors: Kexu Cheng, Zicheng Liu, Mingju Gao, Chunhe Song, Hao Tang
> Project page: https://sediment1024.github.io/PhysRAG/
> Paper: [PhysRAG: Enhancing Physics-Awareness in Video Generation via Retrieval-Augmented Generation](https://huggingface.co/papers/2606.26916)
> Code: https://github.com/sediment1024/PhysRAG
> Dataset: https://huggingface.co/datasets/sediment1024/PhysRAG

This repository contains the physical-injection checkpoint used by PhyRAG, built on top of Wan2.2 TI2V-5B. The base Wan2.2 checkpoint is not included.

## Configuration

- 49 frames at 704 x 480 (width x height)
- Physical injection at DiT blocks 0, 1, and 2
- 128 learnable queries
- Adapter dimension 16
- VideoCLIP-XL retrieval over a 170-video physical reference library
- VideoMAE-V2 features cached offline

## Checkpoint loading

`merged_model.pt` is the rank-0 sparse state dict produced by the original DeepSpeed ZeRO-3 training run. Empty partition tensors are intentionally skipped by the PhyRAG checkpoint loader. Use the loader included in the companion code repository; loading this file directly with strict `load_state_dict` is not supported.

The SHA-256 checksum is `ae60ae88911560b48b1172e3302586b07a6da1f70fcea32229cacddcb702321d`.

## Required assets

1. Wan2.2 TI2V-5B base model
2. PhyRAG 170-video RAG library and FAISS index
3. VideoCLIP-XL retriever checkpoint

## Citation

```bibtex
@misc{cheng2026physragenhancingphysicsawarenessvideo,
  title={PhysRAG: Enhancing Physics-Awareness in Video Generation via Retrieval-Augmented Generation},
  author={Kexu Cheng and Zicheng Liu and Mingju Gao and Chunhe Song and Hao Tang},
  year={2026},
  eprint={2606.26916},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.26916},
}
```
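
## Verifying the checksum

The SHA-256 checksum listed under "Checkpoint loading" can be checked before the file is handed to the loader. The following is a minimal sketch using only the Python standard library; it is not part of the companion code repository. The helper names `sha256_of_file` and `verify_checkpoint` and the local path `merged_model.pt` in the working directory are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Checksum published in this model card for merged_model.pt.
EXPECTED_SHA256 = "ae60ae88911560b48b1172e3302586b07a6da1f70fcea32229cacddcb702321d"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest matches the expected checksum."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the file was downloaded.
    checkpoint = Path("merged_model.pt")
    if checkpoint.exists():
        print("checksum OK" if verify_checkpoint(checkpoint) else "checksum MISMATCH")
    else:
        print(f"{checkpoint} not found")
```

Reading in chunks keeps memory use bounded regardless of checkpoint size. A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again before loading.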