---
base_model: Wan-AI/Wan2.2-TI2V-5B
license: apache-2.0
pipeline_tag: image-to-video
---

# PhyRAG Wan2.2 TI2V-5B

> Authors: Kexu Cheng, Zicheng Liu, Mingju Gao, Chunhe Song, Hao Tang  
> Project page: https://sediment1024.github.io/PhysRAG/  
> Paper: [PhysRAG: Enhancing Physics-Awareness in Video Generation via Retrieval-Augmented Generation](https://huggingface.co/papers/2606.26916)  
> Code: https://github.com/sediment1024/PhysRAG  
> Dataset: https://huggingface.co/datasets/sediment1024/PhysRAG  

This repository contains the physical-injection checkpoint used by PhyRAG,
built on top of Wan2.2 TI2V-5B. The base Wan2.2 checkpoint is not included.

## Configuration

- 49 frames at 704 x 480 (width x height)
- Physical injection at DiT blocks 0, 1, and 2
- 128 learnable queries
- Adapter dimension 16
- VideoCLIP-XL retrieval over a 170-video physical reference library
- VideoMAE-V2 features cached offline
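
For scripting, the settings above can be collected in one place. The sketch below is illustrative only: the class and field names are hypothetical and are not taken from the companion code, while the values are copied from the list above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PhysicalInjectionConfig:
    """Hypothetical container for the settings listed in this card."""

    num_frames: int = 49                          # frames per generated clip
    width: int = 704                              # output width in pixels
    height: int = 480                             # output height in pixels
    injection_blocks: Tuple[int, ...] = (0, 1, 2)  # DiT blocks with injection
    num_queries: int = 128                        # learnable queries
    adapter_dim: int = 16                         # adapter dimension
    library_size: int = 170                       # videos in the reference library


config = PhysicalInjectionConfig()
```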

## Checkpoint loading

`merged_model.pt` is the rank-0 sparse state dict produced by the original
DeepSpeed ZeRO-3 training run. Empty partition tensors are intentionally skipped
by the PhyRAG checkpoint loader. Use the loader included in the companion code
repository; loading this file directly with strict `load_state_dict` is not
supported.
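
To illustrate what skipping empty partition tensors means, here is a minimal sketch. It is not the loader from the companion repository, which remains the supported path. The function names are hypothetical, and the sketch assumes the file holds a flat mapping from parameter names to tensors, which this card does not confirm.

```python
def drop_empty_tensors(state_dict):
    """Return a copy of state_dict without zero-element entries.

    Under ZeRO-3, parameters owned by other ranks can appear as empty
    placeholders; those entries carry no weights and are skipped here.
    """
    return {name: t for name, t in state_dict.items() if t.numel() > 0}


def load_sparse_checkpoint(model, path):
    """Hypothetical helper: load only the non-empty tensors into model."""
    import torch  # imported lazily so the filter above works without PyTorch

    # Assumption: the checkpoint is a flat name -> tensor dictionary.
    state_dict = torch.load(path, map_location="cpu")
    kept = drop_empty_tensors(state_dict)
    # strict=False tolerates the keys that were dropped or never stored.
    result = model.load_state_dict(kept, strict=False)
    return result.missing_keys, result.unexpected_keys
```

Inspecting the returned missing and unexpected keys is a quick way to see which parameters were left at their base-model values.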

The SHA-256 checksum of `merged_model.pt` is
`ae60ae88911560b48b1172e3302586b07a6da1f70fcea32229cacddcb702321d`.
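
A downloaded file can be checked against this value with the Python standard library. The sketch below streams the file in chunks so that a large checkpoint does not have to fit in memory; the function names are illustrative.

```python
import hashlib

EXPECTED_SHA256 = "ae60ae88911560b48b1172e3302586b07a6da1f70fcea32229cacddcb702321d"


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected=EXPECTED_SHA256):
    """Return True if the file at path matches the expected checksum."""
    return sha256_of_file(path) == expected.lower()


# Example (assumes the file is in the current directory):
# verify_checkpoint("merged_model.pt")
```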

## Required assets

The following assets are required in addition to this checkpoint:

1. Wan2.2 TI2V-5B base model
2. PhyRAG 170-video RAG library and FAISS index
3. VideoCLIP-XL retriever checkpoint
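
Before running inference, it can help to confirm that all three assets are in place. The sketch below is a simple pre-flight check; the relative paths are placeholders chosen for illustration and do not reflect the layout expected by the companion code, so adjust them to your setup.

```python
from pathlib import Path

# Hypothetical locations, relative to a user-chosen asset root.
REQUIRED_ASSETS = {
    "Wan2.2 TI2V-5B base model": "Wan2.2-TI2V-5B",
    "170-video RAG library and FAISS index": "rag_library",
    "VideoCLIP-XL retriever checkpoint": "VideoCLIP-XL",
}


def find_missing_assets(root, assets=REQUIRED_ASSETS):
    """Return the labels of assets whose paths do not exist under root."""
    root = Path(root)
    return [label for label, rel in assets.items() if not (root / rel).exists()]


# Example: find_missing_assets("./assets") returns an empty list
# when every placeholder path above exists under ./assets.
```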

## Citation

```bibtex
@misc{cheng2026physragenhancingphysicsawarenessvideo,
      title={PhysRAG: Enhancing Physics-Awareness in Video Generation via Retrieval-Augmented Generation}, 
      author={Kexu Cheng and Zicheng Liu and Mingju Gao and Chunhe Song and Hao Tang},
      year={2026},
      eprint={2606.26916},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2606.26916}, 
}
```