	---
	license: apache-2.0
	pipeline_tag: image-to-video
	---

	# ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis

	[Project Page](https://keruzheng.github.io/ReImagine-Project/) \| [Paper (arXiv)](https://arxiv.org/abs/2604.19720) \| [Code](https://github.com/Taited/ReImagine) \| [Demo](https://taited-reimagine.hf.space/)

	ReImagine is a framework for controllable, high-quality human video generation. It takes an image-first approach: high-quality human appearance is learned through image generation and then used as a prior for video synthesis, which decouples appearance modeling from temporal consistency.

	The pipeline is pose- and viewpoint-controllable. It combines a pretrained image backbone with SMPL-X-based motion guidance, then applies a training-free temporal refinement stage based on a pretrained video diffusion model.

	## Getting Started

	### Installation

	```bash
	conda create -n reimagine python=3.10
	conda activate reimagine
	pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu124
	pip install -e .
	```
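
After installation, it can help to confirm that the environment picked up the pinned PyTorch build. The following is a minimal sketch, not part of the ReImagine repository: the helper names `version_matches` and `verify_torch` are hypothetical, and the check assumes `python` resolves to the interpreter of the activated `reimagine` environment.

```bash
# Succeeds when an installed version string equals the pinned version,
# optionally followed by a local build tag such as "+cu124".
version_matches() {
  case "$1" in
    "$2"|"$2"+*) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare the importable torch against the version pinned in the install step.
verify_torch() {
  local found
  if ! found="$(python -c 'import torch; print(torch.__version__)' 2>/dev/null)"; then
    echo "torch is not importable in this environment"
    return 1
  fi
  if version_matches "$found" "2.4.1"; then
    echo "torch $found matches the pinned version"
  else
    echo "torch $found differs from the pinned 2.4.1"
    return 1
  fi
}
```

Run `verify_torch` once after `conda activate reimagine`. A non-zero exit status suggests the install step should be repeated inside the environment.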

	### Pretrained Weights

	ReImagine builds on pretrained base models and adds its own LoRA weights. Download all of them with the Hugging Face CLI:

	```bash
	# Download the FLUX.1 Kontext dev base model (skipping the single-file checkpoint and the vae/ folder)
	hf download black-forest-labs/FLUX.1-Kontext-dev \
	--local-dir ./models/FLUX.1-Kontext-dev \
	--exclude "flux1-kontext-dev.safetensors" \
	--exclude "vae/**"

	# Download ControlNet
	hf download jasperai/Flux.1-dev-Controlnet-Surface-Normals \
	--local-dir ./models/Flux.1-dev-Controlnet-Surface-Normals

	# Download ReImagine LoRA Weights
	hf download taited/ReImagine-Pretrained --local-dir ./models/ReImagine-Pretrained
	```
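
Once the downloads finish, a quick check that each target directory exists and is non-empty can catch an interrupted transfer early. The sketch below is illustrative only: `check_models` is a hypothetical helper, and the directory names simply mirror the `--local-dir` values used in the commands above. It does not validate individual files.

```bash
# Report which of the expected model directories are present and non-empty.
# Returns non-zero if any directory is missing or empty.
check_models() {
  local root="${1:-./models}"
  local missing=0
  local name
  for name in FLUX.1-Kontext-dev \
              Flux.1-dev-Controlnet-Surface-Normals \
              ReImagine-Pretrained; do
    if [ -d "$root/$name" ] && [ -n "$(ls -A "$root/$name" 2>/dev/null)" ]; then
      echo "ok       $root/$name"
    else
      echo "missing  $root/$name"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_models ./models` from the repository root prints one line per directory. Whether the inference script reads from these exact paths is an assumption based on the download commands.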

	## Inference

	To perform image-first synthesis, use the provided inference script:

	```bash
	python inference_img.py
	```

	The script requires two inputs: a wide reference image (front and back views) and a normal map generated from SMPL-X. For video synthesis, the temporal refinement stage is then applied to keep the frames consistent.

	## Citation

	If you find this project useful, please consider citing the paper:

	```bibtex
	@article{sun2025rethinking,
	title={ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis},
	author={Sun, Zhengwentai and Zheng, Keru and Li, Chenghong and Liao, Hongjie and Yang, Xihe and Li, Heyuan and Zhi, Yihao and Ning, Shuliang and Cui, Shuguang and Han, Xiaoguang},
	journal={arXiv preprint arXiv:2604.19720},
	year={2026},
	url={https://arxiv.org/abs/2604.19720v1}
	}
	```