
How to configure depth passes when bypassing DepthCrafter?

#1
by Voltbrah - opened

First of all thank you for all the open source contributions you guys keep making.

I'm pretty new to this and was wondering if anyone can give me some pointers on how to use the IC-LoRA-Union-Control with pre-rendered depth passes. I'm using ReShade to capture DisplayDepth directly, which I imagine should produce more accurate (and faster) results than having DepthCrafter estimate depth from a regular screen capture.

What I'm a bit lost on is how I should configure my depth passes to get the depth_maps input that the LoRA expects and how I should hook this up in the workflow.

Any help is appreciated, thanks!

Hey! Thanks for the kind words, really means a lot to us.

Yes, using pre-rendered depth passes like ReShade DisplayDepth is totally valid.

I made a simplified workflow for you (attached). Here's what changed and what you need to know:

What I did: Removed the preprocessing section entirely (DepthCrafter, Canny, Pose nodes) and connected your depth video directly to the node that injects the reference into the model. That's it.

One important constraint: your depth video's width and height must both be divisible by 64 (Reference Downscale Factor * 32).
Here's why: the generated video will be the same resolution as your reference video, but when the reference is fed into the model internally, it's used at half that size. Since the model requires all video dimensions to be multiples of 32, the reference video needs to be divisible by 64, so that when it's halved it's still divisible by 32.

Don't worry about calculating this manually though; the workflow handles it automatically. You can load any video and the resize nodes will scale it to the correct resolution for you.
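If you'd rather pre-check or pre-resize your ReShade captures yourself, the rounding is easy to sketch. This is just an illustrative helper, not the actual logic of the resize nodes (which may round or scale differently); the function names are made up for this example:

```python
def snap_down(value: int, multiple: int = 64) -> int:
    """Round a dimension down to the nearest multiple (at least one multiple)."""
    return max(multiple, (value // multiple) * multiple)

def target_resolution(width: int, height: int, multiple: int = 64) -> tuple[int, int]:
    """Return (width, height) with both sides divisible by `multiple`.

    With multiple=64, the halved reference (used internally by the model)
    stays divisible by 32, satisfying the constraint described above.
    """
    return snap_down(width, multiple), snap_down(height, multiple)

# Example: a 1920x1080 depth capture
print(target_resolution(1920, 1080))  # -> (1920, 1024)
```

So a standard 1920x1080 capture would need its height brought to 1024 (or padded up to 1088) before it satisfies the constraint; the attached workflow does an equivalent adjustment for you.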

Hope that helps; let me know if you run into anything!
LTX-2.3_ICLoRA_Union_Control_Distilled_without_preprocessing
