LTX-2.3 22B IC-LoRA Water Simulation
This is a Water Simulation IC-LoRA trained on top of LTX-2.3-22B. It adds water to a clip (rivers, surf, rain, waterfalls, floods, splashes, spray, and wet-surface specularities) while keeping the subject's identity, clothing, pose, camera framing, and background geometry identical to the reference.
It is based on the LTX-2.3 foundation model.
Model Files
`ltx-2.3-22b-ic-lora-water-simulation-0.9.safetensors`: the released checkpoint (training step 3000). Trained on the -dev base; recommended inference on the -distilled base.
Model Details
- Base Model: LTX-2.3-22B Video
- Training Type: IC-LoRA (video-to-video, reference-conditioned)
- Control Type: Reference video conditioning. A "dry" reference clip drives a re-render of the same shot with water added.
- Reference Downscale Factor: 1 (the reference is processed at the same resolution as the output).
- Pipeline details: No special pre/post-processing. The dry reference video is VAE-encoded as the control signal and the model predicts the result with water added.
Intended Use & Out-of-Scope
Intended use: Adding believable, naturally moving water to a live-action shot from a single prompt. The water interacts with the moving elements in the scene (flowing around legs, surging over surfaces, splashing, foaming, reflecting, and wetting what it touches) while the camera move, subjects, and framing stay exactly as shot.
Out of scope: The model was trained on real-water footage, so it generalizes to other liquids (lava, slime, paint, etc.) only loosely, and results vary. Very high strengths (≥ 1.5) maximize water drama but can warp faces and fine detail.
Control Signal Requirements
- Control signal type: A "dry" reference video (the "before" clip, without water).
- Expected input: A single reference video at 24 fps.
- Preprocessing: None required; the dry reference is VAE-encoded directly. The reference is used at 1× the output resolution (downscale factor 1).
- Alignment: The reference must be 24 fps with at least `F` frames; the model conditions on the first `F` frames with no resampling.
- Mask support: Not supported; the effect is applied to the whole frame.
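The alignment rule above can be checked before a clip is handed to the pipeline. The following is a minimal sketch, assuming the reference has already been decoded into an in-memory sequence of frames; `select_reference_frames` and `REQUIRED_FPS` are hypothetical names used for illustration, not part of any LTX-2 API.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")

REQUIRED_FPS = 24  # the card requires a 24 fps reference clip


def select_reference_frames(
    frames: Sequence[Frame], fps: float, num_frames: int
) -> List[Frame]:
    """Return the first `num_frames` frames of the dry reference, unresampled.

    Hypothetical helper: it only mirrors the rule stated in the card
    (24 fps, at least F frames, first F frames used as-is).
    """
    if abs(fps - REQUIRED_FPS) > 1e-3:
        raise ValueError(f"reference must be {REQUIRED_FPS} fps, got {fps}")
    if len(frames) < num_frames:
        raise ValueError(
            f"reference has {len(frames)} frames, need at least {num_frames}"
        )
    # No temporal resampling: take a plain prefix of the clip.
    return list(frames[:num_frames])
```

A clip that is longer than `F` frames is simply truncated; a shorter clip is rejected rather than padded or looped, since the card states that no resampling takes place.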
How It Works
The model is conditioned on both the dry reference video latents and a text prompt written in the dual-panel "Reference / Edited" structure it was trained on. Include the literal trigger ADD WATER in every prompt:
Reference shows <the dry scene>. Edited shows the same scene with water added.
ADD WATER <vivid description of the water: type, motion, how it interacts with the subject>.
Subject identity, clothing, framing, and background geometry are identical to the
reference; only water-related elements differ between reference and edited.
Be concrete about the water (e.g. "a clear shallow stream braiding around their legs with white foam crests", "a colossal tsunami wall of churning brown water"). The amount and drama of water scales with both the wording and the LoRA strength.
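The prompt structure can be assembled programmatically so the trigger is never forgotten. This is a minimal sketch: `build_water_prompt` is a hypothetical helper, and joining the template lines with single spaces (rather than newlines) is an assumption about what the text encoder expects.

```python
TRIGGER = "ADD WATER"  # literal trigger required in every prompt


def build_water_prompt(dry_scene: str, water_description: str) -> str:
    """Fill the dual-panel "Reference / Edited" template from the card.

    Hypothetical helper; the wording of the fixed sentences is copied
    from the template, only the two placeholders are substituted.
    """
    dry_scene = dry_scene.strip().rstrip(".")
    water_description = water_description.strip().rstrip(".")
    if not dry_scene or not water_description:
        raise ValueError("both the dry scene and the water description are required")
    return (
        f"Reference shows {dry_scene}. "
        "Edited shows the same scene with water added. "
        f"{TRIGGER} {water_description}. "
        "Subject identity, clothing, framing, and background geometry are "
        "identical to the reference; only water-related elements differ "
        "between reference and edited."
    )


if __name__ == "__main__":
    print(
        build_water_prompt(
            "two hikers walking along a dry gravel path",
            "a clear shallow stream braiding around their legs with white foam crests",
        )
    )
```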
Usage
ComfyUI
- Copy the LoRA weights into `models/loras`.
- Load the LTX-2.3-22B base model and add `ltx-2.3-22b-ic-lora-water-simulation-0.9.safetensors` as the LoRA.
- Start at strength `1.0` and adjust to taste.
- Use an IC-LoRA (video-to-video) workflow from the LTX-2 ComfyUI repository, which already wires the reference-video control nodes. Connect your dry clip as the reference video and include the `ADD WATER` trigger in the prompt.
Recommended Settings
These are the settings that produced the best results. LoRA strength ≈ 1.2 is the sweet spot.
| Setting | Recommended value |
|---|---|
| LoRA strength | 1.2 (best all-round) |
| Base checkpoint | ltx-2.3-22b-distilled |
| Sampler | Distilled stage-1 at native resolution (identity-safe), 8 fixed sigmas |
| CFG / guidance | 1.0 (distilled path; no negative prompt, no STG) |
| Resolution | 1920×1088 (landscape) / 1088×1920 (portrait), 24 fps |
| Frames `F` | must satisfy `(F-1) % 8 == 0`, e.g. 121, 153, 169, 185 (~7.7 s max) |
| Reference clip | the dry "before" clip, 24 fps, ≥ F frames (conditions on the first F frames, no resampling) |
| Seed | any (1212 used for the published samples) |
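The frame-count constraint in the table is easy to get wrong when trimming clips, so a small check helps. The sketch below uses hypothetical helper names; treating 185 frames as the upper bound is an assumption read off the "~7.7 s max" note, not a documented limit.

```python
FPS = 24


def is_valid_frame_count(f: int) -> bool:
    """True when F satisfies (F - 1) % 8 == 0, as the settings table requires."""
    return f >= 1 and (f - 1) % 8 == 0


def nearest_valid_frame_count(f: int, max_frames: int = 185) -> int:
    """Round F down to the closest valid count, capped at `max_frames`.

    `max_frames=185` is an assumed cap taken from the table's example list.
    """
    if f < 1:
        raise ValueError("frame count must be at least 1")
    f = min(f, max_frames)
    return f - (f - 1) % 8


def duration_seconds(f: int) -> float:
    """Clip length in seconds at 24 fps."""
    return f / FPS


if __name__ == "__main__":
    for f in (121, 153, 169, 185):
        print(f, is_valid_frame_count(f), round(duration_seconds(f), 2))
```

For instance, a 130-frame source clip would be trimmed to 129 frames, and anything longer than 185 frames would be cut to 185 (about 7.7 s).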
Strength guide
- `0.9`: too weak, water under-renders.
- `1.0`–`1.05`: recommended; natural, identity-safe water.
- `1.1`-`1.5`: sweet spot when you need hard surface replacement (ground → sea).
- `1.25` / `1.45`: maximum drama (tsunamis, full floods); higher risk of identity drift.

Important: use stage-1-only at native resolution. The two-stage distilled pipeline applies the reference conditioning only in stage 1; the stage-2 upscaler then refines from the prompt alone and drifts the subject's identity (or, if you force the dry reference into stage 2, erases the added water). Generating stage-1 at the native output resolution keeps both identity and the water effect, and matches the training regime. Lowering strength does not fix stage-2 drift, because the cause is structural.
Alternative: trainer / validation regime (dev base). If you render on the -dev base with the trainer sampler instead of the distilled path, the validation regime was: guidance_scale 2.5, num_inference_steps 20, stg_mode stg_v, stg_blocks [29], stg_scale 1.0, at 960×544, with the negative prompt:
worst quality, inconsistent motion, blurry, jittery, distorted, deformed water,
unnatural fluid, frozen water, morphing face, shifting features, inconsistent clothing,
warped features
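For scripted runs, the dev-base validation regime can be kept in one place. This is only a sketch of a settings container: the first five keys reuse the parameter names quoted above, while `width`, `height`, and `negative_prompt` are assumed key names, and how the dictionary is passed to a sampler depends on the pipeline you use.

```python
# Validation regime from the card (dev base, trainer sampler).
# "width", "height" and "negative_prompt" are assumed key names.
DEV_VALIDATION_SETTINGS = {
    "guidance_scale": 2.5,
    "num_inference_steps": 20,
    "stg_mode": "stg_v",
    "stg_blocks": [29],
    "stg_scale": 1.0,
    "width": 960,
    "height": 544,
    "negative_prompt": (
        "worst quality, inconsistent motion, blurry, jittery, distorted, "
        "deformed water, unnatural fluid, frozen water, morphing face, "
        "shifting features, inconsistent clothing, warped features"
    ),
}

if __name__ == "__main__":
    for key, value in DEV_VALIDATION_SETTINGS.items():
        print(f"{key}: {value}")
```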
References
- Code: GitHub Repository
- ComfyUI: ComfyUI-LTXVideo
- IC-LoRA docs: IC-LoRA usage guide
Tips & Troubleshooting
- Identity drift at high res: keep the stage-1-only native recipe (do not use the two-stage path for identity-critical clips) so the reference stays attached for the entire denoise.
- Faces or fine detail warping: lower the LoRA strength; very high strengths (≥ 1.5) maximize water drama but can warp faces and fine detail.
- Other liquids: trained on real water, it generalizes to other liquids (lava, slime, paint, etc.) only loosely; results vary.
- Reference too short: the reference must be 24 fps with at least `F` frames; the model conditions on the first `F` frames with no resampling.
Dataset
~572 paired examples (≈511 water-add edits + 61 identity / no-op pairs) covering a wide range of real water events (rain, rivers, surf, waterfalls, floods, fountains, splashes, and spray) across both landscape and portrait shots.
Each pair is (dry reference → wet target):
- target = the original clip with water (what the model should produce),
- reference = the same clip with all water digitally removed (the conditioning the model sees), frame-matched to the target for exact parity.
Clips are 1920×1080 / 1080×1920, 24 fps, 96/120 frames, captioned in the dual-panel "Reference shows… Edited shows… ADD WATER…" format. The 61 identity pairs (reference == target, "no water added") teach the model to leave a scene untouched and to preserve subject identity.
Training
- Technique: IC-LoRA (reference-conditioned; rank 128, alpha 128, dropout 0.05) on the DiT transformer; target modules `attn1.to_q`, `attn1.to_k`, `attn1.to_v`, `attn1.to_out.0`, `ff.net.0.proj`, `ff.net.2`.
- Hyperparameters: bf16 mixed precision, AdamW, learning rate 1.5e-4, cosine schedule, max grad norm 1.0, gradient checkpointing on, no quantization. Strategy `flexible`: reference conditioning `p=1.0`, first-frame conditioning `p=0.15`. Batch size 1/GPU × 8 GPUs (global 8). Trainable params ~453M.
- Resolution / data: 960×544, 97–121 frames @ 24 fps; ~572 (dry → wet) pairs.
- Steps: 3,000 (recommended checkpoint: step 3000).
- Infrastructure: LTX-2 Community Trainer, 8× NVIDIA H100, ~4.5 h.
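A few quantities follow directly from the numbers listed above. The sketch below works them out under two assumptions that the card does not state: that the trainer scales the LoRA update by the conventional `alpha / rank` factor, and that no gradient accumulation was used, so one optimizer step consumes exactly one global batch.

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Conventional LoRA scaling factor alpha / rank (assumed convention)."""
    return alpha / rank


def samples_seen(per_gpu_batch: int, num_gpus: int, steps: int) -> int:
    """Total training samples, assuming no gradient accumulation."""
    return per_gpu_batch * num_gpus * steps


def approx_epochs(total_samples: int, dataset_size: int) -> float:
    """Approximate number of passes over the paired dataset."""
    return total_samples / dataset_size


if __name__ == "__main__":
    scale = lora_scale(alpha=128, rank=128)
    total = samples_seen(per_gpu_batch=1, num_gpus=8, steps=3000)
    epochs = approx_epochs(total, dataset_size=572)
    print(f"LoRA scale: {scale}")
    print(f"samples seen: {total}")
    print(f"approx. epochs over ~572 pairs: {epochs:.1f}")
```

Under these assumptions the run processes 24,000 samples, or roughly 42 passes over the ~572 pairs, with an effective LoRA scale of 1.0 before any inference-time strength is applied.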
License
See the LTX-2-community-license for full terms.
Acknowledgments
- Base model by Lightricks
- Training infrastructure: LTX-2 Community Trainer