---
license: apache-2.0
tags:
- comfyui
- ltx-video
- ltx-2.3
- foley
- video-to-audio
- audio-generation
- workflow
- foley-lora
---

# LTX 2.3 Foley V2A ComfyUI Workflow

This repository contains ready-to-test ComfyUI workflows for the [`FuzzPuppy/LTX-2.3-Foley-LoRA`](https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-LoRA) LoRA.

The LoRA adds Foley sound effects to a silent input video using LTX-2.3: given a video and a prompt describing the visible action, the loop workflow generates matching non-speech, non-music sound effects and saves a new MP4.

Two workflows are provided:

1. `foley-sliding-window.json`: long-video workflow with overlapping audio windows and stitching.
2. `ltx_23_foley_v2a.json`: original short-clip workflow.

If you want to run a quick short test, use `ltx_23_foley_v2a.json`. Otherwise, use `foley-sliding-window.json` so you can generate longer audio while keeping memory under control.

## Tutorial

[![Watch the tutorial: using the LTX-2.3 Foley LoRA in ComfyUI](https://img.youtube.com/vi/qnHFDlrySmw/hqdefault.jpg)](https://youtu.be/qnHFDlrySmw)

[Watch the tutorial on YouTube](https://youtu.be/qnHFDlrySmw)

## What Is Included

- `foley-sliding-window.json`: long-video workflow with overlapping audio windows and stitching.
- `ltx_23_foley_v2a.json`: original short-clip workflow.
- `setup_runpod_ltx_foley.sh`: one-command RunPod setup script.
- `ltx_foley_v2a`: small helper-node package.
- `tennis-no-sound.mp4`: default silent test video for RunPod setup.

Both workflows require the `ltx_foley_v2a` helper-node package. If ComfyUI shows missing nodes named `LTXFoleyForLoopOpen`, `LTXFoleyWindowSelect`, `LTXFoleyVideoToAudioLatent`, or `LTXFoleyAudioVAEDecode`, the workflow JSON was loaded before these helper nodes were installed into `ComfyUI/custom_nodes`.
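
A quick way to confirm the helper package is in place before loading a workflow is to check for its two entry files. The sketch below is only an illustration: the function name `missing_helper_files` is hypothetical, and the file names are the ones listed in the Troubleshooting section of this README.

```python
from pathlib import Path

# File names taken from the "Missing Nodes" troubleshooting entry in this README.
REQUIRED_FILES = ("__init__.py", "nodes.py")


def missing_helper_files(comfyui_root):
    """Return the helper-node files that are absent under custom_nodes/ltx_foley_v2a."""
    package_dir = Path(comfyui_root) / "custom_nodes" / "ltx_foley_v2a"
    return [name for name in REQUIRED_FILES if not (package_dir / name).is_file()]


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point this at your own install directory.
    missing = missing_helper_files("ComfyUI")
    if missing:
        print("Missing helper files:", ", ".join(missing))
    else:
        print("Helper package found; restart ComfyUI and reload the workflow.")
```

An empty result only shows that the files exist. ComfyUI still has to be restarted before it registers the nodes.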
The helper-node package handles the workflow-specific pieces that stock ComfyUI does not currently cover cleanly:

- plans the window count from the uploaded video
- provides a small local ComfyUI for-loop so no external loop-node pack is needed
- splits longer videos into overlapping windows
- freezes each source window as LTX video latents while leaving matching audio latents empty for Foley generation
- decodes each audio window into the Comfy audio tensor layout expected by current video saving nodes
- writes each raw decoded window as a WAV before stitching so artifacts can be checked before the final crossfade
- crossfades and stitches generated audio windows into one final track

Prompt text, model loading, LoRA loading, video creation, and MP4 saving use normal ComfyUI/LTXVideo nodes.

## Fastest RunPod Test

Use the official RunPod **ComfyUI - CUDA 12.8** template:

https://console.runpod.io/deploy?template=cw3nka7d08&ref=k7b1cgii

1. In RunPod, under "Additional Filters" filter CUDA versions to CUDA 12.8.
2. Select a 48 GB GPU: A40, RTX A6000, L40/L40S, or A100.
3. Make sure the `ComfyUI - CUDA 12.8` template is selected.
4. The template's default volume disk is `50 GB`, which is enough for the core workflow files, but tight once caches and reruns accumulate. Change the volume disk to `100 GB` if you want more breathing room.
5. Start the pod and open a terminal.
6. Run:

```bash
cd /workspace
curl -L https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-Workflow/resolve/main/setup_runpod_ltx_foley.sh -o setup_runpod_ltx_foley.sh
bash setup_runpod_ltx_foley.sh
```

The setup script installs the nodes and models, downloads the tennis test video as `input.mp4`, restarts ComfyUI without stopping the pod (with `--cache-classic`, see the Manual ComfyUI Install notes), and waits until the UI responds on port `8188`. By default the script installs ComfyUI `v0.27.0`.
To test another ComfyUI release, set `COMFYUI_CORE_REF`:

```bash
COMFYUI_CORE_REF=v0.19.0 bash setup_runpod_ltx_foley.sh
```

To install workflow files from a different Hugging Face branch, set `WORKFLOW_REVISION`:

```bash
WORKFLOW_REVISION=windows bash setup_runpod_ltx_foley.sh
```

After the script finishes:

1. Open ComfyUI from the RunPod web UI.
2. Under workflows, select `foley-sliding-window.json`.
3. Hit `Run`.

The default input video and prompt are already set:

```text
Two men are playing tennis. No speech is present. No music is present.
```

## What The Script Installs

The script assumes the official CUDA 12.8 template layout from `runpod-workers/comfyui-base`:

- ComfyUI: `/workspace/runpod-slim/ComfyUI`
- Python environment: `/workspace/runpod-slim/ComfyUI/.venv-cu128`
- ComfyUI port: `8188`

It installs or refreshes:

- `Lightricks/ComfyUI-LTXVideo`
- `ltx_foley_v2a` helper nodes
- `foley-sliding-window.json`
- `ltx_23_foley_v2a.json`
- `tennis-no-sound.mp4`

The script also applies a small compatibility patch to the installed `ComfyUI-LTXVideo/pyramid_blending.py` file so current Kornia builds can import the node pack on fresh ComfyUI installs.

It downloads these model files:

- Base checkpoint: [`Lightricks/LTX-2.3-fp8/ltx-2.3-22b-dev-fp8.safetensors`](https://huggingface.co/Lightricks/LTX-2.3-fp8/blob/main/ltx-2.3-22b-dev-fp8.safetensors)
- Text encoder: [`Comfy-Org/ltx-2/split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors`](https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors)
- Foley LoRA: [`FuzzPuppy/LTX-2.3-Foley-LoRA/ltx-2.3-foley-400-steps.safetensors`](https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-LoRA/blob/main/ltx-2.3-foley-400-steps.safetensors)

Large model downloads are SHA-256 verified. Completed files are skipped on rerun, interrupted downloads resume from `*.part` files, and corrupt partials are retried once from scratch.
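
If you download the model files manually, you can run the same kind of check yourself. The sketch below is not the setup script's code; `sha256_of` and `needs_download` are hypothetical helper names, and the expected digest has to come from the model's Hugging Face file page, since this README does not list the checksums.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 so a large checkpoint never sits fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_sha256):
    """True when the file is absent or its digest does not match the expected one."""
    target = Path(path)
    if not target.is_file():
        return True
    return sha256_of(target) != expected_sha256.lower()
```

A file for which `needs_download` returns `False` can be skipped on a rerun, which mirrors the skip-if-verified behaviour described above.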
## Manual ComfyUI Install

If you are not using the RunPod script:

1. Install or update ComfyUI.
2. Install the official LTXVideo custom nodes: `https://github.com/Lightricks/ComfyUI-LTXVideo`
3. Install the Foley helper nodes by placing the workflow repo's `ltx_foley_v2a` folder into: `ComfyUI/custom_nodes/`
4. Copy either `foley-sliding-window.json` or `ltx_23_foley_v2a.json` into your ComfyUI user workflows folder. In a standard ComfyUI install this is: `ComfyUI/user/default/workflows`.
5. Put the model files in:
   - checkpoint: [`ltx-2.3-22b-dev-fp8.safetensors`](https://huggingface.co/Lightricks/LTX-2.3-fp8/blob/main/ltx-2.3-22b-dev-fp8.safetensors) in `ComfyUI/models/checkpoints`
   - text encoder: [`gemma_3_12B_it_fp8_scaled.safetensors`](https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors) in `ComfyUI/models/text_encoders`
   - Foley LoRA: [`ltx-2.3-foley-400-steps.safetensors`](https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-LoRA/blob/main/ltx-2.3-foley-400-steps.safetensors) in `ComfyUI/models/loras`
6. Restart ComfyUI, starting it with the `--cache-classic` flag:

   ```bash
   python main.py --cache-classic
   ```

   On newer ComfyUI versions (`v0.27.0`+) the default caching mode is RAM-pressure caching, which can evict node outputs in the middle of a run while the large LTX models load. For `foley-sliding-window.json` that forces the window plan, video decode, and model loaders to re-execute between windows, making long runs much slower. `--cache-classic` keeps those outputs cached for the whole run. The flag also exists on older releases such as `v0.19.0`, where it is harmless.
7. Under workflows, select `foley-sliding-window.json` or `ltx_23_foley_v2a.json`.
8. Hit `Run`.

## Workflow Defaults

- Input video: `input.mp4`
- Prompt: `Two men are playing tennis. No speech is present. No music is present.`
- Negative prompt: anti-music/anti-vocal prompt
- Conditioning size: `576x576`
- Frame window: `89` frames
- Window overlap: `1.0` second
- Maximum windows: `16`
- Random ID: `42`
- Sampling steps: `30`
- Guidance: `4.0`
- Save window audio: `true`
- Window audio prefix: `ltx_foley_window`
- LoRA strength: `1.0`

Advanced sampler/STG settings are visible nodes in the loop body: sampler `euler_ancestral_cfg_pp`, STG scale `1.0`, rescale `0.7`, STG blocks `14, 19`, max shift `2.05`, base shift `0.95`, terminal `0.1`.

The `foley-sliding-window.json` workflow uses the full uploaded video. Videos longer than the selected window are processed as overlapping windows and stitched into one generated audio track. Shorter videos are padded internally by repeating the last frame. The saved MP4 uses the source frames plus the stitched generated audio. Raw generated window WAVs are saved under ComfyUI's output directory in `ltx_foley_windows/` and their paths are listed in the manifest output.

## VRAM Notes

Sampling is the VRAM peak. If you need to reduce memory use, try these changes in order:

- reduce frames from `89` to `57`, `41`, or `25`
- reduce conditioning size from `576x576` to `448x448` or `384x384`
- reduce sampling steps from `30` to `20`

Frame counts should stay one more than a multiple of 8:

```text
9, 17, 25, 33, 41, 49, 57, ..., 89, ..., 257
```

For `foley-sliding-window.json`, the default `max_windows` is `16` so accidental very long inputs fail clearly instead of running for hours. Increase it only when you expect the extra runtime.
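
The frame-count rule and the window limit can be checked with a little arithmetic before you start a long run. The sketch below is a rough estimate, not the helper node's actual planner: it assumes windows advance by a fixed stride of `window_frames` minus the overlap in frames, and all three function names are hypothetical.

```python
import math


def is_valid_frame_count(frames):
    """True for 9, 17, 25, ...: one more than a positive multiple of 8."""
    return frames >= 9 and frames % 8 == 1


def snap_frame_count(frames):
    """Round down to the nearest valid frame count, never going below 9."""
    return max(9, ((frames - 1) // 8) * 8 + 1)


def estimate_window_count(total_frames, fps, window_frames=89, overlap_seconds=1.0):
    """Estimate how many overlapping windows cover a video (assumes a fixed stride)."""
    if total_frames <= window_frames:
        return 1  # short clips fit in a single (padded) window
    overlap_frames = round(overlap_seconds * fps)
    stride = window_frames - overlap_frames
    if stride <= 0:
        raise ValueError("overlap must be shorter than the window")
    # One window at frame 0, plus enough strides to reach the final frame.
    return 1 + math.ceil((total_frames - window_frames) / stride)
```

Under these assumptions, `estimate_window_count(1440, 24)` (a 60-second clip at 24 fps) returns `22`, which is above the default `max_windows` of `16`. The window plan reported by the workflow itself is the authoritative number.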
## Troubleshooting

### Missing Nodes

If ComfyUI reports missing `LTXFoley...` nodes after manual setup, verify that these files exist and then restart ComfyUI:

```text
ComfyUI/custom_nodes/ltx_foley_v2a/__init__.py
ComfyUI/custom_nodes/ltx_foley_v2a/nodes.py
```

### Models Reload Or Nodes Re-Execute Between Windows

If the log shows `planned N windows` repeating, or the checkpoint/text-encoder reloading before every window of `foley-sliding-window.json`, ComfyUI is running with its default RAM-pressure caching and is evicting node outputs mid-run. Start ComfyUI with `--cache-classic` (the RunPod script already does this). The generated audio is still correct either way; the re-execution only costs time.

### Duplicate Sounds At Window Boundaries

In `foley-sliding-window.json`, neighboring windows overlap (default `1.0` second) and each window generates its audio independently. If a distinct sound event (a door close, a footstep) falls inside an overlap region, both windows may render it slightly out of alignment, and you can hear the event twice around a window boundary. The run log's `planned N windows starts=[...]` line shows where the boundaries are (`start_frame / fps` seconds).

If you hear this, reduce the **Window overlap** (`overlap_seconds` on the window-plan node), for example from `1.0` to `0.5`. A smaller overlap makes it less likely an event lands in the shared region, at the cost of a shorter crossfade between windows. Avoid large overlaps: the bigger the overlap, the more of the video is generated twice, which increases the chance of doubled sounds.

### Audio Artifacts On Some ComfyUI Versions

The workflows have been tested on ComfyUI `v0.27.0` and run successfully there. However, on `v0.27.0` and newer ComfyUI versions generally, we have noticed that LTX-2.3 video-to-audio can produce a high-pitched squeak or other artifacts in some generated audio. If you notice these artifacts in a generation, roll back to ComfyUI `v0.19.0`.
If you are using the RunPod setup, you can roll back by rerunning the setup script with `COMFYUI_CORE_REF` set:

```bash
cd /workspace
COMFYUI_CORE_REF=v0.19.0 bash setup_runpod_ltx_foley.sh
```

Then reload `foley-sliding-window.json` and run it again.

### RunPod Setup

#### Restarting/Rerun

To rerun setup after a workflow or node update:

```bash
cd /workspace && bash setup_runpod_ltx_foley.sh
```

The script will skip verified model files, refresh the workflow/helper nodes, and restart ComfyUI.

#### Model Downloads

If model downloads fail with authorization errors, accept the relevant Hugging Face model terms and rerun with `HF_TOKEN` set.

#### Logs

Logs from the script-managed ComfyUI restart are written to:

```text
/workspace/runpod-slim/comfyui-restart.log
```

## License Scope

The files in this workflow repository are released under the Apache-2.0 license. That applies to the workflow JSON, setup script, helper-node code, README/model card text, and bundled test assets in this repository.

This workflow downloads and uses third-party model files that are governed by their own licenses and terms, including LTX-2.3, the Gemma text encoder, and the `FuzzPuppy/LTX-2.3-Foley-LoRA` weights.