| --- |
| license: apache-2.0 |
| tags: |
| - comfyui |
| - ltx-video |
| - ltx-2.3 |
| - foley |
| - video-to-audio |
| - audio-generation |
| - workflow |
| - foley-lora |
| --- |
| |
| # LTX 2.3 Foley V2A ComfyUI Workflow |
|
|
| This repository contains ready-to-test ComfyUI workflows for the |
| [`FuzzPuppy/LTX-2.3-Foley-LoRA`](https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-LoRA) |
| LoRA. The LoRA adds Foley sound effects to a silent input video using LTX-2.3: |
| given a video and a prompt describing the visible action, the loop workflow |
| generates matching non-speech, non-music sound effects and saves a new MP4. |
|
|
| There are two workflows provided: |
| 1. `foley-sliding-window.json`: long-video workflow with overlapping audio windows and stitching. |
| 2. `ltx_23_foley_v2a.json`: original short-clip workflow. |
|
|
| If you want to run a quick test, use `ltx_23_foley_v2a.json`. Otherwise, use `foley-sliding-window.json` so you can generate longer audio while keeping memory under control. |
|
|
| ## Tutorial |
|
|
| [Watch the tutorial on YouTube](https://youtu.be/qnHFDlrySmw) |
|
|
| ## What Is Included |
|
|
| - `foley-sliding-window.json`: long-video workflow with overlapping audio windows and stitching. |
| - `ltx_23_foley_v2a.json`: original short-clip workflow. |
| - `setup_runpod_ltx_foley.sh`: one-command RunPod setup script. |
| - `ltx_foley_v2a`: small helper-node package. |
| - `tennis-no-sound.mp4`: default silent test video for RunPod setup. |
|
|
| Both workflows require the `ltx_foley_v2a` helper-node package. If ComfyUI shows |
| missing nodes named `LTXFoleyForLoopOpen`, `LTXFoleyWindowSelect`, |
| `LTXFoleyVideoToAudioLatent`, or `LTXFoleyAudioVAEDecode`, the workflow JSON was |
| loaded before these helper nodes were installed into `ComfyUI/custom_nodes`. |
|
|
| The helper-node package handles the workflow-specific pieces that stock ComfyUI |
| does not currently cover cleanly: |
|
|
| - plans the window count from the uploaded video |
| - provides a small local ComfyUI for-loop so no external loop-node pack is needed |
| - splits longer videos into overlapping windows |
| - freezes each source window as LTX video latents while leaving matching audio |
| latents empty for Foley generation |
| - decodes each audio window into the Comfy audio tensor layout expected by |
| current video saving nodes |
| - writes each raw decoded window as a WAV before stitching so artifacts can be |
| checked before the final crossfade |
| - crossfades and stitches generated audio windows into one final track |
|
|
| Prompt text, model loading, LoRA loading, video creation, and MP4 saving use |
| normal ComfyUI/LTXVideo nodes. |
|
|
| ## Fastest RunPod Test |
|
|
| Use the official RunPod **ComfyUI - CUDA 12.8** template: |
|
|
| https://console.runpod.io/deploy?template=cw3nka7d08&ref=k7b1cgii |
|
|
|
|
| 1. In RunPod, under "Additional Filters" filter CUDA versions to CUDA 12.8. |
| 2. Select a GPU with at least 48 GB of VRAM: A40, RTX A6000, L40/L40S, or A100 (80 GB). |
| 3. Make sure the `ComfyUI - CUDA 12.8` template is selected. |
| 4. The template's default volume disk is `50 GB`, which is enough for the core workflow files, but tight once caches and reruns accumulate. Change the volume disk to `100 GB` if you want more breathing room. |
| 5. Start the pod and open a terminal. |
| 6. Run: |
|
|
| ```bash |
| cd /workspace |
| curl -L https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-Workflow/resolve/main/setup_runpod_ltx_foley.sh -o setup_runpod_ltx_foley.sh |
| bash setup_runpod_ltx_foley.sh |
| ``` |
|
|
| The setup script installs the nodes and models, downloads the tennis test video as `input.mp4`, restarts ComfyUI without stopping the pod (with `--cache-classic`, see the Manual ComfyUI Install notes), and waits until the UI responds on port `8188`. |
|
|
| By default the script installs ComfyUI `v0.27.0`. To test another ComfyUI release, set `COMFYUI_CORE_REF`: |
|
|
| ```bash |
| COMFYUI_CORE_REF=v0.19.0 bash setup_runpod_ltx_foley.sh |
| ``` |
|
|
| To install workflow files from a different Hugging Face branch, set |
| `WORKFLOW_REVISION`: |
|
|
| ```bash |
| WORKFLOW_REVISION=windows bash setup_runpod_ltx_foley.sh |
| ``` |
|
|
| After the script finishes: |
|
|
| 1. Open ComfyUI from the RunPod web UI. |
| 2. Under workflows, select `foley-sliding-window.json`. |
| 3. Hit `Run`. |
|
|
| The default input video and prompt are already set: |
|
|
| ```text |
| Two men are playing tennis. No speech is present. No music is present. |
| ``` |
|
|
| ## What The Script Installs |
|
|
| The script assumes the official CUDA 12.8 template layout from |
| `runpod-workers/comfyui-base`: |
|
|
| - ComfyUI: `/workspace/runpod-slim/ComfyUI` |
| - Python environment: `/workspace/runpod-slim/ComfyUI/.venv-cu128` |
| - ComfyUI port: `8188` |
|
|
| It installs or refreshes: |
|
|
| - `Lightricks/ComfyUI-LTXVideo` |
| - `ltx_foley_v2a` helper nodes |
| - `foley-sliding-window.json` |
| - `ltx_23_foley_v2a.json` |
| - `tennis-no-sound.mp4` |
|
|
| The script also applies a small compatibility patch to the installed |
| `ComfyUI-LTXVideo/pyramid_blending.py` file so current Kornia builds can import |
| the node pack on fresh ComfyUI installs. |
|
|
| It downloads these model files: |
|
|
| - Base checkpoint: |
| [`Lightricks/LTX-2.3-fp8/ltx-2.3-22b-dev-fp8.safetensors`](https://huggingface.co/Lightricks/LTX-2.3-fp8/blob/main/ltx-2.3-22b-dev-fp8.safetensors) |
| - Text encoder: |
| [`Comfy-Org/ltx-2/split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors`](https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors) |
| - Foley LoRA: |
| [`FuzzPuppy/LTX-2.3-Foley-LoRA/ltx-2.3-foley-400-steps.safetensors`](https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-LoRA/blob/main/ltx-2.3-foley-400-steps.safetensors) |
|
|
| Large model downloads are SHA-256 verified. Completed files are skipped on |
| rerun, interrupted downloads resume from `*.part` files, and corrupt partials |
| are retried once from scratch. |
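
If you download the model files by hand, you can run a similar integrity check yourself. The sketch below is not taken from the setup script; it is a minimal illustration using Python's standard `hashlib`. The function names and the expected digest you pass in are assumptions: use the SHA-256 value shown on each file's Hugging Face page.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks to keep memory flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_verified(path, expected_hex):
    """True only if the file exists and its digest matches the expected value."""
    file_path = Path(path)
    return file_path.is_file() and sha256_of(file_path) == expected_hex.strip().lower()
```

A file that fails this check is best deleted and downloaded again rather than loaded into ComfyUI.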
|
|
| ## Manual ComfyUI Install |
|
|
| If you are not using the RunPod script: |
|
|
| 1. Install or update ComfyUI. |
| 2. Install the official LTXVideo custom nodes: |
| `https://github.com/Lightricks/ComfyUI-LTXVideo` |
| 3. Install the Foley helper nodes by placing the workflow repo's |
| `ltx_foley_v2a` folder into: |
| `ComfyUI/custom_nodes/` |
| 4. Copy either `foley-sliding-window.json` or `ltx_23_foley_v2a.json` into your ComfyUI user workflows folder. In a standard ComfyUI install this is: |
| `ComfyUI/user/default/workflows`. |
| 5. Put the model files in: |
| - checkpoint: |
| [`ltx-2.3-22b-dev-fp8.safetensors`](https://huggingface.co/Lightricks/LTX-2.3-fp8/blob/main/ltx-2.3-22b-dev-fp8.safetensors) |
| in `ComfyUI/models/checkpoints` |
| - text encoder: |
| [`gemma_3_12B_it_fp8_scaled.safetensors`](https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors) |
| in `ComfyUI/models/text_encoders` |
| - Foley LoRA: |
| [`ltx-2.3-foley-400-steps.safetensors`](https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-LoRA/blob/main/ltx-2.3-foley-400-steps.safetensors) |
| in `ComfyUI/models/loras` |
| 6. Restart ComfyUI, starting it with the `--cache-classic` flag: |
| |
| ```bash |
| python main.py --cache-classic |
| ``` |
|
|
| On newer ComfyUI versions (`v0.27.0`+) the default caching mode is RAM-pressure |
| caching, which can evict node outputs in the middle of a run while the large |
| LTX models load. For `foley-sliding-window.json` that forces the window plan, |
| video decode, and model loaders to re-execute between windows, making long |
| runs much slower. `--cache-classic` keeps those outputs cached for the whole |
| run. The flag also exists on older releases such as `v0.19.0`, where it is |
| harmless. |
| 7. Under workflows, select `foley-sliding-window.json` or `ltx_23_foley_v2a.json`. |
| 8. Hit `Run`. |
|
|
| ## Workflow Defaults |
|
|
| - Input video: `input.mp4` |
| - Prompt: `Two men are playing tennis. No speech is present. No music is present.` |
| - Negative prompt: anti-music/anti-vocal prompt |
| - Conditioning size: `576x576` |
| - Frame window: `89` frames |
| - Window overlap: `1.0` second |
| - Maximum windows: `16` |
| - Random ID: `42` |
| - Sampling steps: `30` |
| - Guidance: `4.0` |
| - Save window audio: `true` |
| - Window audio prefix: `ltx_foley_window` |
| - LoRA strength: `1.0` |
|
|
| Advanced sampler/STG settings are visible nodes in the loop body: |
| sampler `euler_ancestral_cfg_pp`, STG scale `1.0`, rescale `0.7`, STG blocks |
| `14, 19`, max shift `2.05`, base shift `0.95`, terminal `0.1`. |
|
|
| The `foley-sliding-window.json` workflow uses the full uploaded video. Videos longer than the selected |
| window are processed as overlapping windows and stitched into one generated |
| audio track. Shorter videos are padded internally by repeating the last frame. |
| The saved MP4 uses the source frames plus the stitched generated audio. |
| Raw generated window WAVs are saved under ComfyUI's output directory in |
| `ltx_foley_windows/` and their paths are listed in the manifest output. |
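
To estimate how many windows a video will need before you run it, the sketch below plans window start frames from the defaults listed above. It is an assumption about how such a plan could be computed, not the helper package's actual code: the function name, the fixed stride of `window_frames - overlap_frames`, and the rounding of the overlap to whole frames are all illustrative.

```python
def plan_window_starts(total_frames, fps, window_frames=89,
                       overlap_seconds=1.0, max_windows=16):
    """Return start frames for overlapping windows that cover the whole video.

    Assumption: consecutive windows advance by a fixed stride, and the last
    window may run past the end of the video (it would then need padding).
    """
    overlap_frames = int(round(overlap_seconds * fps))
    stride = window_frames - overlap_frames
    if stride <= 0:
        raise ValueError("overlap must be shorter than the window")

    starts = [0]
    # Keep adding windows until the latest one reaches the end of the video.
    while starts[-1] + window_frames < total_frames:
        starts.append(starts[-1] + stride)
        if len(starts) > max_windows:
            raise ValueError(
                f"video needs more than {max_windows} windows; raise max_windows"
            )
    return starts
```

For example, under these assumptions a 240-frame clip at 24 fps needs four windows, and dividing each start frame by `fps` gives the boundary time in seconds.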
|
|
| ## VRAM Notes |
|
|
| Sampling is the VRAM peak. |
| If you need to reduce memory use, try these changes in order: |
|
|
| - reduce frames from `89` to `57`, `41`, or `25` |
| - reduce conditioning size from `576x576` to `448x448` or `384x384` |
| - reduce sampling steps from `30` to `20` |
|
|
| Frame counts should stay one more than a multiple of 8: |
|
|
| ```text |
| 9, 17, 25, 33, 41, 49, 57, ..., 89, ..., 257 |
| ``` |
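
If you are picking a smaller frame count, a small helper can snap any number to the nearest valid value at or below it. This is a minimal sketch of the "one more than a multiple of 8" rule; the function names and the lower bound of `9` (taken from the list above) are assumptions for illustration.

```python
def is_valid_frame_count(frames):
    """True if frames has the form 8*k + 1 with k >= 1 (9, 17, 25, ...)."""
    return frames >= 9 and (frames - 1) % 8 == 0


def snap_frame_count(requested, minimum=9):
    """Round down to the nearest valid frame count, never going below `minimum`."""
    if requested < minimum:
        return minimum
    return ((requested - 1) // 8) * 8 + 1
```

For instance, a request for 60 frames snaps down to 57, while 89 is already valid and is returned unchanged.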
|
|
| For `foley-sliding-window.json`, the default `max_windows` is `16` so accidental very long inputs |
| fail clearly instead of running for hours. Increase it only when you expect the |
| extra runtime. |
|
|
| ## Troubleshooting |
|
|
| ### Missing Nodes |
|
|
| If ComfyUI reports missing `LTXFoley...` nodes after manual setup, verify that |
| these files exist and then restart ComfyUI: |
|
|
| ```text |
| ComfyUI/custom_nodes/ltx_foley_v2a/__init__.py |
| ComfyUI/custom_nodes/ltx_foley_v2a/nodes.py |
| ``` |
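
The same check can be scripted. The sketch below only tests for the two files listed above; the function name and the `comfyui_root` argument are illustrative, and you should pass the path to your own ComfyUI folder.

```python
from pathlib import Path

# The two files named in this README; the package may contain others.
REQUIRED_FILES = ("__init__.py", "nodes.py")


def missing_helper_files(comfyui_root):
    """Return the expected helper-node files that are not present on disk."""
    package_dir = Path(comfyui_root) / "custom_nodes" / "ltx_foley_v2a"
    return [
        str(package_dir / name)
        for name in REQUIRED_FILES
        if not (package_dir / name).is_file()
    ]
```

An empty list means both files are in place, so a restart of ComfyUI should be enough for the nodes to register.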
|
|
| ### Models Reload Or Nodes Re-Execute Between Windows |
|
|
| If the log shows `planned N windows` repeating, or the checkpoint/text-encoder |
| reloading before every window of `foley-sliding-window.json`, ComfyUI is running |
| with its default RAM-pressure caching and is evicting node outputs mid-run. |
| Start ComfyUI with `--cache-classic` (the RunPod script already does this). The |
| generated audio is still correct either way; the re-execution only costs time. |
|
|
| ### Duplicate Sounds At Window Boundaries |
|
|
| In `foley-sliding-window.json`, neighboring windows overlap (default `1.0` |
| second) and each window generates its audio independently. If a distinct sound |
| event (a door close, a footstep) falls inside an overlap region, both windows |
| may render it slightly out of alignment, and you can hear the event twice |
| around a window boundary. The run log's `planned N windows starts=[...]` line |
| shows where the boundaries are (`start_frame / fps` seconds). |
|
|
| If you hear this, reduce the **Window overlap** (`overlap_seconds` on the |
| window-plan node), for example from `1.0` to `0.5`. A smaller overlap makes it |
| less likely an event lands in the shared region, at the cost of a shorter |
| crossfade between windows. Avoid large overlaps: the bigger the overlap, the |
| more of the video is generated twice, which increases the chance of doubled |
| sounds. |
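
To see why an event inside the overlap can be heard twice, it helps to look at what a crossfade does. The sketch below blends two mono sample lists with a linear fade. It is an illustration only: the helper nodes' actual fade curve and tensor layout are not described here, and the function name is hypothetical.

```python
def crossfade_stitch(first, second, overlap):
    """Join two sample lists, linearly blending the last/first `overlap` samples."""
    if overlap <= 0:
        return list(first) + list(second)
    if overlap > min(len(first), len(second)):
        raise ValueError("overlap is longer than one of the windows")

    head = list(first[:-overlap])   # part of the first window outside the overlap
    tail = list(second[overlap:])   # part of the second window outside the overlap
    blended = []
    for i in range(overlap):
        weight = (i + 1) / (overlap + 1)  # rises toward 1 across the overlap
        a = first[len(first) - overlap + i]
        b = second[i]
        blended.append(a * (1.0 - weight) + b * weight)
    return head + blended + tail
```

Inside the blended region both windows contribute to the output, so if each window placed the same footstep at a slightly different time, both copies remain audible. A shorter overlap shrinks the region where this can happen.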
|
|
| ### Audio Artifacts On Some ComfyUI Versions |
|
|
| The workflows have been tested on ComfyUI `v0.27.0` and run |
| successfully there. However, on `v0.27.0` and newer ComfyUI versions, we have noticed that LTX-2.3 video-to-audio can produce a high-pitched squeak or other artifacts in some generated audio. |
|
|
| If you notice audio artifacts in a generation, roll back to ComfyUI `v0.19.0`. |
|
|
| If you are using the RunPod setup, you can roll back by running: |
|
|
| ```bash |
| cd /workspace |
| COMFYUI_CORE_REF=v0.19.0 bash setup_runpod_ltx_foley.sh |
| ``` |
|
|
| Then reload `foley-sliding-window.json` and run it again. |
|
|
| ### RunPod Setup |
|
|
| #### Rerunning Setup |
|
|
| To rerun setup after a workflow or node update: |
|
|
| ```bash |
| cd /workspace && bash setup_runpod_ltx_foley.sh |
| ``` |
|
|
| The script will skip verified model files, refresh the workflow/helper nodes, |
| and restart ComfyUI. |
|
|
| #### Model Downloads |
|
|
| If model downloads fail with authorization errors, accept the relevant Hugging |
| Face model terms and rerun with `HF_TOKEN` set. |
|
|
| #### Logs |
|
|
| Logs from the script-managed ComfyUI restart are written to: |
|
|
| ```text |
| /workspace/runpod-slim/comfyui-restart.log |
| ``` |
|
|
| ## License Scope |
|
|
| The files in this workflow repository are released under the Apache-2.0 license. |
| That applies to the workflow JSON, setup script, helper-node code, README/model |
| card text, and bundled test assets in this repository. |
|
|
| This workflow downloads and uses third-party model files that are governed by |
| their own licenses and terms, including LTX-2.3, the Gemma text encoder, and the |
| `FuzzPuppy/LTX-2.3-Foley-LoRA` weights. |
|
|