LTX 2.3 Foley V2A ComfyUI Workflow
This repository contains a ready-to-test ComfyUI workflow for the
FuzzPuppy/LTX-2.3-Foley-LoRA
LoRA. The LoRA adds Foley sound effects to a silent input video using LTX-2.3:
given a short video and a prompt describing the visible action, the workflow
generates matching non-speech, non-music sound effects and saves a new MP4.
The workflow is intentionally small: it takes one short video clip, keeps the
first 89 frames, generates matching Foley audio, and saves an MP4 with the
source frames plus generated audio. It is meant as the fastest public ComfyUI
test for FuzzPuppy/LTX-2.3-Foley-LoRA, not as a full long-video stitching
pipeline.
What Is Included
- ltx_23_foley_v2a.json: ComfyUI workflow.
- setup_runpod_ltx_foley.sh: one-command RunPod setup script.
- ltx_foley_v2a: small helper-node package.
- tennis-no-sound.mp4: default silent test video for RunPod setup.
The helper-node package handles the workflow-specific pieces that stock ComfyUI does not currently cover cleanly:
- freezes the uploaded video as LTX video latents while leaving matching audio latents empty for Foley generation
- trims or pads the saved video frames to the same generation window
- decodes LTX audio VAE output into the Comfy audio tensor layout expected by current video saving nodes
Prompt text, model loading, LoRA loading, sampling, video creation, and MP4 saving use normal ComfyUI/LTXVideo nodes.
Fastest RunPod Test
Use the official RunPod ComfyUI - CUDA 12.8 template:
https://console.runpod.io/deploy?template=cw3nka7d08&ref=k7b1cgii
- In RunPod, under "Additional Filters", filter CUDA versions to CUDA 12.8.
- Select a GPU with at least 48 GB of VRAM: A40, RTX A6000, L40/L40S, or A100 (80 GB).
- Make sure the ComfyUI - CUDA 12.8 template is selected.
- The template's default volume disk is 50 GB, which is enough for the core workflow files but tight once caches and reruns accumulate. Change the volume disk to 100 GB if you want more breathing room.
- Start the pod and open a terminal.
- Run:
cd /workspace && curl -L https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-Workflow/resolve/main/setup_runpod_ltx_foley.sh -o setup_runpod_ltx_foley.sh && bash setup_runpod_ltx_foley.sh
The setup script installs the nodes and models, downloads the tennis test video
as input.mp4, restarts ComfyUI without stopping the pod, and waits until the UI
responds on port 8188.
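The "wait until the UI responds" step can be pictured as a simple polling loop. The following is a minimal Python sketch of that idea, not the script's actual code: the function name `wait_for_http`, the timeout, and the polling interval are all illustrative assumptions. Only the port, 8188, comes from this README.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout_s: float = 300.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until the server answers with any HTTP response.

    Hypothetical helper: returns True once a response arrives,
    False if `timeout_s` elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: not ready yet, try again.
            time.sleep(interval_s)
    return False


# Example call on the pod (assumes ComfyUI listens on localhost):
#   wait_for_http("http://127.0.0.1:8188/")
```

Treating an HTTP error status as "up" is a deliberate choice in this sketch: the goal is only to detect that the process is listening, not that a particular page renders.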
After the script finishes:
- Open ComfyUI from the RunPod web UI.
- Under workflows, select ltx_23_foley_v2a.json.
- Hit Run.
The default input video and prompt are already set:
Two men are playing tennis. No speech is present. No music is present.
What The Script Installs
The script assumes the official CUDA 12.8 template layout from
runpod-workers/comfyui-base:
- ComfyUI: /workspace/runpod-slim/ComfyUI
- Python environment: /workspace/runpod-slim/ComfyUI/.venv-cu128
- ComfyUI port: 8188
It installs or refreshes:
- Lightricks/ComfyUI-LTXVideo
- ltx_foley_v2a helper nodes
- ltx_23_foley_v2a.json
- tennis-no-sound.mp4
It downloads these model files:
- Base checkpoint: Lightricks/LTX-2.3-fp8/ltx-2.3-22b-dev-fp8.safetensors
- Text encoder: Comfy-Org/ltx-2/split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors
- Foley LoRA: FuzzPuppy/LTX-2.3-Foley-LoRA/ltx-2.3-foley-400-steps.safetensors
Large model downloads are SHA-256 verified. Completed files are skipped on
rerun, interrupted downloads resume from *.part files, and corrupt partials
are retried once from scratch.
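To illustrate the skip-if-verified behavior, here is a minimal Python sketch of a SHA-256 check on a completed file. It is an assumption-laden illustration rather than the script's own logic: `sha256_of` and `needs_download` are hypothetical names, and the expected hash would have to come from wherever the script stores its checksums.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large checkpoints never sit fully in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(final_path: Path, expected_sha256: str) -> bool:
    """Return False only when the completed file exists and its hash matches.

    Hypothetical helper: a missing file or a hash mismatch both mean
    the file should be fetched again.
    """
    if not final_path.is_file():
        return True
    return sha256_of(final_path) != expected_sha256.lower()
```

Hashing in chunks matters here because the base checkpoint is a multi-gigabyte file; reading it in one call would spike host memory for no benefit.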
Manual ComfyUI Install
If you are not using the RunPod script:
- Install or update ComfyUI.
- Install the official LTXVideo custom nodes: https://github.com/Lightricks/ComfyUI-LTXVideo
- Copy or symlink ltx_foley_v2a into: ComfyUI/custom_nodes/ltx_foley_v2a
- Put the model files in:
  - checkpoint: ltx-2.3-22b-dev-fp8.safetensors in ComfyUI/models/checkpoints
  - text encoder: gemma_3_12B_it_fp8_scaled.safetensors in ComfyUI/models/text_encoders
  - Foley LoRA: ltx-2.3-foley-400-steps.safetensors in ComfyUI/models/loras
- Restart ComfyUI.
- Under workflows, select ltx_23_foley_v2a.json.
- Hit Run.
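Before restarting ComfyUI after a manual install, it can help to confirm that everything landed in the right folder. The sketch below is a hypothetical pre-flight check, not part of this repository: the name `missing_paths` is an assumption, while the relative paths are the ones listed in the manual install steps.

```python
from pathlib import Path

# Relative locations taken from the manual install steps.
EXPECTED_PATHS = [
    "custom_nodes/ltx_foley_v2a",
    "models/checkpoints/ltx-2.3-22b-dev-fp8.safetensors",
    "models/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors",
    "models/loras/ltx-2.3-foley-400-steps.safetensors",
]


def missing_paths(comfy_root: str) -> list[str]:
    """Return the expected paths that do not exist under `comfy_root`."""
    root = Path(comfy_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


# Example call (the ComfyUI location depends on your own install):
#   print(missing_paths("/path/to/ComfyUI"))
```

An empty list means all four items were found; it does not verify file integrity or that the custom nodes import cleanly.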
Workflow Defaults
- Input video: input.mp4
- Prompt: Two men are playing tennis. No speech is present. No music is present.
- Negative prompt: anti-music/anti-vocal prompt
- Conditioning size: 576x576
- Frame window: 89 frames
- Sampling steps: 30
- Guidance: 4.0
- STG blocks: 14, 19
- LoRA strength: 1.0
The workflow uses the first 89 frames of the uploaded video. Shorter videos are
padded by repeating the last frame. The saved video is trimmed or padded to the
same frame count so the output video duration matches the generated audio.
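The trim-or-pad rule can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the helper node's actual code: `fit_to_window` is a hypothetical name, and a plain list stands in for the frame tensor the real node would handle.

```python
from typing import List, TypeVar

T = TypeVar("T")


def fit_to_window(frames: List[T], window: int = 89) -> List[T]:
    """Keep the first `window` frames, or pad by repeating the last frame."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if not frames:
        raise ValueError("need at least one frame to pad from")
    if len(frames) >= window:
        # Long clip: keep only the first `window` frames.
        return frames[:window]
    # Short clip: repeat the final frame until the window is full.
    return frames + [frames[-1]] * (window - len(frames))
```

For example, `fit_to_window([1, 2, 3], window=5)` yields `[1, 2, 3, 3, 3]`, while a 100-frame clip with the default window comes back as its first 89 frames.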
VRAM Notes
Sampling is the VRAM peak. Start with a 48 GB GPU for the first reliable test. If you need to reduce memory use, try these changes in order:
- reduce frames from 89 to 57, 41, or 25
- reduce conditioning size from 576x576 to 448x448 or 384x384
- reduce sampling steps from 30 to 20
- lower or disable STG if quality is acceptable
Frame counts should stay one more than a multiple of 8:
9, 17, 25, 33, 41, 49, 57, ..., 89, ..., 257
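If you are picking a custom frame count, the "one more than a multiple of 8" rule is easy to check or enforce. The Python sketch below uses hypothetical helper names, and its bounds of 9 and 257 are simply the endpoints of the list above, assumed here to be the supported range.

```python
def is_valid_frame_count(n: int) -> bool:
    """True when n is of the form 8k + 1 and at least 9."""
    return n >= 9 and (n - 1) % 8 == 0


def snap_frame_count(n: int, lo: int = 9, hi: int = 257) -> int:
    """Clamp n to [lo, hi], then round down to the nearest 8k + 1 value."""
    n = max(lo, min(hi, n))
    return ((n - 1) // 8) * 8 + 1
```

For instance, `snap_frame_count(60)` gives 57, and 89 is returned unchanged. Rounding down rather than up keeps memory use at or below what was requested.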
Troubleshooting RunPod Setup
If you rerun setup after a workflow or node update:
cd /workspace && bash setup_runpod_ltx_foley.sh
The script will skip verified model files, refresh the workflow/helper nodes, and restart ComfyUI.
If model downloads fail with authorization errors, accept the relevant Hugging
Face model terms and rerun with HF_TOKEN set.
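For context on what setting HF_TOKEN changes, the sketch below builds an authenticated request for a gated file. It rests on assumptions: `hf_request` is a hypothetical helper, the setup script may pass the token differently, and the `main` revision is assumed. The `resolve/<revision>/<file>` URL shape matches the download command earlier in this README, and the Hugging Face Hub accepts a bearer token in the `Authorization` header.

```python
import os
import urllib.request


def hf_request(repo_id: str, filename: str, revision: str = "main") -> urllib.request.Request:
    """Build a download request, adding a bearer token when HF_TOKEN is set."""
    url = f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"
    request = urllib.request.Request(url)
    token = os.environ.get("HF_TOKEN")
    if token:
        # Without this header, gated repositories answer with 401 or 403.
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

The token only helps once the model terms have been accepted on the same Hugging Face account; otherwise the authorization error persists.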
Logs from the script-managed ComfyUI restart are written to:
/workspace/runpod-slim/comfyui-restart.log
License Scope
The files in this workflow repository are released under the Apache-2.0 license. That applies to the workflow JSON, setup script, helper-node code, README/model card text, and bundled test assets in this repository.
This workflow downloads and uses third-party model files that are governed by
their own licenses and terms, including LTX-2.3, the Gemma text encoder, and the
FuzzPuppy/LTX-2.3-Foley-LoRA weights.
