LTX 2.3 Foley V2A ComfyUI Workflow

This repository contains a ready-to-test ComfyUI workflow for FuzzPuppy/LTX-2.3-Foley-LoRA. The LoRA adds Foley sound effects to a silent input video using LTX-2.3: given a short video and a prompt describing the visible action, the workflow generates matching non-speech, non-music sound effects and saves a new MP4.

The workflow is intentionally small: it takes one short video clip, keeps the first 89 frames, generates matching Foley audio, and saves an MP4 with the source frames plus generated audio. It is meant as the fastest public ComfyUI test for FuzzPuppy/LTX-2.3-Foley-LoRA, not as a full long-video stitching pipeline.

Tutorial

Watch the tutorial on YouTube

What Is Included

ltx_23_foley_v2a.json: ComfyUI workflow.
setup_runpod_ltx_foley.sh: one-command RunPod setup script.
ltx_foley_v2a: small helper-node package.
tennis-no-sound.mp4: default silent test video for RunPod setup.

The helper-node package handles the workflow-specific pieces that stock ComfyUI does not currently cover cleanly:

freezes the uploaded video as LTX video latents while leaving matching audio latents empty for Foley generation
trims or pads the saved video frames to the same generation window
decodes LTX audio VAE output into the Comfy audio tensor layout expected by current video saving nodes

Prompt text, model loading, LoRA loading, sampling, video creation, and MP4 saving use normal ComfyUI/LTXVideo nodes.

Fastest RunPod Test

Use the official RunPod ComfyUI - CUDA 12.8 template:

https://console.runpod.io/deploy?template=cw3nka7d08&ref=k7b1cgii

In RunPod, under "Additional Filters", filter CUDA versions to CUDA 12.8.
Select a GPU with at least 48 GB of VRAM: A40, RTX A6000, L40/L40S, or A100 80 GB.
Make sure the ComfyUI - CUDA 12.8 template is selected.
The template's default volume disk is 50 GB, which is enough for the core workflow files, but tight once caches and reruns accumulate. Change the volume disk to 100 GB if you want more breathing room.
Start the pod and open a terminal.
Run:

cd /workspace && curl -L https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-Workflow/resolve/main/setup_runpod_ltx_foley.sh -o setup_runpod_ltx_foley.sh && bash setup_runpod_ltx_foley.sh

The setup script installs the nodes and models, downloads the tennis test video as input.mp4, restarts ComfyUI without stopping the pod, and waits until the UI responds on port 8188.
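The "wait until the UI responds" step can be pictured as a simple polling loop. The following is a minimal Python sketch of that idea, not the setup script's actual code: the function name `wait_for_http`, the timeout values, and the choice to treat any HTTP reply as "up" are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request

# Hypothetical helper: talk to the local server directly, ignoring proxy settings.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_http(url: str, timeout_s: float = 300.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until the server answers or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with _opener.open(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server replied, even if with an error status: it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: not ready yet, try again.
            time.sleep(interval_s)
    return False
```

On a pod, a call such as `wait_for_http("http://127.0.0.1:8188/")` would return `True` once ComfyUI accepts requests, and `False` if it never comes up within the timeout.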

After the script finishes:

Open ComfyUI from the RunPod web UI.
Under workflows, select ltx_23_foley_v2a.json.
Hit Run.

The default input video and prompt are already set:

Two men are playing tennis. No speech is present. No music is present.

What The Script Installs

The script assumes the official CUDA 12.8 template layout from runpod-workers/comfyui-base:

ComfyUI: /workspace/runpod-slim/ComfyUI
Python environment: /workspace/runpod-slim/ComfyUI/.venv-cu128
ComfyUI port: 8188

It installs or refreshes:

Lightricks/ComfyUI-LTXVideo
ltx_foley_v2a helper nodes
ltx_23_foley_v2a.json
tennis-no-sound.mp4

It downloads these model files:

Base checkpoint: Lightricks/LTX-2.3-fp8/ltx-2.3-22b-dev-fp8.safetensors
Text encoder: Comfy-Org/ltx-2/split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors
Foley LoRA: FuzzPuppy/LTX-2.3-Foley-LoRA/ltx-2.3-foley-400-steps.safetensors

Large model downloads are SHA-256 verified. Completed files are skipped on rerun, interrupted downloads resume from *.part files, and corrupt partials are retried once from scratch.
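The skip/resume decision described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the script's implementation: `sha256_of` and `download_action` are hypothetical names, and the `.part` suffix is assumed to be appended to the final file name.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte models never sit fully in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_action(final: Path, expected_sha256: str) -> str:
    """Return 'skip', 'resume', or 'fresh' for one model file (hypothetical helper)."""
    part = final.with_name(final.name + ".part")
    if final.exists() and sha256_of(final) == expected_sha256.lower():
        return "skip"      # verified file already present
    if part.exists():
        return "resume"    # continue an interrupted download
    return "fresh"         # nothing usable on disk
```

Checking the hash of the completed file, rather than only its presence or size, is what lets a rerun tell a finished download from a truncated or corrupted one.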

Manual ComfyUI Install

If you are not using the RunPod script:

Install or update ComfyUI.
Install the official LTXVideo custom nodes: https://github.com/Lightricks/ComfyUI-LTXVideo
Copy or symlink ltx_foley_v2a into: ComfyUI/custom_nodes/ltx_foley_v2a
Put the model files in:
- checkpoint: ltx-2.3-22b-dev-fp8.safetensors in ComfyUI/models/checkpoints
- text encoder: gemma_3_12B_it_fp8_scaled.safetensors in ComfyUI/models/text_encoders
- Foley LoRA: ltx-2.3-foley-400-steps.safetensors in ComfyUI/models/loras
Restart ComfyUI.
Under workflows, select ltx_23_foley_v2a.json.
Hit Run.
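Before restarting ComfyUI after a manual install, it can help to confirm that every file is where the steps above expect it. The Python sketch below is a hypothetical convenience check, not part of this repository; `missing_paths` is an invented name, and the paths are taken directly from the list above.

```python
from pathlib import Path

# Paths relative to the ComfyUI root, copied from the manual install steps.
EXPECTED = [
    "custom_nodes/ltx_foley_v2a",
    "models/checkpoints/ltx-2.3-22b-dev-fp8.safetensors",
    "models/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors",
    "models/loras/ltx-2.3-foley-400-steps.safetensors",
]


def missing_paths(comfy_root: str) -> list[str]:
    """Return the expected paths that do not exist under `comfy_root`."""
    root = Path(comfy_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty result from `missing_paths("ComfyUI")` means all four items are in place; any returned entry names a file or folder that still needs to be copied.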

Workflow Defaults

Input video: input.mp4
Prompt: Two men are playing tennis. No speech is present. No music is present.
Negative prompt: anti-music/anti-vocal prompt
Conditioning size: 576x576
Frame window: 89 frames
Sampling steps: 30
Guidance: 4.0
STG blocks: 14, 19
LoRA strength: 1.0

The workflow uses the first 89 frames of the uploaded video. Shorter videos are padded by repeating the last frame. The saved video is trimmed or padded to the same frame count so the output video duration matches the generated audio.
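The trim-or-pad rule is easy to state in code. Below is a minimal Python sketch that works on any sequence of frames; `fit_to_window` is a hypothetical name and does not necessarily mirror how the helper nodes implement the step.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fit_to_window(frames: Sequence[T], window: int = 89) -> List[T]:
    """Keep the first `window` frames; pad short clips by repeating the last frame."""
    if not frames:
        raise ValueError("need at least one frame")
    head = list(frames[:window])
    return head + [head[-1]] * (window - len(head))
```

For example, `fit_to_window([1, 2, 3], window=5)` gives `[1, 2, 3, 3, 3]`, while a 100-frame clip is cut to its first 89 frames.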

VRAM Notes

Sampling is the VRAM peak. Start with a 48 GB GPU for the first reliable test. If you need to reduce memory use, try these changes in order:

reduce frames from 89 to 57, 41, or 25
reduce conditioning size from 576x576 to 448x448 or 384x384
reduce sampling steps from 30 to 20
lower or disable STG if quality is acceptable
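To get a rough feel for why fewer frames or a smaller conditioning size helps, one can count video latent positions. The Python sketch below assumes a video VAE that compresses 32x spatially and 8x temporally; those factors are an assumption about LTX-2.3 and are not stated in this repository. It ignores audio latents, model weights, and activations, so treat the numbers as relative, not as VRAM figures.

```python
def approx_video_tokens(width: int, height: int, frames: int,
                        spatial: int = 32, temporal: int = 8) -> int:
    """Rough count of video latent positions (assumed compression factors)."""
    if (frames - 1) % temporal != 0:
        raise ValueError("frames should be one more than a multiple of `temporal`")
    latent_frames = (frames - 1) // temporal + 1
    return (width // spatial) * (height // spatial) * latent_frames


print(approx_video_tokens(576, 576, 89))  # 3888 (workflow default)
print(approx_video_tokens(576, 576, 57))  # 2592 (fewer frames)
print(approx_video_tokens(384, 384, 89))  # 1728 (smaller conditioning size)
```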

Frame counts should stay one more than a multiple of 8:

9, 17, 25, 33, 41, 49, 57, ..., 89, ..., 257
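The rule can be checked, and an arbitrary clip length snapped down to the nearest valid value, with a few lines of Python. The function names are hypothetical, and the lower bound of 9 simply follows the sequence listed above.

```python
def is_valid_frame_count(n: int) -> bool:
    """True when n is one more than a multiple of 8 and at least 9."""
    return n >= 9 and (n - 1) % 8 == 0


def snap_down(n: int) -> int:
    """Largest valid frame count that does not exceed n."""
    if n < 9:
        raise ValueError("need at least 9 frames")
    return ((n - 1) // 8) * 8 + 1
```

For instance, a 60-frame clip snaps down to 57 frames, while 89 is already valid and is returned unchanged.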

Troubleshooting RunPod Setup

If you rerun setup after a workflow or node update:

cd /workspace && bash setup_runpod_ltx_foley.sh

The script skips verified model files, refreshes the workflow and helper nodes, and restarts ComfyUI.

If model downloads fail with authorization errors, accept the relevant Hugging Face model terms, then rerun the script with the HF_TOKEN environment variable set to your Hugging Face access token.

Logs from the script-managed ComfyUI restart are written to:

/workspace/runpod-slim/comfyui-restart.log

License Scope

The files in this workflow repository are released under the Apache-2.0 license. That applies to the workflow JSON, setup script, helper-node code, README/model card text, and bundled test assets in this repository.

This workflow downloads and uses third-party model files that are governed by their own licenses and terms, including LTX-2.3, the Gemma text encoder, and the FuzzPuppy/LTX-2.3-Foley-LoRA weights.
