	---
	license: apache-2.0
	tags:
	- comfyui
	- ltx-video
	- ltx-2.3
	- foley
	- video-to-audio
	- audio-generation
	- workflow
	- foley-lora
	---

	# LTX 2.3 Foley V2A ComfyUI Workflow

	This repository contains ready-to-test ComfyUI workflows for the
	[`FuzzPuppy/LTX-2.3-Foley-LoRA`](https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-LoRA)
	LoRA. The LoRA adds Foley sound effects to a silent input video using LTX-2.3:
	given a video and a prompt describing the visible action, the workflows
	generate matching non-speech, non-music sound effects and save a new MP4.

	There are two workflows provided:
	1. `foley-sliding-window.json`: long-video workflow with overlapping audio windows and stitching.
	2. `ltx_23_foley_v2a.json`: original short-clip workflow.

	If you want to run a quick test on a short clip, use `ltx_23_foley_v2a.json`. Otherwise, use `foley-sliding-window.json` so you can generate longer audio while keeping memory under control.

	## Tutorial
	[![Watch the tutorial: using the LTX-2.3 Foley LoRA in ComfyUI](https://img.youtube.com/vi/qnHFDlrySmw/hqdefault.jpg)](https://youtu.be/qnHFDlrySmw)

	[Watch the tutorial on YouTube](https://youtu.be/qnHFDlrySmw)

	## What Is Included

	- `foley-sliding-window.json`: long-video workflow with overlapping audio windows and stitching.
	- `ltx_23_foley_v2a.json`: original short-clip workflow.
	- `setup_runpod_ltx_foley.sh`: one-command RunPod setup script.
	- `ltx_foley_v2a`: small helper-node package.
	- `tennis-no-sound.mp4`: default silent test video for RunPod setup.

	Both workflows require the `ltx_foley_v2a` helper-node package. If ComfyUI shows
	missing nodes named `LTXFoleyForLoopOpen`, `LTXFoleyWindowSelect`,
	`LTXFoleyVideoToAudioLatent`, or `LTXFoleyAudioVAEDecode`, the workflow JSON was
	loaded before these helper nodes were installed into `ComfyUI/custom_nodes`.

	The helper-node package handles the workflow-specific pieces that stock ComfyUI
	does not currently cover cleanly:

	- plans the window count from the uploaded video
	- provides a small local ComfyUI for-loop so no external loop-node pack is needed
	- splits longer videos into overlapping windows
	- freezes each source window as LTX video latents while leaving matching audio
	latents empty for Foley generation
	- decodes each audio window into the Comfy audio tensor layout expected by
	current video saving nodes
	- writes each raw decoded window as a WAV before stitching so artifacts can be
	checked before the final crossfade
	- crossfades and stitches generated audio windows into one final track
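
To make the last bullet concrete, here is a minimal sketch of crossfading overlapping audio windows into one track. It is not the code in `ltx_foley_v2a`: the function name `crossfade_stitch`, the linear fade, and the use of plain mono sample lists are assumptions chosen for illustration.

```python
def crossfade_stitch(windows, overlap):
    """Join mono sample lists, linearly crossfading `overlap` samples at each seam.

    Hypothetical helper for illustration; the real helper nodes may use a
    different fade curve and tensor layout.
    """
    if not windows:
        return []
    out = list(windows[0])
    for nxt in windows[1:]:
        # Never fade over more samples than either side actually has.
        n = min(overlap, len(out), len(nxt))
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight for the incoming window
            j = len(out) - n + i   # matching sample at the tail of the track
            out[j] = out[j] * (1.0 - w) + nxt[i] * w
        out.extend(nxt[n:])        # append the non-overlapping remainder
    return out


a = [1.0] * 6
b = [0.0] * 6
stitched = crossfade_stitch([a, b], overlap=3)
print(len(stitched))   # 9
print(stitched[3:6])   # [0.75, 0.5, 0.25]
```

Each seam shortens the total by `overlap` samples, which is why two 6-sample windows with a 3-sample overlap produce 9 samples.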

	Prompt text, model loading, LoRA loading, video creation, and MP4 saving use
	normal ComfyUI/LTXVideo nodes.

	## Fastest RunPod Test

	Use the official RunPod ComfyUI - CUDA 12.8 template:

	https://console.runpod.io/deploy?template=cw3nka7d08&ref=k7b1cgii


	1. In RunPod, under "Additional Filters", filter CUDA versions to CUDA 12.8.
	2. Select a 48 GB GPU: A40, RTX A6000, L40/L40S, or A100.
	3. Make sure the `ComfyUI - CUDA 12.8` template is selected.
	4. The template's default volume disk is `50 GB`, which is enough for the core workflow files, but tight once caches and reruns accumulate. Change the volume disk to `100 GB` if you want more breathing room.
	5. Start the pod and open a terminal.
	6. Run:

	```bash
	cd /workspace
	curl -L https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-Workflow/resolve/main/setup_runpod_ltx_foley.sh -o setup_runpod_ltx_foley.sh
	bash setup_runpod_ltx_foley.sh
	```

	The setup script installs the nodes and models, downloads the tennis test video as `input.mp4`, restarts ComfyUI without stopping the pod (with `--cache-classic`, see the Manual ComfyUI Install notes), and waits until the UI responds on port `8188`.

	By default the script installs ComfyUI `v0.27.0`. To test another ComfyUI release, set `COMFYUI_CORE_REF`:

	```bash
	COMFYUI_CORE_REF=v0.19.0 bash setup_runpod_ltx_foley.sh
	```

	To install workflow files from a different Hugging Face branch, set
	`WORKFLOW_REVISION`:

	```bash
	WORKFLOW_REVISION=windows bash setup_runpod_ltx_foley.sh
	```

	After the script finishes:

	1. Open ComfyUI from the RunPod web UI.
	2. Under workflows, select `foley-sliding-window.json`.
	3. Hit `Run`.

	The default input video and prompt are already set:

	```text
	Two men are playing tennis. No speech is present. No music is present.
	```

	## What The Script Installs

	The script assumes the official CUDA 12.8 template layout from
	`runpod-workers/comfyui-base`:

	- ComfyUI: `/workspace/runpod-slim/ComfyUI`
	- Python environment: `/workspace/runpod-slim/ComfyUI/.venv-cu128`
	- ComfyUI port: `8188`

	It installs or refreshes:

	- `Lightricks/ComfyUI-LTXVideo`
	- `ltx_foley_v2a` helper nodes
	- `foley-sliding-window.json`
	- `ltx_23_foley_v2a.json`
	- `tennis-no-sound.mp4`

	The script also applies a small compatibility patch to the installed
	`ComfyUI-LTXVideo/pyramid_blending.py` file so current Kornia builds can import
	the node pack on fresh ComfyUI installs.

	It downloads these model files:

	- Base checkpoint:
	[`Lightricks/LTX-2.3-fp8/ltx-2.3-22b-dev-fp8.safetensors`](https://huggingface.co/Lightricks/LTX-2.3-fp8/blob/main/ltx-2.3-22b-dev-fp8.safetensors)
	- Text encoder:
	[`Comfy-Org/ltx-2/split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors`](https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors)
	- Foley LoRA:
	[`FuzzPuppy/LTX-2.3-Foley-LoRA/ltx-2.3-foley-400-steps.safetensors`](https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-LoRA/blob/main/ltx-2.3-foley-400-steps.safetensors)

	Large model downloads are SHA-256 verified. Completed files are skipped on
	rerun, interrupted downloads resume from `*.part` files, and corrupt partials
	are retried once from scratch.
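
If you download the models by hand, you can run a similar check yourself. The sketch below shows one way to verify a file against a SHA-256 digest in Python. The helper names are hypothetical, the setup script's own implementation may differ, and the expected digest is something you would copy from the file's Hugging Face page.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_verified(path, expected_hex):
    """True if the file exists and its digest matches (case-insensitive)."""
    p = Path(path)
    return p.is_file() and sha256_of(p) == expected_hex.lower()
```

Reading in chunks keeps memory use flat even for multi-gigabyte checkpoints.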

	## Manual ComfyUI Install

	If you are not using the RunPod script:

	1. Install or update ComfyUI.
	2. Install the official LTXVideo custom nodes:
	`https://github.com/Lightricks/ComfyUI-LTXVideo`
	3. Install the Foley helper nodes by placing the workflow repo's
	`ltx_foley_v2a` folder into:
	`ComfyUI/custom_nodes/`
	4. Copy either `foley-sliding-window.json` or `ltx_23_foley_v2a.json` into your ComfyUI user workflows folder. In a standard ComfyUI install this is:
	`ComfyUI/user/default/workflows`.
	5. Put the model files in:
	- checkpoint:
	[`ltx-2.3-22b-dev-fp8.safetensors`](https://huggingface.co/Lightricks/LTX-2.3-fp8/blob/main/ltx-2.3-22b-dev-fp8.safetensors)
	in `ComfyUI/models/checkpoints`
	- text encoder:
	[`gemma_3_12B_it_fp8_scaled.safetensors`](https://huggingface.co/Comfy-Org/ltx-2/blob/main/split_files/text_encoders/gemma_3_12B_it_fp8_scaled.safetensors)
	in `ComfyUI/models/text_encoders`
	- Foley LoRA:
	[`ltx-2.3-foley-400-steps.safetensors`](https://huggingface.co/FuzzPuppy/LTX-2.3-Foley-LoRA/blob/main/ltx-2.3-foley-400-steps.safetensors)
	in `ComfyUI/models/loras`
	6. Restart ComfyUI, starting it with the `--cache-classic` flag:

	```bash
	python main.py --cache-classic
	```

	On newer ComfyUI versions (`v0.27.0`+) the default caching mode is RAM-pressure
	caching, which can evict node outputs in the middle of a run while the large
	LTX models load. For `foley-sliding-window.json` that forces the window plan,
	video decode, and model loaders to re-execute between windows, making long
	runs much slower. `--cache-classic` keeps those outputs cached for the whole
	run. The flag also exists on older releases such as `v0.19.0`, where it is
	harmless.
	7. Under workflows, select `foley-sliding-window.json` or `ltx_23_foley_v2a.json`.
	8. Hit `Run`.

	## Workflow Defaults

	- Input video: `input.mp4`
	- Prompt: `Two men are playing tennis. No speech is present. No music is present.`
	- Negative prompt: anti-music/anti-vocal prompt
	- Conditioning size: `576x576`
	- Frame window: `89` frames
	- Window overlap: `1.0` second
	- Maximum windows: `16`
	- Random ID: `42`
	- Sampling steps: `30`
	- Guidance: `4.0`
	- Save window audio: `true`
	- Window audio prefix: `ltx_foley_window`
	- LoRA strength: `1.0`

	Advanced sampler/STG settings are visible nodes in the loop body:
	sampler `euler_ancestral_cfg_pp`, STG scale `1.0`, rescale `0.7`, STG blocks
	`14, 19`, max shift `2.05`, base shift `0.95`, terminal `0.1`.

	The `foley-sliding-window.json` workflow uses the full uploaded video. Videos longer than the selected
	window are processed as overlapping windows and stitched into one generated
	audio track. Shorter videos are padded internally by repeating the last frame.
	The saved MP4 uses the source frames plus the stitched generated audio.
	Raw generated window WAVs are saved under ComfyUI's output directory in
	`ltx_foley_windows/` and their paths are listed in the manifest output.
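
The windowing described above can be approximated with a few lines of arithmetic. This is a sketch under stated assumptions, not the helper node's code: it assumes the overlap is converted to frames with `round(overlap_seconds * fps)`, that windows advance by a fixed stride, and that any window running past the end of the video is padded by repeating the last frame. `plan_windows` and its return shape are hypothetical.

```python
import math


def plan_windows(total_frames, fps, window_frames=89,
                 overlap_seconds=1.0, max_windows=16):
    """Return (start_frames, pad_frames) for a sliding-window pass.

    pad_frames is how many repeated last frames the final window would need.
    """
    if total_frames <= window_frames:
        # Short clip: a single window, padded up to the window length.
        return [0], window_frames - total_frames
    overlap_frames = int(round(overlap_seconds * fps))
    stride = window_frames - overlap_frames
    if stride <= 0:
        raise ValueError("overlap must be shorter than the window")
    count = math.ceil((total_frames - window_frames) / stride) + 1
    if count > max_windows:
        # Fail clearly instead of silently running for a very long time.
        raise ValueError(f"{count} windows needed, max_windows is {max_windows}")
    starts = [i * stride for i in range(count)]
    pad = starts[-1] + window_frames - total_frames
    return starts, pad


print(plan_windows(240, 24))  # ([0, 65, 130, 195], 44)
print(plan_windows(50, 24))   # ([0], 39)
```

With the defaults, a 10-second clip at 24 fps needs four windows, each advancing by 65 frames.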

	## VRAM Notes

	Sampling is the VRAM peak.
	If you need to reduce memory use, try these changes in order:

	- reduce frames from `89` to `57`, `41`, or `25`
	- reduce conditioning size from `576x576` to `448x448` or `384x384`
	- reduce sampling steps from `30` to `20`

	Frame counts should stay one more than a multiple of 8:

	```text
	9, 17, 25, 33, 41, 49, 57, ..., 89, ..., 257
	```
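
If you compute frame counts in a script, a small helper can snap an arbitrary value to the nearest valid count at or below it. This is an illustrative sketch: the function names are hypothetical, and the `9` and `257` bounds are simply taken from the range listed above.

```python
def is_valid_frame_count(n):
    """True if n is one more than a multiple of 8 (9, 17, 25, ...)."""
    return n >= 9 and (n - 1) % 8 == 0


def snap_frame_count(n, minimum=9, maximum=257):
    """Largest valid frame count <= n, clamped to [minimum, maximum]."""
    snapped = ((n - 1) // 8) * 8 + 1
    return max(minimum, min(maximum, snapped))


print(snap_frame_count(60))   # 57
print(snap_frame_count(89))   # 89
print(snap_frame_count(300))  # 257
```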

	For `foley-sliding-window.json`, the default `max_windows` is `16` so accidental very long inputs
	fail clearly instead of running for hours. Increase it only when you expect the
	extra runtime.

	## Troubleshooting

	### Missing Nodes

	If ComfyUI reports missing `LTXFoley...` nodes after manual setup, verify that
	these files exist and then restart ComfyUI:

	```text
	ComfyUI/custom_nodes/ltx_foley_v2a/__init__.py
	ComfyUI/custom_nodes/ltx_foley_v2a/nodes.py
	```
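
The same check can be scripted. A minimal sketch, assuming you pass the path to your own ComfyUI root; `missing_helper_files` is a hypothetical name, not part of the helper package:

```python
from pathlib import Path

# Files the Missing Nodes check above asks you to verify.
REQUIRED_FILES = ("__init__.py", "nodes.py")


def missing_helper_files(comfy_root):
    """Return the required helper-node file names that are not present."""
    pkg = Path(comfy_root) / "custom_nodes" / "ltx_foley_v2a"
    return [name for name in REQUIRED_FILES if not (pkg / name).is_file()]
```

An empty list means both files are in place; restart ComfyUI afterwards so the nodes are registered.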

	### Models Reload Or Nodes Re-Execute Between Windows

	If the log shows `planned N windows` repeating, or the checkpoint/text-encoder
	reloading before every window of `foley-sliding-window.json`, ComfyUI is running
	with its default RAM-pressure caching and is evicting node outputs mid-run.
	Start ComfyUI with `--cache-classic` (the RunPod script already does this). The
	generated audio is still correct either way; the re-execution only costs time.

	### Duplicate Sounds At Window Boundaries

	In `foley-sliding-window.json`, neighboring windows overlap (default `1.0`
	second) and each window generates its audio independently. If a distinct sound
	event (a door close, a footstep) falls inside an overlap region, both windows
	may render it slightly out of alignment, and you can hear the event twice
	around a window boundary. The run log's `planned N windows starts=[...]` line
	shows where the boundaries are (`start_frame / fps` seconds).
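
To turn those start frames into timestamps you can listen for, apply `start_frame / fps` to every window after the first. The sketch below does that and also returns the overlap region that follows each boundary. The function name and the example start frames are made up for illustration; copy the real values from your own log.

```python
def overlap_regions(starts, fps, overlap_seconds=1.0):
    """Return (begin, end) times in seconds of each shared overlap region.

    `starts` are window start frames; the first window has no seam before it.
    """
    regions = []
    for start_frame in starts[1:]:
        begin = start_frame / fps
        regions.append((round(begin, 3), round(begin + overlap_seconds, 3)))
    return regions


# Example values only: three windows starting at frames 0, 64, and 128 at 25 fps.
print(overlap_regions([0, 64, 128], fps=25))  # [(2.56, 3.56), (5.12, 6.12)]
```

A doubled sound that falls inside one of these ranges is likely a boundary artifact rather than something in the source video.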

	If you hear this, reduce the Window overlap (`overlap_seconds` on the
	window-plan node), for example from `1.0` to `0.5`. A smaller overlap makes it
	less likely an event lands in the shared region, at the cost of a shorter
	crossfade between windows. Avoid large overlaps: the bigger the overlap, the
	more of the video is generated twice, which increases the chance of doubled
	sounds.

	### Audio Artifacts On Some ComfyUI Versions

	The workflows have been tested on ComfyUI `v0.27.0` and run
	successfully there. However, on `v0.27.0` and newer ComfyUI versions, we have noticed that LTX-2.3 video-to-audio can produce a high-pitched squeak or other artifacts in some generated audio.

	If you notice these artifacts in a generation, roll back to ComfyUI `v0.19.0`.

	If you are using the RunPod setup, you can roll back by running:

	```bash
	cd /workspace
	COMFYUI_CORE_REF=v0.19.0 bash setup_runpod_ltx_foley.sh
	```

	Then reload `foley-sliding-window.json` and run it again.

	### RunPod Setup

	#### Rerunning Setup

	To rerun setup after a workflow or node update:

	```bash
	cd /workspace && bash setup_runpod_ltx_foley.sh
	```

	The script will skip verified model files, refresh the workflow/helper nodes,
	and restart ComfyUI.

	#### Model Downloads

	If model downloads fail with authorization errors, accept the relevant Hugging
	Face model terms and rerun with `HF_TOKEN` set.

	#### Logs

	Logs from the script-managed ComfyUI restart are written to:

	```text
	/workspace/runpod-slim/comfyui-restart.log
	```

	## License Scope

	The files in this workflow repository are released under the Apache-2.0 license.
	That applies to the workflow JSON, setup script, helper-node code, README/model
	card text, and bundled test assets in this repository.

	This workflow downloads and uses third-party model files that are governed by
	their own licenses and terms, including LTX-2.3, the Gemma text encoder, and the
	`FuzzPuppy/LTX-2.3-Foley-LoRA` weights.