Observations and questions about the author's workflow files and NSFW V41
First, I must express my gratitude to the author for their outstanding contribution.
Now, regarding some of my findings:
- Achieving synchronisation between audio and video necessitates forgoing the frame interpolation process.
- For enhanced audio clarity, consider employing the FP32 Audio VAE.
- For more complex actions and dialogue, the original Gemma_3_12B_it.safetensors yields superior results compared to fp8_scaled. Naturally, if synchronisation isn't a priority, fp8_scaled offers excellent value for money.
- Avoid excessively low resolutions, though do not exceed hardware limitations. The longer side should be at least 1024 pixels to prevent pixelation.
- Should you encounter OOM errors, consider incorporating VRAM-cleaning nodes into your workflow; this can prove beneficial.
- The three-frame workflow significantly preserves facial integrity.
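For the resolution point above, here is a rough sketch of how one might pick a size whose longer side is at least 1024 while keeping the aspect ratio. The helper name `scale_to_min_long_side` is made up for illustration, and snapping to multiples of 32 is my assumption about what the model expects, not something stated in the workflow. It does not check hardware limits, so you still need to judge whether the result fits in your VRAM.

```python
def scale_to_min_long_side(width: int, height: int,
                           min_long_side: int = 1024,
                           multiple: int = 32) -> tuple[int, int]:
    """Return (width, height) with the longer side >= min_long_side.

    Hypothetical helper: the multiple-of-32 snapping is an assumption.
    Sizes are rounded UP so the longer side never drops below the minimum.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    long_side = max(width, height)
    if long_side >= min_long_side:
        # Already large enough: keep the size, only snap to the multiple.
        num, den = 1, 1
    else:
        # Upscale by the exact ratio min_long_side / long_side.
        num, den = min_long_side, long_side

    def snap(value: int) -> int:
        # Integer ceiling division avoids floating-point rounding surprises.
        return -(-(value * num) // (den * multiple)) * multiple

    return snap(width), snap(height)


print(scale_to_min_long_side(768, 432))
print(scale_to_min_long_side(1280, 720))
```

With these inputs, a 768x432 source is raised to 1024x576, and a 1280x720 source is left at its scale and only snapped upward to 1280x736.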
Now for some points of confusion:
- Regarding Config CompressionArtifacts(I2V) set to 33 – does this indicate compression to 33% or a compression rate of 33%? How is this value determined?
- Why does ‘LTXV Audio VAE Decode’ connect both audio1 and audio2 from Audio Concat? What is the purpose of this dual connection?
The purpose of the audio connection is explained in the workflow: if you use "pingpong" when combining videos, it keeps the second half from being silent.
That said, I don't think it's actually of much use.
Also, frame interpolation works well; it doesn't affect audio/video synchronisation if it's set correctly.
Using frame interpolation means you need to make sure your output frame rate is set correctly (e.g. 48 if your interpolation is 2x and the generation frame rate is 24). Audio sync should work fine.
"Compression Artifacts" is like a "crf" value used for video compression. A lower value is higher quality, but looks a bit less like a compressed video. The new I2V LoRA I'm adding means you don't have to rely on high compression values (>30) anymore. v5 of the SFW merge has it, and my next NSFW will have it (>v41).
jiimzlf is correct. The audio combining stuff is just to give a "pingpong" version of the video audio instead of silence. If you don't use "pingpong", it does nothing and can be ignored.
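To make the frame-rate rule concrete, here is a small sketch of the arithmetic. The function names are mine and the 240-frame count is only a hypothetical example, not a value from the workflow. The idea is that the output frame rate is the generation frame rate times the interpolation factor, so the video keeps the same duration as the audio.

```python
def output_frame_rate(generation_fps: float, interpolation_factor: int) -> float:
    """Frame rate to set on the video-combine step after interpolation."""
    if generation_fps <= 0 or interpolation_factor < 1:
        raise ValueError("fps must be positive and the factor at least 1")
    return generation_fps * interpolation_factor


def clip_duration_seconds(frame_count: int, fps: float) -> float:
    """Playback length of a clip with frame_count frames at the given fps."""
    return frame_count / fps


fps_out = output_frame_rate(24, 2)            # 2x interpolation of a 24 fps generation
print(fps_out)                                # 48
print(clip_duration_seconds(240, fps_out))    # 5.0 seconds, matches the audio
print(clip_duration_seconds(240, 24))         # 10.0 seconds, video runs twice as long
```

The last line shows the failure case: if the output stays at 24 after 2x interpolation, the video plays at half speed and drifts away from the audio.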
- For enhanced audio clarity, consider employing the FP32 Audio VAE.
Would you please post a link for those who don't know how to Google it?
I got the correct way to use it, awesome!
Thanks. In the v62 version, do I still need the compression node, or should I keep it and set it to 0?
