trying to run this in wan2gp by deepbeepmeep

#58
by ppppda1 - opened

Hi, I'm trying to run your LTX2 merge in deepbeepmeep's interface "wan2gp" — do you know if it's possible? I've created a "finetune" JSON file for it, which has worked for other LTX2 safetensors, but with yours I am getting the following errors... any insights?

Loading Model '.\ckpts/ltx2-phr00tmerge-nsfw-v62.safetensors' ...
Loading Text Encoder 'ckpts\gemma_3_12B_it_heretic_fp8_e4m3fn.safetensors' ...
************ Memory Management for the GPU Poor (mmgp 3.7.2) by DeepBeepMeep ************
Pinning data of 'transformer' to reserved RAM
The whole model was pinned to reserved RAM: 82 large blocks spread across 18010.96 MB
Hooked to model 'transformer' (X0Model)
Async loading plan for model 'transformer' : base size of 221.66 MB will be preloaded with a 368.38 MB async circular shuttle
Hooked to model 'text_encoder' (Gemma3ForCausalLM)
Async loading plan for model 'text_encoder' : base size of 1920.48 MB will be preloaded with a 213.78 MB async circular shuttle
Hooked to model 'text_embedding_projection' (GemmaFeaturesExtractorProjLinear)
Hooked to model 'text_embeddings_connector' (GemmaTextEmbeddingsConnectorModel)
Hooked to model 'vae' (VideoDecoder)
Hooked to model 'video_encoder' (VideoEncoder)
Hooked to model 'audio_encoder' (AudioEncoder)
Hooked to model 'audio_decoder' (AudioDecoder)
Hooked to model 'vocoder' (Vocoder)
Hooked to model 'spatial_upsampler' (LatentUpsampler)
0%| | 0/8 [00:00<?, ?steps/s]
Traceback (most recent call last):
File "O:\Users\pabs\Desktop\Wan2GP\wgp.py", line 6224, in generate_video
samples = wan_model.generate(
File "O:\Users\pabs\Desktop\Wan2GP\models\ltx2\ltx2.py", line 967, in generate
pipeline_output = self.pipeline(
File "O:\Users\pabs\Desktop\Wan2GP\models\ltx2\ltx_pipelines\distilled.py", line 230, in __call__
video_state, audio_state = denoise_audio_video(
File "O:\Users\pabs\Desktop\Wan2GP\models\ltx2\ltx_pipelines\utils\helpers.py", line 814, in denoise_audio_video
video_state, audio_state = denoising_loop_fn(
File "O:\Users\pabs\Desktop\Wan2GP\models\ltx2\ltx_pipelines\distilled.py", line 173, in denoising_loop
return euler_denoising_loop(
File "O:\Users\pabs\Desktop\Wan2GP\models\ltx2\ltx_pipelines\utils\helpers.py", line 431, in euler_denoising_loop
denoised_video, denoised_audio = denoise_fn(video_state, audio_state, sigmas, step_idx)
File "O:\Users\pabs\Desktop\Wan2GP\models\ltx2\ltx_pipelines\utils\helpers.py", line 718, in simple_denoising_step
denoised_video, denoised_audio = transformer(video=pos_video, audio=pos_audio, perturbations=None)
File "O:\Users\pabs\.conda\envs\wan2gp\lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "O:\Users\pabs\.conda\envs\wan2gp\lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "O:\Users\pabs\.conda\envs\wan2gp\lib\site-packages\mmgp\offload.py", line 2967, in check_change_module
return previous_method(*args, **kwargs)
File "O:\Users\pabs\Desktop\Wan2GP\models\ltx2\ltx_core\model\transformer\model.py", line 561, in forward
vx, ax = self.velocity_model(video, audio, perturbations)
File "O:\Users\pabs\.conda\envs\wan2gp\lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "O:\Users\pabs\.conda\envs\wan2gp\lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "O:\Users\pabs\.conda\envs\wan2gp\lib\site-packages\mmgp\offload.py", line 2945, in check_load_into_GPU_needed_other
return previous_method(*args, **kwargs) # other
File "O:\Users\pabs\Desktop\Wan2GP\models\ltx2\ltx_core\model\transformer\model.py", line 465, in forward
video_args = self.video_args_preprocessor.prepare(video) if video is not None else None
File "O:\Users\pabs\Desktop\Wan2GP\models\ltx2\ltx_core\model\transformer\transformer_args.py", line 261, in prepare
transformer_args = self.simple_preprocessor.prepare(modality)
File "O:\Users\pabs\Desktop\Wan2GP\models\ltx2\ltx_core\model\transformer\transformer_args.py", line 186, in prepare
x = self.patchify_proj(latent)
File "O:\Users\pabs\.conda\envs\wan2gp\lib\site-packages\torch\nn\modules\module.py", line 1739, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "O:\Users\pabs\.conda\envs\wan2gp\lib\site-packages\torch\nn\modules\module.py", line 1750, in _call_impl
return forward_call(*args, **kwargs)
File "O:\Users\pabs\.conda\envs\wan2gp\lib\site-packages\torch\nn\modules\linear.py", line 125, in forward
return F.linear(input, self.weight, self.bias)
TypeError: linear(): argument 'weight' (position 2) must be Tensor, not NoneType
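For what it's worth, that `TypeError` means `self.weight` on the `patchify_proj` linear layer is `None` — that tensor was never loaded from the checkpoint, which usually points to a key-name mismatch between the safetensors file and the architecture Wan2GP builds. One way to check is to list the checkpoint's tensor keys and see whether anything matching the expected layer names is present. A minimal sketch (`missing_keys` is a hypothetical helper, not part of Wan2GP; the `safetensors` usage in the comment assumes that library is installed):

```python
from typing import Iterable, List

def missing_keys(present: Iterable[str], needles: List[str]) -> List[str]:
    """Return the expected layer-name fragments that match no tensor key
    in `present` -- i.e. layers whose weights would come back as None."""
    keys = list(present)
    return [n for n in needles if not any(n in k for k in keys)]

# Usage against the checkpoint from this thread:
# from safetensors import safe_open
# with safe_open("ckpts/ltx2-phr00tmerge-nsfw-v62.safetensors",
#                framework="pt", device="cpu") as f:
#     print(missing_keys(f.keys(), ["patchify_proj"]))
```

If `patchify_proj` shows up as missing (or only under a differently-prefixed key), the file simply isn't laid out the way this loader expects.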

I don't know if it is possible and I haven't tested it. This is meant to be used in ComfyUI, as that is the only place I'm aware it works.

Yeah, it doesn't work. But this does:
https://huggingface.co/3ndetz/LTX2-Rapid-Merges-GGUF/blob/main/nsfw/ltx2-phr00tmerge-nsfw-v62/ltx2-phr00tmerge-nsfw-v62-Q4_K_M.gguf

This is my finetune JSON. (Basically it's the new Q4_K_M finetune that was added in the last update that brought GGUF support, only with phr00t's quantized model swapped in, because I have no idea how any of this works.)

{
  "model": {
    "name": "LTX-2 Distilled GGUF Q4_K_M 19B NSFW",
    "architecture": "ltx2_19B",
    "description": "LTX-2 distilled GGUF Q4_K_M checkpoint for llama.cpp-backed quantization.",
    "URLs": [
      "https://huggingface.co/3ndetz/LTX2-Rapid-Merges-GGUF/resolve/main/nsfw/ltx2-phr00tmerge-nsfw-v62/ltx2-phr00tmerge-nsfw-v62-Q4_K_M.gguf"
    ],
    "preload_URLs": "ltx2_19B",
    "ltx2_pipeline": "distilled"
  },
  "prompt": "A warm sunny backyard. The camera starts in a tight cinematic close-up of a woman and a man in their 30s, facing each other with serious expressions. The woman, emotional and dramatic, says softly, \"That's it... Dad's lost it. And we've lost Dad.\" The man exhales, slightly annoyed: \"Stop being so dramatic, Jess.\" A beat. He glances aside, then mutters defensively, \"He's just having fun.\" The camera slowly pans right, revealing the grandfather in the garden wearing enormous butterfly wings, waving his arms in the air like he's trying to take off. He shouts, \"Wheeeew!\" as he flaps his wings with full commitment. The woman covers her face, on the verge of tears. The tone is deadpan, absurd, and quietly tragic.",
  "num_inference_steps": 8,
  "video_length": 241
}
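One easy failure mode with hand-edited finetune files is unescaped double quotes inside the `prompt` string (dialogue lines are a common culprit), which makes the whole file invalid JSON before Wan2GP ever sees it. A quick stdlib-only sanity check (`check_finetune_json` is a hypothetical helper; the required fields are just the ones this file uses, not a documented Wan2GP schema):

```python
import json

def check_finetune_json(text: str):
    """Return (ok, message): parse the JSON and confirm the few fields
    the finetune files in this thread rely on are present."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as e:
        return False, f"invalid JSON at line {e.lineno}: {e.msg}"
    model = data.get("model", {})
    for field in ("name", "architecture", "URLs"):
        if field not in model:
            return False, f"missing model.{field}"
    return True, "looks OK"

# Usage:
# with open("my_finetune.json") as f:
#     print(check_finetune_json(f.read()))
```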

Thanks for the reply and thanks for the original merge anyway!

Thanks Serge.... I'll try this and report back. Are you on the wan2gp discord?

Sometimes; I am called the same there as here. I must say that I have sometimes been getting black outputs from this checkpoint when trying to start with a first frame in I2V — that doesn't happen with the vanilla checkpoint. And even with LoRAs and checkpoints, LTX2 is really, really bad at prompt adherence, so I won't be doing much with it. I was betting on this model to change that, but overall I am not impressed with the composition, even though the ability to create audio is impressive.
