Incredible model, much better than Veo 3 or even Sora 2
Thank you for open-sourcing LTX-2 — this is an extraordinary piece of work. I’ve been testing it in some workflows, and the temporal consistency + speed are genuinely impressive even with minimal setup. It’s rare to see this level of quality and practicality in an open model.
Huge thanks to the team for the contribution to the community — please keep up the amazing work!
please share 🤯 (outputs)
OK, here is a 10-second video generated on a 5090 with 96 GB of system RAM, at 1440x816 resolution; the workflow is embedded in the video. SageAttention is enabled, and Gemma is swapped for the optimised https://huggingface.co/unsloth/gemma-3-12b-it-bnb-4bit. Block modification in gemma_encoder.py, line 493 ff.:

    def ltxv_gemma_clip(encoder_path, ltxv_path, processor=None, dtype=None):
        class _LTXVGemmaTextEncoderModel(LTXVGemmaTextEncoderModel):
            def __init__(self, device="cpu", dtype=dtype, model_options={}):
                dtype = torch.bfloat16  # TODO: make this configurable
                # Load the text encoder from local files only and pin it to the CPU
                gemma_model = Gemma3ForConditionalGeneration.from_pretrained(
                    encoder_path,
                    local_files_only=True,
                    torch_dtype=dtype,
                    device_map={"": "cpu"},
                )

This is to avoid OOMs on second runs.
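As a rough illustration of why a 4-bit Gemma checkpoint kept on the CPU can ease memory pressure, here is a back-of-the-envelope sketch of the weight footprint. It assumes a nominal 12 billion parameters (taken from the "12b" in the checkpoint name) and counts weights only. It ignores quantization metadata, activations, and any layers left unquantized, so the real numbers will differ. The helper `weight_gib` is hypothetical and not part of any library.

```python
def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate size of the raw weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Assumption: nominal parameter count read off the "12b" checkpoint name.
N_PARAMS = 12e9

bf16_gib = weight_gib(N_PARAMS, 16)  # 16 bits per weight
int4_gib = weight_gib(N_PARAMS, 4)   # 4 bits per weight, metadata ignored

print(f"bf16 weights:  ~{bf16_gib:.1f} GiB")
print(f"4-bit weights: ~{int4_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights are about a quarter the size of the bf16 ones, which leaves more headroom for the video model itself.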
Thank you so much for the kind words!
We’re excited to keep pushing this forward with the community’s support 🚀
Yes, for a single person doing simple things, it's good. For anything more complex but not crazy, it feels much messier than Wan (2.2!).
You really have to hold it by the hand with baby-step prompts, and even then it often struggles when you push the length (not 20 seconds, just 200 or so frames): physics and continuity freaking out, wild jumps, switching to different images, and losing the consistency of a person from a start image, which almost always happens. Its understanding of the world and of physics (motion) in general has ups and downs.
And the recommendation of LoRAs for basic camera motions it should just know hints at maybe a few too many compromises made for the incredibly impressive speed.
We'll see where it goes from here. After all, Wan2.1 had these morphing and freaking-out issues too, and Hunyuan Video 1.5 certainly does (while being very slow).
My enthusiasm from realizing how fast it is has cooled down a lot.