How to use TenStrip/LTX2.3-10Eros with Diffusers:

```
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("TenStrip/LTX2.3-10Eros", dtype=torch.bfloat16, device_map="cuda")
pipe.to("cuda")

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
```
About aspect ratios/resolutions, distillation steps, and LoRAs
The original LTX 2.3 model uses the following (cited from https://huggingface.co/spaces/Lightricks/LTX-2-3-hdr/blob/main/app.py, lines 114-120):

```python
# Frames must satisfy (n-1) % 8 == 0. Aspect-ratio canvas sizes (divisible by 32).
RESOLUTIONS = {
    "low": {"16:9": (768, 512), "9:16": (512, 768), "1:1": (768, 768),
            "4:3": (768, 576), "3:4": (576, 768), "21:9": (768, 384)},
    "high": {"16:9": (1536, 1024), "9:16": (1024, 1536), "1:1": (1024, 1024),
             "4:3": (1536, 1152), "3:4": (1152, 1536), "21:9": (1536, 768)},
}
```
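For anyone who wants to sanity-check their own sizes against the two rules in that comment, here is a minimal sketch. The helper names `is_valid_canvas` and `is_valid_frame_count` are made up for illustration and are not part of the Lightricks app or any LTX SDK; the table is copied from the quote above.

```python
# Canvas sizes as quoted above from the Lightricks app.py.
RESOLUTIONS = {
    "low": {"16:9": (768, 512), "9:16": (512, 768), "1:1": (768, 768),
            "4:3": (768, 576), "3:4": (576, 768), "21:9": (768, 384)},
    "high": {"16:9": (1536, 1024), "9:16": (1024, 1536), "1:1": (1024, 1024),
             "4:3": (1536, 1152), "3:4": (1152, 1536), "21:9": (1536, 768)},
}


def is_valid_canvas(width: int, height: int) -> bool:
    """Hypothetical helper: True when both sides are positive multiples of 32."""
    return width > 0 and height > 0 and width % 32 == 0 and height % 32 == 0


def is_valid_frame_count(num_frames: int) -> bool:
    """Hypothetical helper: True when (n - 1) % 8 == 0, i.e. n = 8k + 1."""
    return num_frames >= 1 and (num_frames - 1) % 8 == 0


# Every canvas in the quoted table should pass the divisible-by-32 check.
all_ok = all(
    is_valid_canvas(w, h)
    for tier in RESOLUTIONS.values()
    for (w, h) in tier.values()
)
print(all_ok)
```

The same two functions can be pointed at any custom width, height, and frame count before queuing a run.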
Does this finetune/retraining use the same resolutions? Also, the distilled variant of the original model uses specific sigmas for inference; are they still the same? One last question: the LoRA weights seem to work slightly differently when applied to a GGUF version compared to BF16/FP8. Is that true, or am I mistaken?
All of their defaults are more like guidelines. The model can do any resolution that is a multiple of 32, and clips up to 40 seconds long. Their sigmas are for a full 384 distilled base model. You get much better results with these models using less distilled influence and higher-quality sigmas, like in my workflow (13-step sigmas on the first pass). You probably want at least 10 steps. Most people running 8-step workflows describe constant quality issues, and the fix is simply to add ~20 s of first-stage sampling for a night-and-day quality gain.
Also, some SDKs like ltx-2-mlx will round the dimensions down to the nearest multiple of 32. I've yet to hit an error on the 8n+1 rule, but yeah, I think both are strongly encouraged guidelines to ensure good output.
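To make the rounding concrete, here is a rough sketch of snapping arbitrary inputs onto both rules. This is not ltx-2-mlx's actual code; `floor_to_multiple` and `snap_frames` are hypothetical names, and rounding the frame count down (rather than up or to nearest) is my own assumption.

```python
def floor_to_multiple(value: int, multiple: int = 32) -> int:
    """Round value down to the nearest multiple (hypothetical helper)."""
    return (value // multiple) * multiple


def snap_frames(num_frames: int) -> int:
    """Largest frame count of the form 8k + 1 that does not exceed num_frames.

    Rounding down is an assumption; the minimum returned value is 1.
    """
    return max(1, ((num_frames - 1) // 8) * 8 + 1)


# Example: a 1000x720 request with 120 frames.
width = floor_to_multiple(1000)   # 992
height = floor_to_multiple(720)   # 704
frames = snap_frames(120)         # 113
print(width, height, frames)
```

Values that already satisfy the rules pass through unchanged, so it is safe to apply this to every input.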
The frame slider in the workflow steps by 24 with the +1 math behind it, and the dimension inputs all use a step of 32.
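A quick way to see why a slider that steps by 24 never breaks the 8n+1 rule: 24 is a multiple of 8, so 24k + 1 always has the right form. The sketch below only illustrates that arithmetic and is not taken from the workflow itself; `frames_from_slider` is a hypothetical name, and reading one slider step as one second at 24 fps is an assumption.

```python
def frames_from_slider(steps: int, step_size: int = 24) -> int:
    """Frame count for a slider position: steps * step_size + 1 (hypothetical helper)."""
    return steps * step_size + 1


# Every slider position from 1 to 40 satisfies (n - 1) % 8 == 0.
all_valid = all((frames_from_slider(k) - 1) % 8 == 0 for k in range(1, 41))
print(all_valid)

# Assuming one step is one second at 24 fps, 5 steps gives 121 frames
# and 40 steps (the 40-second limit mentioned above) gives 961 frames.
print(frames_from_slider(5), frames_from_slider(40))
```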