# LoRA Layer Inventory: Edit Anything checkpoints
Inventory of every tensor in two builds of the
`edit_anything_reference_v0.1_r128` LoRA.
Both builds share the same canonical basename
(`edit_anything_reference_v0.1_r128`) and are distinguished by the extras
suffix that `scripts/split_editanything_lora.py` appends to the output
filenames:

- `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.{standard,module}.safetensors`: the original build. Only ships `ref_adaln_proj` + `role_embedding`.
- `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.{standard,module}.safetensors`: the continuation, fine-tuned with the `video_to_video_ref_visual_adaln` strategy. Adds the `ref_attn` LoRA branch and the `ref_visual_proj` projector on top of the original extras.
In the rest of this doc the two are referred to by their suffix only:

- `ref_adaln_proj-role_embedding`
- `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj`

Rank is 128 in both (encoded in the LoRA tensor shapes; no alpha keys are saved).
Dtype is bfloat16 throughout. All LoRA modules cover 48 transformer blocks.
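Because rank lives only in the tensor shapes, it can be read back from a checkpoint's header without loading any weights. The sketch below parses the safetensors header directly (an 8-byte little-endian length followed by a JSON table of `dtype`/`shape`/`data_offsets` per tensor) and infers rank from each `lora_A.weight`, assuming the PEFT-style `(rank, in_features)` layout. The helper names are illustrative and not part of any shipped script.

```python
import json
import re
import struct

# Matches the block index in keys like "diffusion_model.transformer_blocks.7.attn1..."
BLOCK_RE = re.compile(r"\.transformer_blocks\.(\d+)\.")


def read_safetensors_header(path):
    """Return {tensor_name: (dtype, shape)} without reading any tensor data.

    Layout: u64 little-endian header length N, then N bytes of UTF-8 JSON,
    then the raw tensor buffer (which is never touched here).
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form string map
    return {name: (info["dtype"], tuple(info["shape"])) for name, info in header.items()}


def lora_ranks(tensors):
    """Rank per LoRA pair, read from lora_A.weight of shape (rank, in_features)."""
    return {
        name: shape[0]
        for name, (_, shape) in tensors.items()
        if name.endswith(".lora_A.weight")
    }


def summarize(tensors):
    """Distinct LoRA ranks, dtypes and number of transformer blocks seen."""
    blocks = {int(m.group(1)) for name in tensors if (m := BLOCK_RE.search(name))}
    return {
        "ranks": set(lora_ranks(tensors).values()),
        "dtypes": {dtype for dtype, _ in tensors.values()},
        "num_blocks": len(blocks),
    }
```

If the inventory holds, `summarize(read_safetensors_header(path))` on either build should report ranks `{128}`, dtypes `{"BF16"}` and 48 blocks.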
## 1. Summary

| | `ref_adaln_proj-role_embedding` | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` |
|---|---|---|
| Total tensors | 965 | 1356 |
| LoRA-target modules | 10 | 14 |
| LoRA tensors (A+B) | 960 | 1344 |
| Extra (non-LoRA) tensors | 5 | 12 |
| `ref_attn` LoRA branch | ✗ absent | ✓ trained on 48 blocks |
| `ref_visual_proj` (visual cross-attn projector) | ✗ absent | ✓ present (7 tensors) |
| `ref_adaln_proj` (global appearance AdaLN) | ✓ (fc1 input dim 256) | ✓ (fc1 input dim 768) |
| `role_embedding` | ✓ shape (1, 128) | ✓ shape (1, 128) |
## 2. LoRA adapters

Each row is one target module type. Each ✓ stands for a (`lora_A.weight`, `lora_B.weight`) pair
replicated across all 48 blocks of `diffusion_model.transformer_blocks.*`.
| Module | `ref_adaln_proj-role_embedding` | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` | Notes |
|---|---|---|---|
| `attn1.to_q` | ✓ | ✓ | self-attention query |
| `attn1.to_k` | ✓ | ✓ | self-attention key |
| `attn1.to_v` | ✓ | ✓ | self-attention value |
| `attn1.to_out.0` | ✓ | ✓ | self-attention output proj |
| `attn2.to_q` | ✓ | ✓ | cross-attention to text (Gemma) |
| `attn2.to_k` | ✓ | ✓ | |
| `attn2.to_v` | ✓ | ✓ | |
| `attn2.to_out.0` | ✓ | ✓ | |
| `ff.net.0.proj` | ✓ | ✓ | feed-forward up-projection |
| `ff.net.2` | ✓ | ✓ | feed-forward down-projection |
| `ref_attn.to_q` | ✗ | ✓ | new: visual reference cross-attention |
| `ref_attn.to_k` | ✗ | ✓ | new |
| `ref_attn.to_v` | ✗ | ✓ | new |
| `ref_attn.to_out.0` | ✗ | ✓ | new |

Key naming: `diffusion_model.transformer_blocks.{0..47}.{module}.{lora_A|lora_B}.weight`
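The key pattern is regular enough to audit with a single regex. A minimal sketch (the helper `group_lora_keys` is hypothetical) that groups keys by module type and keeps only blocks where both halves of the pair are present, so a missing `lora_A` or `lora_B` shows up as a short block set:

```python
import re
from collections import defaultdict

# diffusion_model.transformer_blocks.{block}.{module}.{lora_A|lora_B}.weight
LORA_KEY = re.compile(
    r"^diffusion_model\.transformer_blocks\.(\d+)\.(.+)\.(lora_A|lora_B)\.weight$"
)


def group_lora_keys(keys):
    """Map module type (e.g. 'attn1.to_out.0') -> set of blocks holding both A and B."""
    halves = defaultdict(lambda: defaultdict(set))  # module -> block -> {"lora_A", "lora_B"}
    for key in keys:
        m = LORA_KEY.match(key)
        if m is None:
            continue  # non-LoRA tensors (role_embedding, ref_adaln_proj, ...) are skipped
        block, module, half = int(m.group(1)), m.group(2), m.group(3)
        halves[module][block].add(half)
    return {
        module: {b for b, h in blocks.items() if h == {"lora_A", "lora_B"}}
        for module, blocks in halves.items()
    }
```

For a healthy `ref_adaln_proj-role_embedding` checkpoint this should yield 10 module types, each mapped to `set(range(48))`.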
Training freeze policy for the
`ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build
(per `stage2_ref_visual_adaln_crossattn_from_v01_r128.yaml`):

- `attn1.*` adapters: loaded from the `ref_adaln_proj-role_embedding` build but frozen (`trainable_include_patterns` excludes them).
- `attn2.*`, `ff.*`, `ref_attn.*`: trainable.
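One way an include-list policy like this can be expressed, as a sketch: the regexes below are assumptions that reproduce the described outcome for LoRA adapter names only. The actual syntax of `trainable_include_patterns` in the YAML, and how the non-LoRA extras are selected, are not shown here.

```python
import re

# Hypothetical include patterns: anything matching one of them is trainable,
# everything else (here: the attn1.* adapters) stays frozen.
TRAINABLE_INCLUDE_PATTERNS = [r"\.attn2\.", r"\.ff\.", r"\.ref_attn\."]


def is_trainable(param_name, patterns=TRAINABLE_INCLUDE_PATTERNS):
    """True if any include pattern occurs somewhere in the parameter name."""
    return any(re.search(p, param_name) for p in patterns)


def split_trainable(param_names, patterns=TRAINABLE_INCLUDE_PATTERNS):
    """Partition names into (trainable, frozen), preserving input order."""
    trainable = [n for n in param_names if is_trainable(n, patterns)]
    frozen = [n for n in param_names if not is_trainable(n, patterns)]
    return trainable, frozen
```

In PyTorch the result would typically be applied with `param.requires_grad_(is_trainable(name))` while iterating over `model.named_parameters()`.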
## 3. Non-LoRA modules (the module sidecar)

These tensors live at the top level of the state dict (no `transformer_blocks.*` prefix)
and are consumed by the custom inference path (`LTXVEditAnythingModuleLoader` +
`LTXVEditAnythingLoopingSampler`), not by the standard ComfyUI LoRA loader.
### 3.1. `role_embedding`: appearance role bias

| Key | Shape | Notes |
|---|---|---|
| `role_embedding.embedding.weight` | (1, 128) | 1 slot (appearance). Padded to (3, 128) at inference; the stored entry is placed at slot 1 (`ref_img` role). |

Present in both builds with the same shape. In the
`ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build it is
frozen (`use_visual_ref_role_embedding: false`); wandb shows its norm
staying flat at ~0.125 throughout training.
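A minimal sketch of the padding step, using plain nested lists in place of tensors. The function name is hypothetical, and zero-filling the two unused slots is an assumption of this sketch; the table above only says the stored row lands at slot 1.

```python
def pad_role_embedding(stored, num_slots=3, slot=1):
    """Expand a stored (1, dim) role table into (num_slots, dim).

    The single appearance row goes to `slot` (1 = ref_img role); the other
    slots are zero-filled here, which is an assumption, not loader behavior.
    """
    if len(stored) != 1:
        raise ValueError(f"expected exactly 1 stored row, got {len(stored)}")
    dim = len(stored[0])
    table = [[0.0] * dim for _ in range(num_slots)]
    table[slot] = list(stored[0])
    return table
```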
### 3.2. `ref_adaln_proj`: global AdaLN appearance anchor

Two-layer MLP that maps a pooled summary of the reference latent to a vector added to every block's AdaLN timestep bias.

| Key | `ref_adaln_proj-role_embedding` shape | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` shape |
|---|---|---|
| `ref_adaln_proj.fc1.weight` | (512, 256) | (512, 768) |
| `ref_adaln_proj.fc1.bias` | (512,) | (512,) |
| `ref_adaln_proj.proj.weight` | (36864, 512) | (36864, 512) |
| `ref_adaln_proj.proj.bias` | (36864,) | (36864,) |
⚠️ Shape mismatch on `fc1.weight`. The `ref_adaln_proj-role_embedding` build was trained with a 2-scale pool (`avg_1x1` ⊕ `max_1x1` → 256-dim input). The `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build was trained with a 3-scale pool (`avg_1x1` ⊕ `avg_2x2` ⊕ `max_1x1` → 768-dim input). Because of this incompatibility, the trainer reinitializes `ref_adaln_proj` from scratch when continuing from `ref_adaln_proj-role_embedding`; the AdaLN projector in the continuation is not a fine-tune of the original one. The output dim 36864 is the AdaLN parameter count of the LTX-2 transformer (read at runtime via `preprocessor.adaln.linear.out_features`).
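The dimensions in the warning can be checked by hand: each pooled grid cell contributes one 128-channel vector, and the concatenated width must equal `fc1`'s input dim. The 128 channels are inferred from 256 = 2 × 128 and 768 = 6 × 128, assuming `avg_2x2` yields 4 cells. This is a shape-arithmetic sketch only (hypothetical names `POOL_CELLS`, `pooled_dim`, `can_load_fc1`); it does not perform the pooling itself.

```python
# Grid cells produced by each pooling scale named in the warning above.
POOL_CELLS = {"avg_1x1": 1, "max_1x1": 1, "avg_2x2": 4}


def pooled_dim(scales, channels=128):
    """Width of the concatenated pooled feature: channels x total grid cells."""
    return channels * sum(POOL_CELLS[s] for s in scales)


def can_load_fc1(saved_shape, scales, hidden=512, channels=128):
    """fc1.weight is (hidden, in_features); a load only works if both dims match."""
    return tuple(saved_shape) == (hidden, pooled_dim(scales, channels))
```

Under these assumptions the original `(512, 256)` weight cannot be loaded into a 3-scale model, which is why the continuation starts `ref_adaln_proj` from scratch.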
### 3.3. `ref_visual_proj`: visual cross-attention memory tokens

Present in `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` only.
Implemented by `SafeVisualRefProjector` (training file `video_to_video_ref_visual.py`),
it produces 32 visual memory tokens consumed by the new `ref_attn` branch.
| Key | Shape | Notes |
|---|---|---|
| `ref_visual_proj.fc1.weight` | (1024, 384) | input 384 = 128 (local pooled) + 128 (global mean) + 128 (global std) |
| `ref_visual_proj.fc1.bias` | (1024,) | xavier init gain 0.1 |
| `ref_visual_proj.proj.weight` | (4096, 1024) | maps to context_dim 4096; xavier init gain 0.05 |
| `ref_visual_proj.proj.bias` | (4096,) | |
| `ref_visual_proj.norm.weight` | (4096,) | LayerNorm γ |
| `ref_visual_proj.norm.bias` | (4096,) | LayerNorm β |
| `ref_visual_proj.pos_embed` | (1, 32, 4096) | per-token learned positional bias |
Forward (matches `SafeVisualRefProjector.forward`):

```python
import torch
import torch.nn.functional as F

tokens = torch.cat([local, global_mean, global_std], dim=-1)  # concat features: [B, 32, 384]
tokens = proj(F.silu(fc1(tokens)))                            # -> [B, 32, 4096]
tokens = norm(tokens)                                         # LayerNorm
tokens = tokens + pos_embed[:, :32]
return tokens * token_scale                                   # training default 0.25
```
Not present in `ref_adaln_proj-role_embedding`: this entire branch is new.
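For readers without the training code, here is a dependency-free reference of the same per-token computation under stated assumptions: the global mean/std features are already broadcast to every token, LayerNorm uses PyTorch's default `eps=1e-5`, and weights follow the `(out_features, in_features)` layout of the keys listed above. The function names are hypothetical, and the pure-Python loops are meant for checking small test vectors, not full-size 4096-wide tensors.

```python
import math


def linear(x, weight, bias):
    """y = W x + b with weight stored as (out_features, in_features)."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weight, bias)]


def silu(v):
    """x * sigmoid(x), written so math.exp never overflows."""
    out = []
    for t in v:
        if t >= 0:
            out.append(t / (1.0 + math.exp(-t)))
        else:
            e = math.exp(t)
            out.append(t * e / (1.0 + e))
    return out


def layer_norm(v, gamma, beta, eps=1e-5):
    """Normalize over the feature dim (biased variance), then scale and shift."""
    mean = sum(v) / len(v)
    var = sum((t - mean) ** 2 for t in v) / len(v)
    inv = 1.0 / math.sqrt(var + eps)
    return [(t - mean) * inv * g + b for t, g, b in zip(v, gamma, beta)]


def project_tokens(local, global_mean, global_std, p, token_scale=0.25):
    """concat -> fc1 -> SiLU -> proj -> LayerNorm -> + pos_embed -> scale, per token.

    local / global_mean / global_std: lists of N tokens (N <= 32), each a list of floats.
    p: weights keyed like the ref_visual_proj.* tensors, without the prefix.
    """
    out = []
    for i, (a, m, s) in enumerate(zip(local, global_mean, global_std)):
        x = a + m + s  # list concatenation = feature-dim concat
        h = silu(linear(x, p["fc1.weight"], p["fc1.bias"]))
        y = linear(h, p["proj.weight"], p["proj.bias"])
        y = layer_norm(y, p["norm.weight"], p["norm.bias"])
        y = [t + e for t, e in zip(y, p["pos_embed"][0][i])]
        out.append([t * token_scale for t in y])
    return out
```

With the real weights the widths are 384 → 1024 → 4096; the small-width test vectors only exercise the same code path.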
## 4. Total tensor counts (sanity check)

```text
ref_adaln_proj-role_embedding
  LoRA:            10 modules × 48 blocks × 2 (A, B) =  960
  ref_adaln_proj:  4 (fc1.{w,b}, proj.{w,b})         =    4
  role_embedding:  1                                  =    1
  total                                               =  965 ✓

ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj
  LoRA:            14 modules × 48 blocks × 2 (A, B) = 1344
  ref_adaln_proj:  4                                  =    4
  ref_visual_proj: 7                                  =    7
  role_embedding:  1                                  =    1
  total                                               = 1356 ✓
```
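The same bookkeeping can be run against a list of tensor names (for example, the keys of a safetensors header). The `EXPECTED` table transcribes the counts above; grouping extras by their top-level prefix and the function names are choices made for this sketch.

```python
from collections import Counter

LORA_SUFFIXES = (".lora_A.weight", ".lora_B.weight")

# build suffix -> (LoRA tensor count, extras per top-level prefix)
EXPECTED = {
    "ref_adaln_proj-role_embedding": (
        10 * 48 * 2, {"ref_adaln_proj": 4, "role_embedding": 1}),
    "ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj": (
        14 * 48 * 2, {"ref_adaln_proj": 4, "ref_visual_proj": 7, "role_embedding": 1}),
}


def count_tensors(names):
    """Return (LoRA tensor count, {top-level prefix: count} for extras, total)."""
    lora = sum(1 for n in names if n.endswith(LORA_SUFFIXES))
    extras = Counter(n.split(".", 1)[0] for n in names if not n.endswith(LORA_SUFFIXES))
    return lora, dict(extras), len(names)


def matches_expected(names, build):
    """True if a checkpoint's names reproduce the inventory for `build`."""
    lora, extras, total = count_tensors(names)
    want_lora, want_extras = EXPECTED[build]
    return (lora, extras) == (want_lora, want_extras) and total == want_lora + sum(want_extras.values())
```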
## 5. Loading checkpoints at inference

Use `scripts/split_editanything_lora.py` to split each raw training
checkpoint into:

- `*.standard.safetensors`: LoRA on `attn1`/`attn2`/`ff` only; safe to feed to ComfyUI's standard `LoraLoader`.
- `*.module.safetensors`: everything else (`role_embedding`, `ref_adaln_proj`, `ref_visual_proj`, `ref_attn` LoRA adapters); feed to `LTXVEditAnythingModuleLoader`.
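The routing rule can be sketched as a key filter. This is an illustration, not the code of `scripts/split_editanything_lora.py`: it assumes a tensor belongs in the standard file exactly when it is a LoRA weight whose module path inside a transformer block starts with `attn1`, `attn2`, or `ff`, and it sends everything else to the sidecar. Function names are hypothetical.

```python
import re

# LoRA tensors that a standard LoRA loader can apply (attn1 / attn2 / ff only).
STANDARD_LORA = re.compile(
    r"^diffusion_model\.transformer_blocks\.\d+\.(attn1|attn2|ff)\..+\.lora_[AB]\.weight$"
)


def route_key(name):
    """'standard' for attn1/attn2/ff LoRA tensors, 'module' for everything else."""
    return "standard" if STANDARD_LORA.match(name) else "module"


def split_state_dict(state_dict):
    """Partition a {name: tensor} dict into (standard, module) dicts."""
    standard, module = {}, {}
    for name, tensor in state_dict.items():
        target = standard if route_key(name) == "standard" else module
        target[name] = tensor
    return standard, module
```

Each resulting dict could then be written with `safetensors.torch.save_file(tensors, filename)`. Note that `ref_attn.*` adapters fall through to the sidecar because their module path does not start with one of the three standard prefixes.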
The filename suffix lists every extra that ended up in the module sidecar,
so it is obvious at a glance which mechanisms a given pair carries. The order is
fixed: `ref_adaln_proj`, `role_embedding`, `ref_attn`, `ref_visual_proj`.
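The naming rule can be sketched in the same spirit: collect the extras that appear in the sidecar keys, emit them in the fixed order, and join them with `-`. The helper names are hypothetical, and the fallback to the bare basename when no extra is present is an assumption that the doc does not cover.

```python
EXTRA_ORDER = ("ref_adaln_proj", "role_embedding", "ref_attn", "ref_visual_proj")
BASENAME = "edit_anything_reference_v0.1_r128"


def extras_in(module_keys):
    """Extras present in the sidecar keys, returned in the fixed suffix order."""
    found = set()
    for key in module_keys:
        parts = key.split(".")  # e.g. [..., "ref_attn", "to_q", "lora_A", "weight"]
        found.update(extra for extra in EXTRA_ORDER if extra in parts)
    return [extra for extra in EXTRA_ORDER if extra in found]


def output_names(module_keys, basename=BASENAME):
    """(standard, module) filenames for one split checkpoint."""
    suffix = "-".join(extras_in(module_keys))
    stem = f"{basename}_{suffix}" if suffix else basename
    return f"{stem}.standard.safetensors", f"{stem}.module.safetensors"
```

Because the order comes from `EXTRA_ORDER` rather than from key order, the same checkpoint always yields the same names.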
### Canonical output names

```text
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.standard.safetensors
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.module.safetensors
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.standard.safetensors
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.module.safetensors
```
### Command

```bash
python3 /data/training/ltx-edit-trainer/scripts/split_editanything_lora.py \
    <raw-checkpoint>.safetensors --output-dir <dir> [--overwrite]
```