How to use Alissonerdx/EditAnything with Diffusers:

```bash
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("Lightricks/LTX-2.3", dtype=torch.bfloat16, device_map="cuda")
pipe.load_lora_weights("Alissonerdx/EditAnything")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
| # LoRA Layer Inventory: Edit Anything checkpoints | |
| Inventory of every tensor in two builds of the | |
| `edit_anything_reference_v0.1_r128` LoRA. | |
| Both builds share the same canonical basename | |
| (`edit_anything_reference_v0.1_r128`) and are distinguished by the **extras | |
| suffix** that `scripts/split_editanything_lora.py` appends to the output | |
| filenames: | |
| - `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.{standard,module}.safetensors`: | |
| the original build. Only ships `ref_adaln_proj` + `role_embedding`. | |
| - `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.{standard,module}.safetensors`: | |
| the continuation, fine-tuned with the | |
| `video_to_video_ref_visual_adaln` strategy. Adds the `ref_attn` LoRA | |
| branch and the `ref_visual_proj` projector on top of the original | |
| extras. | |
| In the rest of this doc the two are referred to by their suffix only: | |
| - `ref_adaln_proj-role_embedding` | |
| - `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` | |
| Rank is 128 in both (encoded in the LoRA tensor shapes; no `alpha` keys saved). | |
| Dtype is `bfloat16` throughout. All LoRA modules cover **48 transformer blocks**. | |
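Since no `alpha` keys are saved, the rank has to be read off the tensor shapes. The sketch below shows one way to do that from a mapping of key to shape. It assumes the common LoRA layout where `lora_A.weight` is `(rank, in_features)` and `lora_B.weight` is `(out_features, rank)`. The helper name `infer_lora_rank` and the 4096 feature size in the example are hypothetical, chosen only for illustration.

```python
def infer_lora_rank(shapes):
    """Infer the LoRA rank from a mapping of tensor key -> shape tuple."""
    ranks = set()
    for key, shape in shapes.items():
        if key.endswith(".lora_A.weight"):
            ranks.add(shape[0])  # assumed layout: (rank, in_features)
        elif key.endswith(".lora_B.weight"):
            ranks.add(shape[1])  # assumed layout: (out_features, rank)
    if len(ranks) != 1:
        raise ValueError(f"expected one consistent rank, found {sorted(ranks)}")
    return ranks.pop()


# Hypothetical shapes for illustration; the 4096 feature size is an assumption.
example_shapes = {
    "diffusion_model.transformer_blocks.0.attn1.to_q.lora_A.weight": (128, 4096),
    "diffusion_model.transformer_blocks.0.attn1.to_q.lora_B.weight": (4096, 128),
}
rank = infer_lora_rank(example_shapes)
print(rank)
```

With a real checkpoint, the shape mapping would come from the file itself rather than a hand-written dict.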
| --- | |
| ## 1. Summary | |
| | | `ref_adaln_proj-role_embedding` | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` | | |
| |---|---|---| | |
| | Total tensors | 965 | 1356 | | |
| | LoRA-target modules | **10** | **14** | | |
| | LoRA tensors (A+B) | 960 | 1344 | | |
| | Extra (non-LoRA) tensors | 5 | 12 | | |
| | `ref_attn` LoRA branch | ✗ absent | ✓ trained on 48 blocks | |
| | `ref_visual_proj` (visual cross-attn projector) | ✗ absent | ✓ present (7 tensors) | |
| | `ref_adaln_proj` (global appearance AdaLN) | ✓ (fc1 input dim **256**) | ✓ (fc1 input dim **768**) | |
| | `role_embedding` | ✓ shape (1, 128) | ✓ shape (1, 128) | |
| --- | |
| ## 2. LoRA adapters | |
| Each row is one target module type. Each entry is a (`lora_A.weight`, `lora_B.weight`) | |
| pair repeated in each of the 48 blocks of `diffusion_model.transformer_blocks.*`. | |
| | Module | `ref_adaln_proj-role_embedding` | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` | Notes | | |
| |---|:---:|:---:|---| | |
| | `attn1.to_q` | ✓ | ✓ | self-attention query | |
| | `attn1.to_k` | ✓ | ✓ | self-attention key | |
| | `attn1.to_v` | ✓ | ✓ | self-attention value | |
| | `attn1.to_out.0` | ✓ | ✓ | self-attention output proj | |
| | `attn2.to_q` | ✓ | ✓ | cross-attention to text (Gemma) | |
| | `attn2.to_k` | ✓ | ✓ | | |
| | `attn2.to_v` | ✓ | ✓ | | |
| | `attn2.to_out.0` | ✓ | ✓ | | |
| | `ff.net.0.proj` | ✓ | ✓ | feed-forward up-projection | |
| | `ff.net.2` | ✓ | ✓ | feed-forward down-projection | |
| | `ref_attn.to_q` | ✗ | ✓ | **new**: visual reference cross-attention | |
| | `ref_attn.to_k` | ✗ | ✓ | **new** | |
| | `ref_attn.to_v` | ✗ | ✓ | **new** | |
| | `ref_attn.to_out.0` | ✗ | ✓ | **new** | |
| **Key naming**: `diffusion_model.transformer_blocks.{0..47}.{module}.{lora_A|lora_B}.weight` | |
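The key pattern above can be checked mechanically. The sketch below parses a key into its block index, module path, and LoRA half. The regular expression and the helper name `parse_lora_key` are illustrative assumptions derived from the pattern as written, not code from the training repository.

```python
import re

# Mirrors the documented pattern:
# diffusion_model.transformer_blocks.{0..47}.{module}.{lora_A|lora_B}.weight
KEY_RE = re.compile(
    r"^diffusion_model\.transformer_blocks\.(?P<block>\d+)\."
    r"(?P<module>.+)\.(?P<part>lora_A|lora_B)\.weight$"
)


def parse_lora_key(key):
    """Return (block_index, module, part) for a LoRA key, or None otherwise."""
    match = KEY_RE.match(key)
    if match is None:
        return None
    return int(match.group("block")), match.group("module"), match.group("part")


parsed = parse_lora_key(
    "diffusion_model.transformer_blocks.47.attn1.to_out.0.lora_B.weight"
)
not_lora = parse_lora_key("role_embedding.embedding.weight")
print(parsed, not_lora)
```

Keys that return `None` are the non-LoRA tensors covered in section 3.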
| **Training freeze policy** for the | |
| `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build | |
| (per `stage2_ref_visual_adaln_crossattn_from_v01_r128.yaml`): | |
| - `attn1.*` adapters loaded from the `ref_adaln_proj-role_embedding` build | |
| but **frozen** (`trainable_include_patterns` excludes them). | |
| - `attn2.*`, `ff.*`, `ref_attn.*` are trainable. | |
| --- | |
| ## 3. Non-LoRA modules (the module sidecar) | |
| These tensors live at the top of the state dict (no `transformer_blocks.*` prefix) | |
| and are consumed by the custom inference path (`LTXVEditAnythingModuleLoader` + | |
| `LTXVEditAnythingLoopingSampler`), not by the standard ComfyUI LoRA loader. | |
| ### 3.1. `role_embedding`: appearance role bias | |
| | Key | Shape | Notes | | |
| |---|---|---| | |
| | `role_embedding.embedding.weight` | (1, 128) | 1 slot (appearance). Padded to (3, 128) at inference; entry stored at slot 1 (ref_img role). | | |
| Present in **both** builds with the same shape. In the | |
| `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build it is | |
| **frozen** (`use_visual_ref_role_embedding: false`); wandb shows its norm | |
| stays flat at ~0.125 throughout training. | |
| ### 3.2. `ref_adaln_proj`: global AdaLN appearance anchor | |
| Two-layer MLP that pools the reference latent into a vector added to every | |
| block's AdaLN timestep bias. | |
| | Key | `ref_adaln_proj-role_embedding` shape | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` shape | | |
| |---|---|---| | |
| | `ref_adaln_proj.fc1.weight` | (512, **256**) | (512, **768**) | | |
| | `ref_adaln_proj.fc1.bias` | (512,) | (512,) | | |
| | `ref_adaln_proj.proj.weight` | (36864, 512) | (36864, 512) | | |
| | `ref_adaln_proj.proj.bias` | (36864,) | (36864,) | | |
| > ⚠️ **Shape mismatch on `fc1.weight`**. | |
| > The `ref_adaln_proj-role_embedding` build was trained with a 2-scale pool | |
| > (`avg_1x1 ⊕ max_1x1` → 256-dim input). | |
| > The `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build was | |
| > trained with a 3-scale pool (`avg_1x1 ⊕ avg_2x2 ⊕ max_1x1` → 768-dim). | |
| > Because of this incompatibility the trainer **reinitializes** | |
| > `ref_adaln_proj` from scratch when continuing from | |
| > `ref_adaln_proj-role_embedding`; the AdaLN projector in the continuation | |
| > is **not** a fine-tune of the original one. The output dim 36864 = AdaLN | |
| > param count for the LTX-2 transformer (read at runtime via | |
| > `preprocessor.adaln.linear.out_features`). | |
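The 256 and 768 input widths follow from simple arithmetic if each pooled grid is flattened and the results are concatenated. The sketch below works through that calculation. It assumes 128 channels in the reference latent, which is inferred from the two figures rather than stated in the note, and `pooled_dim` is a hypothetical helper.

```python
def pooled_dim(channels, scales):
    """Feature size after concatenating flattened pooled grids."""
    return channels * sum(h * w for h, w in scales)


CHANNELS = 128  # assumption: inferred from the 256 and 768 input widths

# 2-scale pool: avg_1x1 and max_1x1 contribute one cell each.
two_scale = pooled_dim(CHANNELS, [(1, 1), (1, 1)])

# 3-scale pool: avg_1x1, avg_2x2 (four cells), max_1x1.
three_scale = pooled_dim(CHANNELS, [(1, 1), (2, 2), (1, 1)])

print(two_scale, three_scale)
```

Under these assumptions the extra `avg_2x2` scale alone accounts for the 512 additional input features, which is why the two `fc1.weight` tensors cannot be loaded into each other.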
| ### 3.3. `ref_visual_proj`: visual cross-attention memory tokens | |
| Present in `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` only. | |
| `SafeVisualRefProjector` (training file `video_to_video_ref_visual.py`). | |
| Produces 32 visual memory tokens consumed by the new `ref_attn` branch. | |
| | Key | Shape | Notes | | |
| |---|---|---| | |
| | `ref_visual_proj.fc1.weight` | (1024, **384**) | input 384 = 128 (local pooled) + 128 (global mean) + 128 (global std) | | |
| | `ref_visual_proj.fc1.bias` | (1024,) | xavier init gain 0.1 | | |
| | `ref_visual_proj.proj.weight` | (4096, 1024) | maps to context_dim 4096; xavier init gain 0.05 | | |
| | `ref_visual_proj.proj.bias` | (4096,) | | | |
| | `ref_visual_proj.norm.weight` | (4096,) | LayerNorm γ | |
| | `ref_visual_proj.norm.bias` | (4096,) | LayerNorm β | |
| | `ref_visual_proj.pos_embed` | (1, 32, 4096) | per-token learned positional bias | | |
| Forward (matches `SafeVisualRefProjector.forward`): | |
| ``` | |
| tokens = local ⊕ global_mean ⊕ global_std # [B, 32, 384] | |
| tokens = proj(silu(fc1(tokens))) # → [B, 32, 4096] | |
| tokens = LayerNorm(tokens) | |
| tokens = tokens + pos_embed[:, :32] | |
| return tokens * token_scale # training default 0.25 | |
| ``` | |
| Not present in `ref_adaln_proj-role_embedding`; this entire branch is new. | |
| --- | |
| ## 4. Total tensor counts (sanity check) | |
| ### `ref_adaln_proj-role_embedding` | |
| ``` | |
| LoRA: 10 modules × 48 blocks × 2 (A,B) = 960 | |
| ref_adaln_proj: 4 (fc1.{w,b}, proj.{w,b}) = 4 | |
| role_embedding: 1 = 1 | |
| total= 965 ✓ | |
| ``` | |
| ### `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` | |
| ``` | |
| LoRA: 14 modules × 48 blocks × 2 (A,B) = 1344 | |
| ref_adaln_proj: 4 = 4 | |
| ref_visual_proj: 7 = 7 | |
| role_embedding: 1 = 1 | |
| total= 1356 ✓ | |
| ``` | |
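The two tallies above can be reproduced in a few lines, which is handy when a new extra is added and the expected total changes. This is a minimal sketch; `total_tensors` is a hypothetical helper and the per-module tensor counts are copied from the tables in this document.

```python
NUM_BLOCKS = 48  # transformer blocks covered by every LoRA module


def total_tensors(lora_modules, extras):
    """LoRA tensors (A and B per module per block) plus non-LoRA extras."""
    lora = lora_modules * NUM_BLOCKS * 2
    return lora + sum(extras.values())


original = total_tensors(10, {"ref_adaln_proj": 4, "role_embedding": 1})
continuation = total_tensors(
    14, {"ref_adaln_proj": 4, "ref_visual_proj": 7, "role_embedding": 1}
)
print(original, continuation)
```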
| --- | |
| ## 5. Loading checkpoints at inference | |
| Use `scripts/split_editanything_lora.py` to split each raw training | |
| checkpoint into: | |
| - `*.standard.safetensors`: LoRA on `attn1/attn2/ff` only; safe to feed to | |
| ComfyUI's standard LoraLoader. | |
| - `*.module.safetensors`: everything else (`role_embedding`, | |
| `ref_adaln_proj`, `ref_visual_proj`, `ref_attn` LoRA adapters); feed to | |
| `LTXVEditAnythingModuleLoader`. | |
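The split rule described in the two bullets can be sketched as a key classifier. This is an illustration of the rule as stated here, not the actual logic of `scripts/split_editanything_lora.py`; the function names `is_standard_key` and `split_keys` are hypothetical.

```python
BLOCK_PREFIX = "diffusion_model.transformer_blocks."
STANDARD_MODULE_PREFIXES = ("attn1.", "attn2.", "ff.")


def is_standard_key(key):
    """True if the key is a per-block tensor on attn1, attn2, or ff."""
    if not key.startswith(BLOCK_PREFIX):
        return False  # top-level extras such as role_embedding
    rest = key[len(BLOCK_PREFIX):]
    _, _, module_path = rest.partition(".")  # drop the block index
    return module_path.startswith(STANDARD_MODULE_PREFIXES)


def split_keys(keys):
    """Partition keys into (standard, module) lists, preserving order."""
    standard = [k for k in keys if is_standard_key(k)]
    module = [k for k in keys if not is_standard_key(k)]
    return standard, module


keys = [
    "diffusion_model.transformer_blocks.0.attn1.to_q.lora_A.weight",
    "diffusion_model.transformer_blocks.0.ff.net.2.lora_B.weight",
    "diffusion_model.transformer_blocks.0.ref_attn.to_q.lora_A.weight",
    "role_embedding.embedding.weight",
    "ref_adaln_proj.fc1.weight",
]
standard, module = split_keys(keys)
print(len(standard), len(module))
```

Note that `ref_attn` adapters sit under `transformer_blocks.*` like the other LoRA tensors but still go to the module sidecar, so the classifier has to look at the module name and not only at the prefix.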
| The filename suffix lists every extra that ended up in the module sidecar, | |
| so it is obvious at a glance which mechanisms a given pair carries. Order is | |
| fixed: `ref_adaln_proj`, `role_embedding`, `ref_attn`, `ref_visual_proj`. | |
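The naming rule can be expressed as a small function that orders the extras and joins them with hyphens. This is a sketch of the convention described above, not the script's own code: `output_names` is a hypothetical helper, and joining the basename and suffix with an underscore is read off the canonical filenames. The case of a checkpoint with no extras is not covered by this document and is left unhandled.

```python
EXTRAS_ORDER = ("ref_adaln_proj", "role_embedding", "ref_attn", "ref_visual_proj")


def output_names(basename, extras):
    """Build the (standard, module) filenames for a set of extras."""
    unknown = set(extras) - set(EXTRAS_ORDER)
    if unknown:
        raise ValueError(f"unknown extras: {sorted(unknown)}")
    # Emit extras in the fixed order regardless of how they were passed in.
    suffix = "-".join(name for name in EXTRAS_ORDER if name in extras)
    stem = f"{basename}_{suffix}"
    return f"{stem}.standard.safetensors", f"{stem}.module.safetensors"


standard_name, module_name = output_names(
    "edit_anything_reference_v0.1_r128",
    {"role_embedding", "ref_adaln_proj"},
)
print(standard_name)
print(module_name)
```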
| ### Canonical output names | |
| ``` | |
| edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.standard.safetensors | |
| edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.module.safetensors | |
| edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.standard.safetensors | |
| edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.module.safetensors | |
| ``` | |
| ### Command | |
| ```bash | |
| python3 /data/training/ltx-edit-trainer/scripts/split_editanything_lora.py \ | |
| <raw-checkpoint>.safetensors --output-dir <dir> [--overwrite] | |
| ``` | |