# LoRA Layer Inventory — Edit Anything checkpoints

Inventory of every tensor in two builds of the `edit_anything_reference_v0.1_r128` LoRA. Both builds share the same canonical basename (`edit_anything_reference_v0.1_r128`) and are distinguished by the **extras suffix** that `scripts/split_editanything_lora.py` appends to the output filenames:

- `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.{standard,module}.safetensors` — the original build. Only ships `ref_adaln_proj` + `role_embedding`.
- `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.{standard,module}.safetensors` — the continuation, fine-tuned with the `video_to_video_ref_visual_adaln` strategy. Adds the `ref_attn` LoRA branch and the `ref_visual_proj` projector on top of the original extras.

In the rest of this doc the two are referred to by their suffix only:

- `ref_adaln_proj-role_embedding`
- `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj`

Rank is 128 in both (encoded in the LoRA tensor shapes; no `alpha` keys saved). Dtype is `bfloat16` throughout. All LoRA modules cover **48 transformer blocks**.

---

## 1. Summary

| | `ref_adaln_proj-role_embedding` | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` |
|---|---|---|
| Total tensors | 965 | 1356 |
| LoRA-target modules | **10** | **14** |
| LoRA tensors (A+B) | 960 | 1344 |
| Extra (non-LoRA) tensors | 5 | 12 |
| `ref_attn` LoRA branch | ❌ absent | ✅ trained on 48 blocks |
| `ref_visual_proj` (visual cross-attn projector) | ❌ absent | ✅ present (7 tensors) |
| `ref_adaln_proj` (global appearance AdaLN) | ✅ (fc1 input dim **256**) | ✅ (fc1 input dim **768**) |
| `role_embedding` | ✅ shape (1, 128) | ✅ shape (1, 128) |

---

## 2. LoRA adapters

Each row = one target module type. Each entry = (`lora_A.weight`, `lora_B.weight`) duplicated across the 48 blocks of `diffusion_model.transformer_blocks.*`.
| Module | `ref_adaln_proj-role_embedding` | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` | Notes |
|---|:---:|:---:|---|
| `attn1.to_q` | ✅ | ✅ | self-attention query |
| `attn1.to_k` | ✅ | ✅ | self-attention key |
| `attn1.to_v` | ✅ | ✅ | self-attention value |
| `attn1.to_out.0` | ✅ | ✅ | self-attention output proj |
| `attn2.to_q` | ✅ | ✅ | cross-attention to text (Gemma) |
| `attn2.to_k` | ✅ | ✅ | |
| `attn2.to_v` | ✅ | ✅ | |
| `attn2.to_out.0` | ✅ | ✅ | |
| `ff.net.0.proj` | ✅ | ✅ | feed-forward up-projection |
| `ff.net.2` | ✅ | ✅ | feed-forward down-projection |
| `ref_attn.to_q` | — | ✅ | **new** — visual reference cross-attention |
| `ref_attn.to_k` | — | ✅ | **new** |
| `ref_attn.to_v` | — | ✅ | **new** |
| `ref_attn.to_out.0` | — | ✅ | **new** |

**Key naming**: `diffusion_model.transformer_blocks.{0..47}.{module}.{lora_A|lora_B}.weight`

**Training freeze policy** for the `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build (per `stage2_ref_visual_adaln_crossattn_from_v01_r128.yaml`):

- `attn1.*` adapters are loaded from the `ref_adaln_proj-role_embedding` build but **frozen** (`trainable_include_patterns` excludes them).
- `attn2.*`, `ff.*`, and `ref_attn.*` are trainable.

---

## 3. Non-LoRA modules (the module sidecar)

These tensors live at the top level of the state dict (no `transformer_blocks.*` prefix) and are consumed by the custom inference path (`LTXVEditAnythingModuleLoader` + `LTXVEditAnythingLoopingSampler`), not by the standard ComfyUI LoRA loader.

### 3.1. `role_embedding` — appearance role bias

| Key | Shape | Notes |
|---|---|---|
| `role_embedding.embedding.weight` | (1, 128) | 1 slot (appearance). Padded to (3, 128) at inference; the entry is stored at slot 1 (ref_img role). |

Present in **both** builds with the same shape. In the `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build it is **frozen** (`use_visual_ref_role_embedding: false`); wandb shows its norm staying flat at ~0.125 throughout training.

### 3.2. `ref_adaln_proj` — global AdaLN appearance anchor

Pools the reference latent and maps the pooled features through a two-layer MLP (`fc1` → `proj`) to a vector that is added to every block's AdaLN timestep bias.

| Key | `ref_adaln_proj-role_embedding` shape | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` shape |
|---|---|---|
| `ref_adaln_proj.fc1.weight` | (512, **256**) | (512, **768**) |
| `ref_adaln_proj.fc1.bias` | (512,) | (512,) |
| `ref_adaln_proj.proj.weight` | (36864, 512) | (36864, 512) |
| `ref_adaln_proj.proj.bias` | (36864,) | (36864,) |

> ⚠️ **Shape mismatch on `fc1.weight`**.
> The `ref_adaln_proj-role_embedding` build was trained with a 2-scale pool
> (`avg_1x1 ‖ max_1x1` → 256-dim input).
> The `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build was
> trained with a 3-scale pool (`avg_1x1 ‖ avg_2x2 ‖ max_1x1` → 768-dim).
> Because of this incompatibility the trainer **reinitializes**
> `ref_adaln_proj` from scratch when continuing from
> `ref_adaln_proj-role_embedding`; the AdaLN projector in the continuation
> is **not** a fine-tune of the original one. The output dim 36864 matches
> the AdaLN parameter count of the LTX-2 transformer (read at runtime via
> `preprocessor.adaln.linear.out_features`).

### 3.3. `ref_visual_proj` — visual cross-attention memory tokens

Present in `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` only. Implemented by `SafeVisualRefProjector` (training file `video_to_video_ref_visual.py`), it produces 32 visual memory tokens consumed by the new `ref_attn` branch.
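To make the tensor shapes concrete, here is a minimal NumPy sketch of the projector's forward pass, built from the key shapes and the forward pseudocode documented for this projector. It is an illustration, not `SafeVisualRefProjector` itself. Several details are assumptions: the LayerNorm epsilon (1e-5, the PyTorch default), float32 math instead of bfloat16, and broadcasting the global mean/std over the 32 tokens. `random_state_dict` is a hypothetical helper that fills the keys with arbitrary random weights, not the xavier init used in training.

```python
import numpy as np

NUM_TOKENS = 32
IN_DIM = 3 * 128      # concat(local pooled, global mean, global std)
HIDDEN_DIM = 1024     # ref_visual_proj.fc1 output
CONTEXT_DIM = 4096    # ref_visual_proj.proj output (context_dim)


def silu(x):
    return x / (1.0 + np.exp(-x))


def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize over the last (feature) axis; eps=1e-5 is an assumption.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta


def ref_visual_proj_forward(sd, local, global_mean, global_std, token_scale=0.25):
    """local: [B, 32, 128]; global_mean / global_std: [B, 1, 128] or [B, 32, 128]."""
    # Assumption: the global statistics are broadcast to every token.
    global_mean = np.broadcast_to(global_mean, local.shape)
    global_std = np.broadcast_to(global_std, local.shape)
    tokens = np.concatenate([local, global_mean, global_std], axis=-1)  # [B, 32, 384]
    # nn.Linear convention: y = x @ W.T + b, with W stored as (out, in).
    hidden = silu(tokens @ sd["ref_visual_proj.fc1.weight"].T + sd["ref_visual_proj.fc1.bias"])
    tokens = hidden @ sd["ref_visual_proj.proj.weight"].T + sd["ref_visual_proj.proj.bias"]
    tokens = layer_norm(tokens, sd["ref_visual_proj.norm.weight"], sd["ref_visual_proj.norm.bias"])
    tokens = tokens + sd["ref_visual_proj.pos_embed"][:, : tokens.shape[1]]  # [B, 32, 4096]
    return tokens * token_scale


def random_state_dict(seed=0, std=0.02):
    """Hypothetical helper: random float32 tensors with the documented shapes."""
    rng = np.random.default_rng(seed)
    scale = np.float32(std)
    return {
        "ref_visual_proj.fc1.weight": rng.standard_normal((HIDDEN_DIM, IN_DIM), dtype=np.float32) * scale,
        "ref_visual_proj.fc1.bias": np.zeros(HIDDEN_DIM, dtype=np.float32),
        "ref_visual_proj.proj.weight": rng.standard_normal((CONTEXT_DIM, HIDDEN_DIM), dtype=np.float32) * scale,
        "ref_visual_proj.proj.bias": np.zeros(CONTEXT_DIM, dtype=np.float32),
        "ref_visual_proj.norm.weight": np.ones(CONTEXT_DIM, dtype=np.float32),
        "ref_visual_proj.norm.bias": np.zeros(CONTEXT_DIM, dtype=np.float32),
        "ref_visual_proj.pos_embed": np.zeros((1, NUM_TOKENS, CONTEXT_DIM), dtype=np.float32),
    }
```

Because the LayerNorm runs before `pos_embed` is added and `token_scale` is applied last, each output token has roughly zero mean and a standard deviation close to `token_scale` when `pos_embed` is zero and γ/β are at their defaults. This is a quick way to confirm the ordering of the last three steps.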
| Key | Shape | Notes |
|---|---|---|
| `ref_visual_proj.fc1.weight` | (1024, **384**) | input 384 = 128 (local pooled) + 128 (global mean) + 128 (global std) |
| `ref_visual_proj.fc1.bias` | (1024,) | xavier init gain 0.1 |
| `ref_visual_proj.proj.weight` | (4096, 1024) | maps to context_dim 4096; xavier init gain 0.05 |
| `ref_visual_proj.proj.bias` | (4096,) | |
| `ref_visual_proj.norm.weight` | (4096,) | LayerNorm γ |
| `ref_visual_proj.norm.bias` | (4096,) | LayerNorm β |
| `ref_visual_proj.pos_embed` | (1, 32, 4096) | per-token learned positional bias |

Forward (matches `SafeVisualRefProjector.forward`):

```
tokens = local ‖ global_mean ‖ global_std      # [B, 32, 384]
tokens = proj(silu(fc1(tokens)))               # → [B, 32, 4096]
tokens = LayerNorm(tokens)
tokens = tokens + pos_embed[:, :32]
return tokens * token_scale                    # training default 0.25
```

Not present in `ref_adaln_proj-role_embedding`; this entire branch is new.

---

## 4. Total tensor counts (sanity check)

### `ref_adaln_proj-role_embedding`

```
LoRA:            10 modules × 48 blocks × 2 (A,B)  =  960
ref_adaln_proj:   4 (fc1.{w,b}, proj.{w,b})         =    4
role_embedding:   1                                 =    1
                                             total  =  965 ✓
```

### `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj`

```
LoRA:            14 modules × 48 blocks × 2 (A,B)  = 1344
ref_adaln_proj:   4                                 =    4
ref_visual_proj:  7                                 =    7
role_embedding:   1                                 =    1
                                             total  = 1356 ✓
```

---

## 5. Loading checkpoints at inference

Use `scripts/split_editanything_lora.py` to split each raw training checkpoint into:

- `*.standard.safetensors` — LoRA on `attn1/attn2/ff` only; safe to feed to ComfyUI's standard LoraLoader.
- `*.module.safetensors` — everything else (`role_embedding`, `ref_adaln_proj`, `ref_visual_proj`, `ref_attn` LoRA adapters); feed to `LTXVEditAnythingModuleLoader`.

The filename suffix lists every extra that ended up in the module sidecar, so it is obvious at a glance which mechanisms a given pair carries. Order is fixed: `ref_adaln_proj`, `role_embedding`, `ref_attn`, `ref_visual_proj`.
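The split rule and the suffix convention can be sketched in a few lines of Python. This is a hypothetical reimplementation for illustration, not the code in `scripts/split_editanything_lora.py`: the names `classify`, `split_keys`, and `output_names` are invented here, and the key patterns are inferred from the key naming in section 2 and the sidecar contents listed in section 3.

```python
import re

# Fixed order in which extras appear in the filename suffix.
EXTRA_ORDER = ("ref_adaln_proj", "role_embedding", "ref_attn", "ref_visual_proj")
# Extras stored at the top level of the state dict (no transformer_blocks prefix).
TOP_LEVEL_EXTRAS = ("ref_adaln_proj", "role_embedding", "ref_visual_proj")
# LoRA modules that the standard ComfyUI LoraLoader can consume.
STANDARD_PREFIXES = ("attn1.", "attn2.", "ff.")

# diffusion_model.transformer_blocks.{i}.{module}.lora_{A|B}.weight
LORA_KEY = re.compile(
    r"^diffusion_model\.transformer_blocks\.\d+\.(?P<module>.+)\.lora_[AB]\.weight$"
)


def classify(key):
    """Return (target_file, extra_name_or_None) for one state-dict key."""
    m = LORA_KEY.match(key)
    if m:
        module = m.group("module")
        if module.startswith(STANDARD_PREFIXES):
            return "standard", None
        if module.startswith("ref_attn."):
            return "module", "ref_attn"
    for extra in TOP_LEVEL_EXTRAS:
        if key.startswith(extra + "."):
            return "module", extra
    raise ValueError(f"unrecognized key: {key}")


def split_keys(keys):
    """Partition keys into (standard, module) lists and collect the extras seen."""
    standard, module, extras = [], [], set()
    for key in keys:
        target, extra = classify(key)
        (standard if target == "standard" else module).append(key)
        if extra is not None:
            extras.add(extra)
    return standard, module, extras


def output_names(basename, extras):
    """Build the (.standard, .module) filenames with the ordered extras suffix."""
    suffix = "-".join(name for name in EXTRA_ORDER if name in extras)
    stem = f"{basename}_{suffix}" if suffix else basename
    return f"{stem}.standard.safetensors", f"{stem}.module.safetensors"
```

Raising on unrecognized keys, rather than routing them to the sidecar silently, makes a new module type show up as an error instead of an unlabeled tensor in the `.module` file.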
### Canonical output names

```
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.standard.safetensors
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.module.safetensors
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.standard.safetensors
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.module.safetensors
```

### Command

```bash
python3 /data/training/ltx-edit-trainer/scripts/split_editanything_lora.py \
    .safetensors --output-dir [--overwrite]
```
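To re-derive the section 4 counts from files on disk, a stdlib-only script can read just the safetensors header and tally the keys. Per the safetensors format, the header is an 8-byte little-endian length followed by a JSON table of tensor names, dtypes, shapes, and data offsets, plus an optional `__metadata__` entry. A sketch, assuming the key layout documented above; `read_safetensors_shapes` and `inventory` are hypothetical helpers, not part of the training repo. Since the split moves the `ref_attn` LoRA adapters into the module sidecar, run it on the raw checkpoint or on the union of a `.standard`/`.module` pair.

```python
import json
import re
import struct
from collections import defaultdict

# Per-block LoRA keys, e.g. diffusion_model.transformer_blocks.7.attn2.to_out.0.lora_B.weight
LORA_KEY = re.compile(
    r"^diffusion_model\.transformer_blocks\.(\d+)\.(.+)\.lora_[AB]\.weight$"
)


def read_safetensors_shapes(path):
    """Return {tensor name: shape} by parsing only the safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional string-to-string metadata
    return {name: tuple(info["shape"]) for name, info in header.items()}


def inventory(shapes):
    """Summarize a {name: shape} mapping in the terms used by section 4."""
    blocks_per_module = defaultdict(set)
    extras = []
    for name in shapes:
        m = LORA_KEY.match(name)
        if m:
            blocks_per_module[m.group(2)].add(int(m.group(1)))
        else:
            extras.append(name)
    fc1 = shapes.get("ref_adaln_proj.fc1.weight")
    return {
        "total": len(shapes),
        "lora_modules": len(blocks_per_module),
        "lora_tensors": len(shapes) - len(extras),
        "extras": len(extras),
        # Distinct block counts across modules; [48] when every module covers all blocks.
        "blocks_per_module": sorted({len(b) for b in blocks_per_module.values()}),
        # 256 -> 2-scale pool (original build), 768 -> 3-scale pool (continuation).
        "adaln_fc1_in": fc1[1] if fc1 else None,
    }
```

For example, `inventory({**read_safetensors_shapes(standard_path), **read_safetensors_shapes(module_path)})` on the continuation pair should report `total == 1356`, `lora_modules == 14`, `extras == 12`, and `adaln_fc1_in == 768`, matching the tables above.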