Upload folder using huggingface_hub

Files changed (8)

edit_anything_30k_v1.1_motion_transfer_r128.safetensors +3 -0
edit_anything_30k_v1.1_motion_transfer_r256.safetensors +3 -0
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.module.safetensors +3 -0
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.standard.safetensors +3 -0
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.module.safetensors +3 -0
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.standard.safetensors +3 -0
lora_layers_impact.md +284 -0
lora_layers_reference.md +196 -0

edit_anything_30k_v1.1_motion_transfer_r128.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a5d01c404594cb12e69926a9ae066d01bd1115abd345e09254c391040b226471
+size 1308816336

edit_anything_30k_v1.1_motion_transfer_r256.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:407e9eed49bd5df627d68ed5eb4cfddc0353e8d133e65ad23670b4439c5faef0
+size 2617440424

edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.module.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:63ffdeed38c191108229ec3085386ac10174a0730427f86ef2c20dec4c6ea663
+size 450782608

edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.standard.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0e2e51d9eafd6636c9e752300578447344925b05bb5254a405302d3a6f9c668d
+size 1308756368

edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.module.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6f9d4483480f9766528553e9f5e61f6683d315da8c037ff23ac5e825908fed7c
+size 38086368

edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.standard.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:11b69d939077ad48de24f3fbd02c7ecdfdf7db029c9dc694167e7063c61f650e
+size 1308756368

lora_layers_impact.md ADDED

	@@ -0,0 +1,284 @@

+# Functional differences between the two builds and what each layer does
+Companion to `lora_layers_reference.md`. That file is the inventory; this one
+explains the **functional role** of every group of tensors and the **expected
+behavioral impact** of toggling each branch at inference.
+Two builds of the `edit_anything_reference_v0.1_r128` LoRA exist, each
+delivered as a `(.standard, .module)` pair. The pairs are distinguished by
+their **extras suffix**:
+- `..._ref_adaln_proj-role_embedding.{standard,module}.safetensors` — the
+  original build. One mechanism for steering the model toward the reference
+  image: **global AdaLN appearance anchoring** (plus the IC-LoRA-style ref
+  tokens packed into the sequence).
+- `..._ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.{standard,module}.safetensors`
+  — the continuation. Keeps everything from the original build and adds
+  **two new mechanisms** that operate on different time/space scales.
+In the rest of this doc the two builds are referred to by their suffix only:
+- `ref_adaln_proj-role_embedding`
+- `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj`
+---
+## TL;DR
+| Branch | Where it acts | What it controls | New in `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj`? |
+|---|---|---|---|
+| `attn1` LoRA | self-attention inside every block | scene cohesion, structural editing | no (carried over, frozen) |
+| `attn2` LoRA | cross-attention to **text** (Gemma) | prompt following | no (retrained) |
+| `ff` LoRA | feed-forward MLP | feature mixing / capacity | no (retrained) |
+| **`ref_attn` LoRA** | dedicated cross-attention to **32 visual memory tokens** | preserving fine-grained appearance of the reference | **yes** |
+| **`ref_visual_proj`** | projects the ref VAE latent into 32 context tokens | the *content* that `ref_attn` attends to | **yes** |
+| `ref_adaln_proj` | produces a global vector added to the timestep AdaLN | overall color/style/identity bias | retrained (new pooling) |
+| `role_embedding` | adds a 128-dim bias to ref tokens in the IC-LoRA sequence | tells the transformer "this token is the reference" | frozen in the continuation |
+So:
+- `ref_adaln_proj-role_embedding` only had a **slow, global** appearance signal
+  (AdaLN) plus the IC-LoRA-style ref tokens packed into the sequence.
+- `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` adds a **fast,
+  local** appearance signal (visual cross-attention) that injects the
+  reference's actual textures into every block in the 12 → 35 range.
+---
+## 1. The 10 modules shared between both builds
+These cover the full 48-block transformer and were retrained in
+`ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` (except `attn1.*`,
+which is loaded but frozen — see the training freeze policy in the inventory).
+### `attn1.{q,k,v,out.0}` — self-attention
+Every transformer block first does self-attention over the latent video
+tokens. The LoRA here adjusts how tokens relate to each other:
+- **structural consistency** of the generated frames,
+- **how strongly the `@reference` IC-LoRA token influences neighboring
+  spatial positions**,
+- low-level look (sharpness, contrast).
+In the `..._ref_attn-ref_visual_proj` build these are frozen on purpose so
+the original priors over motion and structure stay intact. If the inference
+output looks structurally broken (jitter, motion drift, layout collapse),
+you probably misloaded these adapters or the standard LoRA is at the wrong
+strength.
+### `attn2.{q,k,v,out.0}` — cross-attention to text
+This is the prompt-following path. The Gemma text embedding is the K/V; the
+video latent is the Q. The LoRA tunes how the prompt drives the edit.
+- Stronger `attn2` deltas ⇒ the model **leans more on the prompt** ("Add
+  @reference sleeping on the armrest"). Useful for compositional control.
+- If you disable or weaken the standard LoRA (e.g. `strength_model=0`), the
+  base model goes back to ignoring your edit instructions — even if `ref_attn`
+  is still active, the prompt-binding is gone.
+### `ff.net.{0.proj, 2}` — MLP capacity
+The block's feed-forward part. The LoRA here adds **representational
+capacity** to absorb the new behaviors that prompt + reference impose. There
+is no single user-visible "knob" for this; it works behind the scenes.
+If you slash its strength you'll see colors and textures drift back toward
+generic LTX-2 outputs.
+---
+## 2. The new `ref_attn` branch
+This is the heart of the change in
+`ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj`. Each of the 48
+transformer blocks now carries an additional attention module, `ref_attn`,
+alongside `attn1` (self) and `attn2` (text). `ref_attn` cross-attends from
+the noisy video latent (Q) to **a small set of visual memory tokens
+computed from the reference image** (K/V).
+### Why four projections (q/k/v/out.0)
+These are the four linear layers of a standard cross-attention. The base
+weights are copied from `attn2` at load time (`init_ref_attn_from: attn2`),
+so the module starts as "text cross-attn, but pointed at visual tokens";
+the LoRA then teaches it to actually *use* those visual tokens.
+### Per-block gating
+`ref_attn` is only consulted in blocks **12 → 35** (this is what
+`ref_start_block` / `ref_end_block` enforce at inference and what the trainer
+used during fine-tuning). Skipping blocks 0–11 keeps the early low-level
+features untouched; skipping blocks 36–47 lets the late decoding stages do
+their job without extra visual bias.
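The gating rule above can be sketched in a few lines. This is a minimal illustration, assuming the bounds are inclusive (as the "0–11" and "36–47" ranges imply); the helper name `ref_attn_active` is hypothetical and not taken from the inference node.

```python
NUM_BLOCKS = 48  # transformer depth stated in this document


def ref_attn_active(block_idx: int, start_block: int = 12, end_block: int = 35) -> bool:
    """Return True when ref_attn should run in this block (bounds inclusive)."""
    return start_block <= block_idx <= end_block


# Blocks in which the visual reference branch contributes a residual.
active = [i for i in range(NUM_BLOCKS) if ref_attn_active(i)]
print(len(active), active[0], active[-1])  # 24 12 35
```

With the training defaults, exactly half of the 48 blocks receive the `ref_attn` residual.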
+### Impact
+- **Strong identity preservation** for things the AdaLN anchor can't capture
+  (small logos, eye color, fur texture, asymmetric details).
+- Scaled by `ref_context_scale` (training default `0.01`). Small for a
+  reason: the visual tokens are dense, and the residual is added on top of
+  every block in the 12–35 range — even at 0.01 the cumulative effect is
+  meaningful.
+- Doubling the scale (→ 0.02) usually intensifies identity at the cost of
+  motion fidelity; going to 0.05+ tends to "freeze" parts of the scene to the
+  reference appearance.
+- Setting `ref_start_block=0` is **destructive**: blocks 0–11 never saw
+  `ref_context` during training, so injecting it there feeds the model
+  noise — outputs collapse to black or random patterns.
+---
+## 3. The new `ref_visual_proj`
+This is the *source* of what `ref_attn` attends to. Without it the
+`ref_attn` LoRA is useless — there are no visual tokens to read.
+### Forward
+```
+ref_frame  = mean over time of the ref VAE latent       # [B, 128, H, W]
+local      = adaptive_avg_pool to (4, 8)                 # 32 spatial cells
+global_mean, global_std over the whole frame             # 2 × 128
+tokens     = concat(local, broadcast(mean,std))          # [B, 32, 384]
+tokens     = proj(silu(fc1(tokens)))                     # [B, 32, 4096]
+tokens     = LayerNorm(tokens)
+tokens     = tokens + pos_embed[:, :32]
+return tokens * token_scale                              # 0.25 in training
+```
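The shapes in the forward pass above can be traced without any tensor library. The sketch below only does the shape arithmetic; the function name and stage labels are illustrative, and the dimensions (128 latent channels, a 4×8 grid, hidden size 1024, context size 4096) are the ones quoted in this document.

```python
def ref_visual_proj_shapes(batch, grid=(4, 8), latent_channels=128,
                           hidden_dim=1024, context_dim=4096):
    """Return (stage, shape) pairs for the projector's forward pass."""
    num_tokens = grid[0] * grid[1]        # 4 * 8 = 32 spatial cells
    descriptor_dim = 3 * latent_channels  # local + global mean + global std
    return [
        ("local pooled cells", (batch, num_tokens, latent_channels)),
        ("concat local/mean/std", (batch, num_tokens, descriptor_dim)),
        ("fc1 + silu", (batch, num_tokens, hidden_dim)),
        ("proj, LayerNorm, pos_embed", (batch, num_tokens, context_dim)),
    ]


for stage, shape in ref_visual_proj_shapes(batch=2):
    print(f"{stage:28s} {shape}")
```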
+### Layer-by-layer impact
+| Tensor | What it controls | If perturbed |
+|---|---|---|
+| `fc1.weight / bias` (1024×384) | maps the 384-dim raw appearance descriptor into the projector's hidden space | weights here decide *which* aspects of the pooled appearance survive (e.g. color vs. texture vs. luminance) |
+| `proj.weight / bias` (4096×1024) | lifts the hidden vector into the transformer context dim | initialized with small gain (0.05) so the branch starts almost-no-op; loaded from training |
+| `norm.weight / bias` (4096) | LayerNorm on the projected tokens | keeps numerical range consistent across reference images so `ref_attn` works at the same scale regardless of input statistics |
+| `pos_embed` (1, 32, 4096) | per-position bias for the 32 memory tokens | the model uses this to distinguish "top-left cell" from "bottom-right cell" — without it, all 32 tokens would be permutation-invariant and `ref_attn` would degenerate |
+### `ref_token_scale` (training = 0.25)
+This is the runtime multiplier on the output. It is **not** a stored tensor
+but a knob in the inference node. Doubling it (→ 0.5) effectively doubles
+the K/V magnitude that `ref_attn` reads, which biases attention scores
+toward the reference tokens. Combined with `ref_context_scale`, you have
+two independent ways to over-/under-amplify the visual reference branch.
+---
+## 4. `ref_adaln_proj` — *retrained, not continued*
+Both builds have this projector, but **the input dimension changed**:
+| | `ref_adaln_proj-role_embedding` | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` |
+|---|---|---|
+| Pooling | `avg_1x1 ‖ max_1x1` (2-scale) | `avg_1x1 ‖ avg_2x2 ‖ max_1x1` (3-scale) |
+| `fc1.weight` shape | (512, **256**) | (512, **768**) |
+Because of the shape mismatch the trainer **reinitializes** `ref_adaln_proj`
+from scratch when continuing from `ref_adaln_proj-role_embedding`. The
+`ref_adaln_proj` in the continuation is not a fine-tune of the original; it
+was trained from a fresh initialization. The wandb logs are consistent with
+this: `ref_proj/weight_norm` ramps from near-zero to ~2.9.
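The two `fc1` input widths follow from the pooling specs if each pooled cell contributes one 128-channel vector. That reading is an assumption, but it reproduces both numbers in the table. A small sketch, with a hypothetical helper `pooled_dim`:

```python
LATENT_CHANNELS = 128  # channel count of the ref VAE latent, per this document


def pooled_dim(pool_spec):
    """Sum the feature width of pooling stages named like 'avg_2x2'."""
    total = 0
    for name in pool_spec:
        _kind, grid = name.split("_")  # "avg_2x2" -> ("avg", "2x2")
        rows, cols = (int(n) for n in grid.split("x"))
        total += rows * cols * LATENT_CHANNELS
    return total


original = pooled_dim(["avg_1x1", "max_1x1"])                 # 128 + 128
continuation = pooled_dim(["avg_1x1", "avg_2x2", "max_1x1"])  # 128 + 512 + 128
print(original, continuation)  # 256 768
```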
+### What it actually does
+Builds one **per-sample** vector that is **added to the timestep bias** fed
+into every transformer block's AdaLN layer. The result: a persistent,
+sample-wide "lean toward this reference" applied throughout denoising.
+### Why this is the *complement* of `ref_attn`
+- `ref_attn` is **localized**: visual tokens cross-attend per spatial cell,
+  letting the model copy fine details.
+- `ref_adaln_proj` is **global**: a single conditioning vector tints all 48
+  blocks uniformly. Best for "the overall look of the output should remind
+  me of this reference" (palette, lighting, broad style).
+### `adaln_scale` (training = 2.0)
+The user-side multiplier. At training default 2.0, AdaLN is doing a lot of
+the appearance lifting. Common failure modes:
+- **`adaln_scale=0`**: model ignores the reference's global look; you keep
+  only what `ref_attn` and the IC-LoRA tokens can recover. Expect washed-out
+  identity.
+- **`adaln_scale=1.0`** (ComfyUI default before the recent realignment):
+  exactly half the training-time strength. Identity is still recognizable
+  but visibly weaker.
+- **`adaln_scale>3`**: identity dominates and the model starts ignoring the
+  prompt / guide motion.
+---
+## 5. `role_embedding` — present in both, behavior depends on which you load
+A learned `[1, 128]` vector that **adds a fingerprint** to the patchified
+tokens belonging to the IC-LoRA reference image, so the transformer can tell
+the ref token apart from generic guide / target tokens.
+### In `ref_adaln_proj-role_embedding`
+Was trained with `use_visual_ref_role_embedding=True` — that's where the
+non-zero value (~0.125 norm) comes from. The `attn1`/`attn2` adapters in
+this build therefore learned to *recognize* this bias.
+### In `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj`
+Inherits the value from `ref_adaln_proj-role_embedding` but trains with
+`use_visual_ref_role_embedding=False`, meaning the bias **is never added
+during training**. The vector is frozen at its inherited value; wandb shows
+its norm flat at 0.125 across the whole run.
+### Inference rule
+When loading `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj`: keep
+**`enable_role_embedding=False`**. Turning it on adds a bias to the ref
+tokens that this build never saw during training: the trainable adapters
+(`attn2`, `ff`, `ref_attn`) were trained without it, so the bias acts as
+out-of-distribution noise and degrades the output.
+When loading `ref_adaln_proj-role_embedding` directly (no
+`..._ref_attn-ref_visual_proj` adapters), the opposite is true:
+`enable_role_embedding=True` matches the training distribution.
+---
+## 6. Quick reference: what each knob does at inference
+| Knob | `..._ref_attn-ref_visual_proj` training value | Effect of raising it | Effect of lowering it |
+|---|---|---|---|
+| `adaln_scale` | 2.0 | stronger global look | identity fades |
+| `ref_context_scale` | 0.01 | sharper fine-grained ID; can over-freeze | local detail blurs back to base |
+| `ref_token_scale` | 0.25 | more "voice" for the visual tokens in attention | weaker visual tokens; `ref_attn` approaches a no-op as the scale nears 0 |
+| `ref_start_block` / `ref_end_block` | 12 / 35 | (do not change) | (do not change) — outside this range the LoRA is untrained |
+| `enable_role_embedding` | False | adds out-of-distribution bias to ref tokens | matches training |
+| `role_strength` | n/a | only matters if `enable_role_embedding=True` | |
+| Standard LoRA `strength_model` | 1.0 | over-fits to training distribution | drifts back toward base LTX-2 |
+The combination that mirrors training of the
+`..._ref_attn-ref_visual_proj` build exactly: `adaln_scale=2.0,
+ref_context_scale=0.01, ref_token_scale=0.25, ref_start_block=12,
+ref_end_block=35, enable_role_embedding=False, ref_init_from=attn2,
+strength_model=1.0`.
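One way to guard against silent drift from these values is to keep them in a dictionary and diff a workflow's settings against it. The sketch below is plain Python, not part of the ComfyUI nodes; `settings_drift` is a hypothetical helper, and the keys simply mirror the knob names listed above.

```python
TRAINING_SETTINGS = {
    "adaln_scale": 2.0,
    "ref_context_scale": 0.01,
    "ref_token_scale": 0.25,
    "ref_start_block": 12,
    "ref_end_block": 35,
    "enable_role_embedding": False,
    "ref_init_from": "attn2",
    "strength_model": 1.0,
}


def settings_drift(user_settings):
    """Map each deviating knob to (user value, training value)."""
    return {
        knob: (user_settings.get(knob), expected)
        for knob, expected in TRAINING_SETTINGS.items()
        if user_settings.get(knob) != expected
    }


# Example: a workflow still on the older adaln_scale default of 1.0.
workflow = dict(TRAINING_SETTINGS, adaln_scale=1.0)
print(settings_drift(workflow))  # {'adaln_scale': (1.0, 2.0)}
```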
+---
+## 7. Where the loaded files come from
+`scripts/split_editanything_lora.py` produces two safetensors per checkpoint.
+The filename suffix lists every extra that ended up in the module sidecar
+(fixed order: `ref_adaln_proj`, `role_embedding`, `ref_attn`,
+`ref_visual_proj`), so you can tell which mechanisms each pair carries
+without opening the file.
+Canonical pairs:
+```
+edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.standard.safetensors
+edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.module.safetensors
+edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.standard.safetensors
+edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.module.safetensors
+```
+Feed the `.standard.*` into ComfyUI's standard LoRA loader and the
+`.module.*` into `LTXVEditAnythingModuleLoader`. Mixing pairs across builds
+(e.g., `ref_adaln_proj-role_embedding.standard.*` with
+`..._ref_attn-ref_visual_proj.module.*`) is not supported — the LoRA deltas
+were trained against the partner adapters in the same build.
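Because the suffix order is fixed, a filename can be parsed back into its basename, extras, and sidecar kind, and a pair can be checked for consistency before loading. The regular expression below is a sketch written against the four canonical names above, not the logic used by `split_editanything_lora.py`; `parse_name` and `is_matching_pair` are hypothetical helpers.

```python
import re

KNOWN_EXTRAS = ("ref_adaln_proj", "role_embedding", "ref_attn", "ref_visual_proj")
_EXTRA = "|".join(KNOWN_EXTRAS)
_NAME_RE = re.compile(
    rf"^(?P<base>.+?)_(?P<extras>(?:{_EXTRA})(?:-(?:{_EXTRA}))*)"
    r"\.(?P<kind>standard|module)\.safetensors$"
)


def parse_name(filename):
    """Split a sidecar filename into (basename, extras tuple, kind)."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not an Edit Anything sidecar name: {filename}")
    return match["base"], tuple(match["extras"].split("-")), match["kind"]


def is_matching_pair(standard_file, module_file):
    """True when both files come from the same build (same base and extras)."""
    s_base, s_extras, s_kind = parse_name(standard_file)
    m_base, m_extras, m_kind = parse_name(module_file)
    return (s_base, s_extras) == (m_base, m_extras) and (s_kind, m_kind) == ("standard", "module")


base, extras, kind = parse_name(
    "edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.module.safetensors"
)
print(base, extras, kind)
```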

lora_layers_reference.md ADDED

	@@ -0,0 +1,196 @@

+# LoRA Layer Inventory — Edit Anything checkpoints
+Inventory of every tensor in two builds of the
+`edit_anything_reference_v0.1_r128` LoRA.
+Both builds share the same canonical basename
+(`edit_anything_reference_v0.1_r128`) and are distinguished by the **extras
+suffix** that `scripts/split_editanything_lora.py` appends to the output
+filenames:
+- `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.{standard,module}.safetensors`
+  — the original build. Only ships `ref_adaln_proj` + `role_embedding`.
+- `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.{standard,module}.safetensors`
+  — the continuation, fine-tuned with the
+    `video_to_video_ref_visual_adaln` strategy. Adds the `ref_attn` LoRA
+    branch and the `ref_visual_proj` projector on top of the original
+    extras.
+In the rest of this doc the two are referred to by their suffix only:
+- `ref_adaln_proj-role_embedding`
+- `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj`
+Rank is 128 in both (encoded in the LoRA tensor shapes; no `alpha` keys saved).
+Dtype is `bfloat16` throughout. All LoRA modules cover **48 transformer blocks**.
+---
+## 1. Summary
+| | `ref_adaln_proj-role_embedding` | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` |
+|---|---|---|
+| Total tensors | 965 | 1356 |
+| LoRA-target modules | **10** | **14** |
+| LoRA tensors (A+B) | 960 | 1344 |
+| Extra (non-LoRA) tensors | 5 | 12 |
+| `ref_attn` LoRA branch | ❌ absent | ✅ adapters stored for all 48 blocks (branch active in blocks 12–35 only, per `lora_layers_impact.md`) |
+| `ref_visual_proj` (visual cross-attn projector) | ❌ absent | ✅ present (7 tensors) |
+| `ref_adaln_proj` (global appearance AdaLN) | ✅ (fc1 input dim **256**) | ✅ (fc1 input dim **768**) |
+| `role_embedding` | ✅ shape (1, 128) | ✅ shape (1, 128) |
+---
+## 2. LoRA adapters
+Each row = one target module type. Each entry = (`lora_A.weight`, `lora_B.weight`)
+duplicated across the 48 blocks of `diffusion_model.transformer_blocks.*`.
+| Module | `ref_adaln_proj-role_embedding` | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` | Notes |
+|---|:---:|:---:|---|
+| `attn1.to_q` | ✅ | ✅ | self-attention query |
+| `attn1.to_k` | ✅ | ✅ | self-attention key |
+| `attn1.to_v` | ✅ | ✅ | self-attention value |
+| `attn1.to_out.0` | ✅ | ✅ | self-attention output proj |
+| `attn2.to_q` | ✅ | ✅ | cross-attention to text (Gemma) |
+| `attn2.to_k` | ✅ | ✅ | |
+| `attn2.to_v` | ✅ | ✅ | |
+| `attn2.to_out.0` | ✅ | ✅ | |
+| `ff.net.0.proj` | ✅ | ✅ | feed-forward up-projection |
+| `ff.net.2` | ✅ | ✅ | feed-forward down-projection |
+| `ref_attn.to_q` | — | ✅ | **new** — visual reference cross-attention |
+| `ref_attn.to_k` | — | ✅ | **new** |
+| `ref_attn.to_v` | — | ✅ | **new** |
+| `ref_attn.to_out.0` | — | ✅ | **new** |
+**Key naming**: `diffusion_model.transformer_blocks.{0..47}.{module}.{lora_A|lora_B}.weight`
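The key pattern can be expanded programmatically, which also reproduces the LoRA tensor counts from the summary table. A minimal sketch; the list and function names are illustrative.

```python
SHARED_MODULES = [
    "attn1.to_q", "attn1.to_k", "attn1.to_v", "attn1.to_out.0",
    "attn2.to_q", "attn2.to_k", "attn2.to_v", "attn2.to_out.0",
    "ff.net.0.proj", "ff.net.2",
]
REF_ATTN_MODULES = [
    "ref_attn.to_q", "ref_attn.to_k", "ref_attn.to_v", "ref_attn.to_out.0",
]


def lora_keys(modules, num_blocks=48):
    """Expand module names into the full list of LoRA tensor keys."""
    return [
        f"diffusion_model.transformer_blocks.{block}.{module}.{half}.weight"
        for block in range(num_blocks)
        for module in modules
        for half in ("lora_A", "lora_B")
    ]


original_keys = lora_keys(SHARED_MODULES)
continuation_keys = lora_keys(SHARED_MODULES + REF_ATTN_MODULES)
print(len(original_keys), len(continuation_keys))  # 960 1344
```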
+**Training freeze policy** for the
+`ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build
+(per `stage2_ref_visual_adaln_crossattn_from_v01_r128.yaml`):
+- `attn1.*` adapters loaded from the `ref_adaln_proj-role_embedding` build
+  but **frozen** (`trainable_include_patterns` excludes them).
+- `attn2.*`, `ff.*`, `ref_attn.*` are trainable.
+---
+## 3. Non-LoRA modules (the module sidecar)
+These tensors live at the top of the state dict (no `transformer_blocks.*` prefix)
+and are consumed by the custom inference path (`LTXVEditAnythingModuleLoader` +
+`LTXVEditAnythingLoopingSampler`), not by the standard ComfyUI LoRA loader.
+### 3.1. `role_embedding` — appearance role bias
+| Key | Shape | Notes |
+|---|---|---|
+| `role_embedding.embedding.weight` | (1, 128) | 1 slot (appearance). Padded to (3, 128) at inference; entry stored at slot 1 (ref_img role). |
+Present in **both** builds with the same shape. In the
+`ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build it is
+**frozen** (`use_visual_ref_role_embedding: false`); wandb shows its norm
+stays flat at ~0.125 throughout training.
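The padding step from the table can be pictured as building a 3-row table whose row 1 holds the stored vector. The sketch below uses plain lists and assumes the other two rows are zero-filled, which this document does not state; `pad_role_embedding` is a hypothetical helper, not the loader's code.

```python
def pad_role_embedding(stored_row, num_roles=3, slot=1):
    """Place the single stored row at `slot` in a zero-filled role table."""
    dim = len(stored_row)
    table = [[0.0] * dim for _ in range(num_roles)]  # assumed zero padding
    table[slot] = list(stored_row)
    return table


stored = [0.01] * 128  # placeholder values, not the checkpoint's weights
table = pad_role_embedding(stored)
print(len(table), len(table[0]))  # 3 128
```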
+### 3.2. `ref_adaln_proj` — global AdaLN appearance anchor
+Two-layer MLP that pools the reference latent into a vector added to every
+block's AdaLN timestep bias.
+| Key | `ref_adaln_proj-role_embedding` shape | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` shape |
+|---|---|---|
+| `ref_adaln_proj.fc1.weight` | (512, **256**) | (512, **768**) |
+| `ref_adaln_proj.fc1.bias` | (512,) | (512,) |
+| `ref_adaln_proj.proj.weight` | (36864, 512) | (36864, 512) |
+| `ref_adaln_proj.proj.bias` | (36864,) | (36864,) |
+> ⚠️ **Shape mismatch on `fc1.weight`**.
+> The `ref_adaln_proj-role_embedding` build was trained with a 2-scale pool
+> (`avg_1x1 ‖ max_1x1` → 256-dim input).
+> The `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build was
+> trained with a 3-scale pool (`avg_1x1 ‖ avg_2x2 ‖ max_1x1` → 768-dim).
+> Because of this incompatibility the trainer **reinitializes**
+> `ref_adaln_proj` from scratch when continuing from
+> `ref_adaln_proj-role_embedding`; the AdaLN projector in the continuation
+> is **not** a fine-tune of the original one. The output dim 36864 = AdaLN
+> param count for the LTX-2 transformer (read at runtime via
+> `preprocessor.adaln.linear.out_features`).
+### 3.3. `ref_visual_proj` — visual cross-attention memory tokens
+Present in `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` only.
+`SafeVisualRefProjector` (training file `video_to_video_ref_visual.py`).
+Produces 32 visual memory tokens consumed by the new `ref_attn` branch.
+| Key | Shape | Notes |
+|---|---|---|
+| `ref_visual_proj.fc1.weight` | (1024, **384**) | input 384 = 128 (local pooled) + 128 (global mean) + 128 (global std); xavier init gain 0.1 |
+| `ref_visual_proj.fc1.bias` | (1024,) | |
+| `ref_visual_proj.proj.weight` | (4096, 1024) | maps to context_dim 4096; xavier init gain 0.05 |
+| `ref_visual_proj.proj.bias` | (4096,) | |
+| `ref_visual_proj.norm.weight` | (4096,) | LayerNorm γ |
+| `ref_visual_proj.norm.bias` | (4096,) | LayerNorm β |
+| `ref_visual_proj.pos_embed` | (1, 32, 4096) | per-token learned positional bias |
+Forward (matches `SafeVisualRefProjector.forward`):
+```
+tokens = local ‖ global_mean ‖ global_std          # [B, 32, 384]
+tokens = proj(silu(fc1(tokens)))                   # → [B, 32, 4096]
+tokens = LayerNorm(tokens)
+tokens = tokens + pos_embed[:, :32]
+return tokens * token_scale                        # training default 0.25
+```
+Not present in `ref_adaln_proj-role_embedding` — this entire branch is new.
+---
+## 4. Total tensor counts (sanity check)
+### `ref_adaln_proj-role_embedding`
+```
+LoRA: 10 modules × 48 blocks × 2 (A,B)            = 960
+ref_adaln_proj: 4 (fc1.{w,b}, proj.{w,b})         =   4
+role_embedding: 1                                 =   1
+                                              total= 965 ✓
+```
+### `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj`
+```
+LoRA: 14 modules × 48 blocks × 2 (A,B)            = 1344
+ref_adaln_proj: 4                                 =    4
+ref_visual_proj: 7                                =    7
+role_embedding: 1                                 =    1
+                                              total= 1356 ✓
+```
+---
+## 5. Loading a checkpoint at inference
+Use `scripts/split_editanything_lora.py` to split each raw training
+checkpoint into:
+- `*.standard.safetensors` — LoRA on `attn1/attn2/ff` only; safe to feed to
+  ComfyUI's standard LoraLoader.
+- `*.module.safetensors` — everything else (`role_embedding`,
+  `ref_adaln_proj`, `ref_visual_proj`, `ref_attn` LoRA adapters); feed to
+  `LTXVEditAnythingModuleLoader`.
+The filename suffix lists every extra that ended up in the module sidecar,
+so it is obvious at a glance which mechanisms a given pair carries. Order is
+fixed: `ref_adaln_proj`, `role_embedding`, `ref_attn`, `ref_visual_proj`.
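The split rule described above can be expressed as a small routing function over state-dict keys. This is a sketch of the rule as stated here, not the implementation in `split_editanything_lora.py`; `sidecar_for` is a hypothetical name.

```python
STANDARD_MODULES = ("attn1", "attn2", "ff")


def sidecar_for(key):
    """Route a state-dict key to the 'standard' or 'module' file."""
    parts = key.split(".")
    if "transformer_blocks" in parts:
        # Layout: ...transformer_blocks.<block index>.<module>.<...>
        module = parts[parts.index("transformer_blocks") + 2]
        if module in STANDARD_MODULES:
            return "standard"
    return "module"


examples = [
    "diffusion_model.transformer_blocks.0.attn1.to_q.lora_A.weight",
    "diffusion_model.transformer_blocks.12.ref_attn.to_k.lora_B.weight",
    "ref_visual_proj.pos_embed",
    "role_embedding.embedding.weight",
]
for key in examples:
    print(sidecar_for(key), key)
```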
+### Canonical output names
+```
+edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.standard.safetensors
+edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.module.safetensors
+edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.standard.safetensors
+edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.module.safetensors
+```
+### Command
+```bash
+python3 /data/training/ltx-edit-trainer/scripts/split_editanything_lora.py \
+  <raw-checkpoint>.safetensors --output-dir <dir> [--overwrite]
+```