# LoRA Layer Inventory: Edit Anything checkpoints
Inventory of every tensor in two builds of the
`edit_anything_reference_v0.1_r128` LoRA.
Both builds share the same canonical basename
(`edit_anything_reference_v0.1_r128`) and are distinguished by the **extras
suffix** that `scripts/split_editanything_lora.py` appends to the output
filenames:
- `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.{standard,module}.safetensors`:
  the original build, which ships only `ref_adaln_proj` and `role_embedding`.
- `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.{standard,module}.safetensors`:
  the continuation, fine-tuned with the
  `video_to_video_ref_visual_adaln` strategy. It adds the `ref_attn` LoRA
  branch and the `ref_visual_proj` projector on top of the original
  extras.
In the rest of this doc the two are referred to by their suffix only:
- `ref_adaln_proj-role_embedding`
- `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj`
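
Because extras are joined with `-` while each extra name only uses `_`, the suffix can be split back into its parts. A minimal sketch, assuming the hard-coded basename below and that every file ends in `.standard.safetensors` or `.module.safetensors`; `parse_split_filename` is a hypothetical helper, not part of `split_editanything_lora.py`:

```python
BASENAME = "edit_anything_reference_v0.1_r128"  # canonical basename from this doc
KINDS = ("standard", "module")


def parse_split_filename(filename: str):
    """Return (extras, kind) for a filename produced by the split script.

    Hypothetical helper: assumes '<basename>_<extra>-<extra>...<.kind>.safetensors'.
    """
    # rsplit from the right keeps the '.' inside 'v0.1' in the stem.
    stem, kind, ext = filename.rsplit(".", 2)
    if ext != "safetensors" or kind not in KINDS:
        raise ValueError(f"unexpected extension or kind: {filename}")
    prefix = BASENAME + "_"
    if not stem.startswith(prefix):
        raise ValueError(f"unexpected basename: {filename}")
    # Extras are joined with '-'; each extra name itself only contains '_'.
    extras = stem[len(prefix):].split("-")
    return extras, kind


extras, kind = parse_split_filename(
    "edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.standard.safetensors"
)
print(extras, kind)  # ['ref_adaln_proj', 'role_embedding'] standard
```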
Rank is 128 in both (encoded in the LoRA tensor shapes; no `alpha` keys saved).
Dtype is `bfloat16` throughout. All LoRA modules cover **48 transformer blocks**.
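
Since no `alpha` keys are stored, the rank has to be read off the adapter shapes. A minimal sketch, assuming the common PEFT/Diffusers layout where `lora_A.weight` is `(rank, in_features)` and `lora_B.weight` is `(out_features, rank)`. It works on a plain `{key: shape}` dict so it runs without ML libraries; the example shapes are illustrative, not read from the checkpoint:

```python
def infer_lora_rank(shapes) -> int:
    """Infer a single LoRA rank from a {tensor_key: shape_tuple} mapping.

    Assumes lora_A.weight is (rank, in_features) and lora_B.weight is
    (out_features, rank), the usual PEFT/Diffusers convention.
    """
    ranks = set()
    for key, shape in shapes.items():
        if key.endswith(".lora_A.weight"):
            ranks.add(shape[0])
        elif key.endswith(".lora_B.weight"):
            ranks.add(shape[1])
    if len(ranks) != 1:
        raise ValueError(f"expected exactly one rank, found {sorted(ranks)}")
    return ranks.pop()


# Illustrative shapes only (the 4096 feature size is an assumption).
example = {
    "diffusion_model.transformer_blocks.0.attn1.to_q.lora_A.weight": (128, 4096),
    "diffusion_model.transformer_blocks.0.attn1.to_q.lora_B.weight": (4096, 128),
    "role_embedding.embedding.weight": (1, 128),  # non-LoRA keys are ignored
}
print(infer_lora_rank(example))  # 128
```

With a real file, the shape dict could be built without loading tensor data via `safetensors.safe_open(path, framework="pt")`, iterating `f.keys()` and calling `f.get_slice(key).get_shape()`.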
---
## 1. Summary
| | `ref_adaln_proj-role_embedding` | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` |
|---|---|---|
| Total tensors | 965 | 1356 |
| LoRA-target modules | **10** | **14** |
| LoRA tensors (A+B) | 960 | 1344 |
| Extra (non-LoRA) tensors | 5 | 12 |
| `ref_attn` LoRA branch | ❌ absent | ✅ trained on 48 blocks |
| `ref_visual_proj` (visual cross-attn projector) | ❌ absent | ✅ present (7 tensors) |
| `ref_adaln_proj` (global appearance AdaLN) | ✅ (fc1 input dim **256**) | ✅ (fc1 input dim **768**) |
| `role_embedding` | ✅ shape (1, 128) | ✅ shape (1, 128) |
---
## 2. LoRA adapters
Each row is one target module type. A ✅ means the (`lora_A.weight`, `lora_B.weight`)
pair is present in every one of the 48 blocks of `diffusion_model.transformer_blocks.*`.
| Module | `ref_adaln_proj-role_embedding` | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` | Notes |
|---|:---:|:---:|---|
| `attn1.to_q` | ✅ | ✅ | self-attention query |
| `attn1.to_k` | ✅ | ✅ | self-attention key |
| `attn1.to_v` | ✅ | ✅ | self-attention value |
| `attn1.to_out.0` | ✅ | ✅ | self-attention output proj |
| `attn2.to_q` | ✅ | ✅ | cross-attention to text (Gemma), query |
| `attn2.to_k` | ✅ | ✅ | text cross-attention key |
| `attn2.to_v` | ✅ | ✅ | text cross-attention value |
| `attn2.to_out.0` | ✅ | ✅ | text cross-attention output proj |
| `ff.net.0.proj` | ✅ | ✅ | feed-forward up-projection |
| `ff.net.2` | ✅ | ✅ | feed-forward down-projection |
| `ref_attn.to_q` | ❌ | ✅ | **new**: visual reference cross-attention, query |
| `ref_attn.to_k` | ❌ | ✅ | **new**: key |
| `ref_attn.to_v` | ❌ | ✅ | **new**: value |
| `ref_attn.to_out.0` | ❌ | ✅ | **new**: output proj |
**Key naming**: `diffusion_model.transformer_blocks.{0..47}.{module}.{lora_A|lora_B}.weight`
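
The key pattern together with the module table fully determines each build's adapter key set. A small sketch that enumerates the expected keys, which can then be diffed against a checkpoint's actual keys to spot missing or unexpected adapters; `expected_lora_keys` is a hypothetical helper:

```python
NUM_BLOCKS = 48
BASE_MODULES = [
    "attn1.to_q", "attn1.to_k", "attn1.to_v", "attn1.to_out.0",
    "attn2.to_q", "attn2.to_k", "attn2.to_v", "attn2.to_out.0",
    "ff.net.0.proj", "ff.net.2",
]
REF_ATTN_MODULES = ["ref_attn.to_q", "ref_attn.to_k", "ref_attn.to_v", "ref_attn.to_out.0"]


def expected_lora_keys(with_ref_attn: bool):
    """All lora_A/lora_B keys for one build, following the naming pattern above."""
    modules = BASE_MODULES + (REF_ATTN_MODULES if with_ref_attn else [])
    return [
        f"diffusion_model.transformer_blocks.{i}.{module}.{ab}.weight"
        for i in range(NUM_BLOCKS)
        for module in modules
        for ab in ("lora_A", "lora_B")
    ]


print(len(expected_lora_keys(False)), len(expected_lora_keys(True)))  # 960 1344
```

For example, `set(expected_lora_keys(True)) - set(actual_keys)` would list adapters missing from a continuation checkpoint.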
**Training freeze policy** for the
`ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build
(per `stage2_ref_visual_adaln_crossattn_from_v01_r128.yaml`):
- `attn1.*` adapters are loaded from the `ref_adaln_proj-role_embedding` build
  but kept **frozen** (`trainable_include_patterns` excludes them).
- `attn2.*`, `ff.*`, `ref_attn.*` are trainable.
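
The freeze policy amounts to an include filter over parameter names. A minimal sketch of such a filter; the regular expressions below are illustrative stand-ins, since the actual contents and matching semantics of `trainable_include_patterns` in the YAML are not reproduced here:

```python
import re

# Illustrative stand-ins for trainable_include_patterns (not the real YAML values).
TRAINABLE_INCLUDE_PATTERNS = (r"\.attn2\.", r"\.ff\.", r"\.ref_attn\.")


def is_trainable(key: str, patterns=TRAINABLE_INCLUDE_PATTERNS) -> bool:
    """A LoRA tensor is trainable if any include pattern matches its key."""
    return any(re.search(p, key) for p in patterns)


prefix = "diffusion_model.transformer_blocks.7"
for module in ("attn1.to_q", "attn2.to_q", "ff.net.2", "ref_attn.to_out.0"):
    key = f"{prefix}.{module}.lora_A.weight"
    print(module, "trainable" if is_trainable(key) else "frozen")
# attn1.to_q frozen
# attn2.to_q trainable
# ff.net.2 trainable
# ref_attn.to_out.0 trainable
```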
---
## 3. Non-LoRA modules (the module sidecar)
These tensors live at the top of the state dict (no `transformer_blocks.*` prefix)
and are consumed by the custom inference path (`LTXVEditAnythingModuleLoader` +
`LTXVEditAnythingLoopingSampler`), not by the standard ComfyUI LoRA loader.
### 3.1. `role_embedding`: appearance role bias
| Key | Shape | Notes |
|---|---|---|
| `role_embedding.embedding.weight` | (1, 128) | 1 slot (appearance). Padded to (3, 128) at inference; entry stored at slot 1 (ref_img role). |
Present in **both** builds with the same shape. In the
`ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build it is
**frozen** (`use_visual_ref_role_embedding: false`); wandb shows its norm
stays flat at ~0.125 throughout training.
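
The table note says the single stored row is padded to three role slots at inference, with the stored vector placed in slot 1 (the `ref_img` role). A minimal sketch of that padding using plain lists; zero-filling the unused slots is an assumption, since the doc does not say how slots 0 and 2 are initialized, and `pad_role_embedding` is a hypothetical helper:

```python
def pad_role_embedding(stored, num_slots=3, ref_img_slot=1):
    """Expand a (1, dim) role table to (num_slots, dim).

    The stored appearance vector goes to ref_img_slot; the other slots are
    zero-filled (assumption).
    """
    if len(stored) != 1:
        raise ValueError(f"expected 1 stored row, got {len(stored)}")
    dim = len(stored[0])
    table = [[0.0] * dim for _ in range(num_slots)]
    table[ref_img_slot] = list(stored[0])
    return table


stored = [[0.01] * 128]  # shape (1, 128), illustrative values
padded = pad_role_embedding(stored)
print(len(padded), len(padded[0]))  # 3 128
```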
### 3.2. `ref_adaln_proj`: global AdaLN appearance anchor
A two-layer MLP (`fc1`, then `proj`) that maps a pooled summary of the reference
latent to a vector added to every block's AdaLN timestep bias.
| Key | `ref_adaln_proj-role_embedding` shape | `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` shape |
|---|---|---|
| `ref_adaln_proj.fc1.weight` | (512, **256**) | (512, **768**) |
| `ref_adaln_proj.fc1.bias` | (512,) | (512,) |
| `ref_adaln_proj.proj.weight` | (36864, 512) | (36864, 512) |
| `ref_adaln_proj.proj.bias` | (36864,) | (36864,) |
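
Multiplying out the shapes shows where the parameters of `ref_adaln_proj` sit: the `proj` layer into the 36864-dim AdaLN space dominates, and the `fc1` change between builds adds 512 × 512 = 262,144 weights. A quick check using only the shapes from the table:

```python
from math import prod

# Shapes copied from the table above.
REF_ADALN_PROJ_SHAPES = {
    "ref_adaln_proj-role_embedding": {
        "fc1.weight": (512, 256), "fc1.bias": (512,),
        "proj.weight": (36864, 512), "proj.bias": (36864,),
    },
    "ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj": {
        "fc1.weight": (512, 768), "fc1.bias": (512,),
        "proj.weight": (36864, 512), "proj.bias": (36864,),
    },
}


def param_count(shapes) -> int:
    """Total number of scalar parameters across all tensors."""
    return sum(prod(shape) for shape in shapes.values())


for build, shapes in REF_ADALN_PROJ_SHAPES.items():
    print(f"{build}: {param_count(shapes):,}")
# ref_adaln_proj-role_embedding: 19,042,816
# ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj: 19,304,960
```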
> ⚠️ **Shape mismatch on `fc1.weight`.**
> The `ref_adaln_proj-role_embedding` build was trained with a 2-scale pool
> (`avg_1x1 ⊕ max_1x1` → 256-dim input, where ⊕ is concatenation).
> The `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build was
> trained with a 3-scale pool (`avg_1x1 ⊕ avg_2x2 ⊕ max_1x1` → 768-dim input).
> Because of this incompatibility, the trainer **reinitializes**
> `ref_adaln_proj` from scratch when continuing from
> `ref_adaln_proj-role_embedding`; the AdaLN projector in the continuation
> is **not** a fine-tune of the original one. The output dim 36864 is the
> AdaLN parameter count of the LTX-2 transformer (read at runtime via
> `preprocessor.adaln.linear.out_features`).
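
The 256 vs 768 input sizes follow from concatenating pooled vectors of a 128-channel latent: 128 × (1 + 1) = 256 for the 2-scale pool and 128 × (1 + 4 + 1) = 768 for the 3-scale pool. A minimal pure-Python sketch, assuming a single `[C][H][W]` latent map with H and W divisible by 2 and channel-major ordering inside each scale (the real code may also pool over frames and order features differently). `needs_reinit` is a hypothetical stand-in for the trainer's shape check:

```python
def avg_pool_grid(latent, grid):
    """Adaptive average pool of a [C][H][W] map onto a grid x grid layout.

    Returns C * grid * grid values, channel-major. Assumes H and W are
    divisible by grid.
    """
    channels, height, width = len(latent), len(latent[0]), len(latent[0][0])
    cell_h, cell_w = height // grid, width // grid
    out = []
    for c in range(channels):
        for gy in range(grid):
            for gx in range(grid):
                cell = [
                    latent[c][y][x]
                    for y in range(gy * cell_h, (gy + 1) * cell_h)
                    for x in range(gx * cell_w, (gx + 1) * cell_w)
                ]
                out.append(sum(cell) / len(cell))
    return out


def max_pool_global(latent):
    """Global max per channel: C values."""
    return [max(v for row in channel for v in row) for channel in latent]


def pool_reference(latent, three_scale):
    """concat(avg_1x1, max_1x1) or concat(avg_1x1, avg_2x2, max_1x1)."""
    feats = avg_pool_grid(latent, 1)
    if three_scale:
        feats += avg_pool_grid(latent, 2)
    feats += max_pool_global(latent)
    return feats


def needs_reinit(saved_shapes, model_shapes):
    """Re-initialize the whole module if any tensor is missing or mismatched."""
    return any(saved_shapes.get(k) != s for k, s in model_shapes.items())


# Toy 128-channel, 4x4 latent where every value in channel c equals c.
latent = [[[float(c)] * 4 for _ in range(4)] for c in range(128)]
print(len(pool_reference(latent, three_scale=False)))  # 256
print(len(pool_reference(latent, three_scale=True)))   # 768
print(needs_reinit({"fc1.weight": (512, 256)}, {"fc1.weight": (512, 768)}))  # True
```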
### 3.3. `ref_visual_proj`: visual cross-attention memory tokens
Present only in the `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj` build.
Implemented by `SafeVisualRefProjector` (training file `video_to_video_ref_visual.py`),
it produces 32 visual memory tokens consumed by the new `ref_attn` branch.
| Key | Shape | Notes |
|---|---|---|
| `ref_visual_proj.fc1.weight` | (1024, **384**) | input 384 = 128 (local pooled) + 128 (global mean) + 128 (global std); xavier init, gain 0.1 |
| `ref_visual_proj.fc1.bias` | (1024,) | |
| `ref_visual_proj.proj.weight` | (4096, 1024) | maps to context_dim 4096; xavier init, gain 0.05 |
| `ref_visual_proj.proj.bias` | (4096,) | |
| `ref_visual_proj.norm.weight` | (4096,) | LayerNorm γ |
| `ref_visual_proj.norm.bias` | (4096,) | LayerNorm β |
| `ref_visual_proj.pos_embed` | (1, 32, 4096) | per-token learned positional bias |
Forward (matches `SafeVisualRefProjector.forward`):
```
tokens = local ⊕ global_mean ⊕ global_std   # [B, 32, 384]; ⊕ = concat on the last dim
tokens = proj(silu(fc1(tokens)))            # → [B, 32, 4096]
tokens = LayerNorm(tokens)
tokens = tokens + pos_embed[:, :32]
return tokens * token_scale                 # training default 0.25
```
Not present in `ref_adaln_proj-role_embedding`; this entire branch is new.
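
To make the shape flow of the forward pass concrete, here is a dependency-free sketch of the same steps for a single sample, with weights stored as `(out, in)` nested lists as in the table. Assumptions not confirmed by the source: `global_mean` and `global_std` are single vectors broadcast to every token, `pos_embed` is passed without its leading batch dimension, and the LayerNorm epsilon is 1e-5 (PyTorch's default). The toy sizes stand in for the real 32 tokens and 384 → 1024 → 4096:

```python
import math


def linear(x, weight, bias):
    """y = W x + b with weight stored as (out_features, in_features)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def silu(v):
    # silu(t) = t * sigmoid(t); sigmoid written via tanh to avoid exp overflow.
    return [t * 0.5 * (1.0 + math.tanh(0.5 * t)) for t in v]


def layer_norm(v, gamma, beta, eps=1e-5):
    mean = sum(v) / len(v)
    var = sum((t - mean) ** 2 for t in v) / len(v)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(t - mean) * inv_std * g + b for t, g, b in zip(v, gamma, beta)]


def ref_visual_proj_forward(local, global_mean, global_std, params, token_scale=0.25):
    """local: N token vectors of size d; returns N vectors of size context_dim."""
    tokens = []
    for i, loc in enumerate(local):
        x = loc + global_mean + global_std  # list concatenation -> 3 * d values
        h = silu(linear(x, params["fc1.weight"], params["fc1.bias"]))
        t = linear(h, params["proj.weight"], params["proj.bias"])
        t = layer_norm(t, params["norm.weight"], params["norm.bias"])
        t = [a + p for a, p in zip(t, params["pos_embed"][i])]  # per-token bias
        tokens.append([token_scale * a for a in t])
    return tokens


# Toy sizes: d=2 (so 6 inputs), hidden=3, context_dim=4, 2 tokens.
params = {
    "fc1.weight": [[0.1 * (i + j + 1) for j in range(6)] for i in range(3)],
    "fc1.bias": [0.0] * 3,
    "proj.weight": [[0.1 * (i - j) for j in range(3)] for i in range(4)],
    "proj.bias": [0.0] * 4,
    "norm.weight": [1.0] * 4,
    "norm.bias": [0.0] * 4,
    "pos_embed": [[0.0] * 4, [0.0] * 4],
}
out = ref_visual_proj_forward([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5], [0.1, 0.1], params)
print(len(out), len(out[0]))  # 2 4
```

Applying `token_scale` after the LayerNorm keeps the memory tokens at a fixed, small magnitude relative to the text context, whatever the scale of the pooled reference features.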
---
## 4. Total tensor counts (sanity check)
### `ref_adaln_proj-role_embedding`
```
LoRA:            10 modules × 48 blocks × 2 (A,B) = 960
ref_adaln_proj:   4 (fc1.{w,b}, proj.{w,b})        =   4
role_embedding:   1                                 =   1
                                            total = 965 ✓
```
### `ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj`
```
LoRA:            14 modules × 48 blocks × 2 (A,B) = 1344
ref_adaln_proj:   4                                 =    4
ref_visual_proj:  7                                 =    7
role_embedding:   1                                 =    1
                                            total = 1356 ✓
```
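
The same arithmetic as a small function, handy when a future build adds modules or extras. Note that `ref_attn` is counted among the 14 LoRA modules rather than as a non-LoRA extra; `total_tensors` is a hypothetical helper:

```python
NUM_BLOCKS = 48
# Non-LoRA tensor counts per extra, taken from section 3.
EXTRA_TENSORS = {"ref_adaln_proj": 4, "role_embedding": 1, "ref_visual_proj": 7}


def total_tensors(num_lora_modules: int, extras) -> int:
    """LoRA tensors (A and B per module per block) plus non-LoRA extras."""
    lora = num_lora_modules * NUM_BLOCKS * 2
    return lora + sum(EXTRA_TENSORS[name] for name in extras)


print(total_tensors(10, ["ref_adaln_proj", "role_embedding"]))  # 965
print(total_tensors(14, ["ref_adaln_proj", "role_embedding", "ref_visual_proj"]))  # 1356
```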
---
## 5. Loading checkpoint at inference
Use `scripts/split_editanything_lora.py` to split each raw training
checkpoint into:
- `*.standard.safetensors`: LoRA adapters on `attn1`/`attn2`/`ff` only; safe to feed to
  ComfyUI's standard LoraLoader.
- `*.module.safetensors`: everything else (`role_embedding`,
  `ref_adaln_proj`, `ref_visual_proj`, and the `ref_attn` LoRA adapters); feed to
  `LTXVEditAnythingModuleLoader`.
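
The routing rule for the two outputs can be sketched as a key classifier. This is a hypothetical reimplementation of the rule as described, not the code of `split_editanything_lora.py`; it only sorts key names and leaves reading and writing the tensors aside:

```python
STANDARD_MODULE_ROOTS = {"attn1", "attn2", "ff"}
BLOCK_PREFIX = "diffusion_model.transformer_blocks."


def route_key(key: str) -> str:
    """Return 'standard' for attn1/attn2/ff LoRA tensors, else 'module'."""
    if key.startswith(BLOCK_PREFIX) and (".lora_A." in key or ".lora_B." in key):
        # Remainder looks like '<block>.<root>.<...>.lora_X.weight'.
        root = key[len(BLOCK_PREFIX):].split(".")[1]
        if root in STANDARD_MODULE_ROOTS:
            return "standard"
    return "module"


def split_keys(keys):
    routed = {"standard": [], "module": []}
    for key in keys:
        routed[route_key(key)].append(key)
    return routed


keys = [
    "diffusion_model.transformer_blocks.0.attn1.to_q.lora_A.weight",
    "diffusion_model.transformer_blocks.0.ff.net.2.lora_B.weight",
    "diffusion_model.transformer_blocks.0.ref_attn.to_q.lora_A.weight",
    "role_embedding.embedding.weight",
    "ref_visual_proj.pos_embed",
]
routed = split_keys(keys)
print(len(routed["standard"]), len(routed["module"]))  # 2 3
```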
The filename suffix lists every extra that ended up in the module sidecar,
so it is obvious at a glance which mechanisms a given pair carries. Order is
fixed: `ref_adaln_proj`, `role_embedding`, `ref_attn`, `ref_visual_proj`.
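
Because the order is fixed, the suffix depends only on which extras are present, not on the order in which they were detected. A hypothetical sketch of the naming rule (not the script's actual code; the no-extras fallback to the bare basename is an assumption):

```python
BASENAME = "edit_anything_reference_v0.1_r128"
CANONICAL_ORDER = ("ref_adaln_proj", "role_embedding", "ref_attn", "ref_visual_proj")


def split_output_names(present_extras):
    """Return (standard, module) filenames with extras in the fixed order."""
    unknown = set(present_extras) - set(CANONICAL_ORDER)
    if unknown:
        raise ValueError(f"unknown extras: {sorted(unknown)}")
    suffix = "-".join(name for name in CANONICAL_ORDER if name in present_extras)
    # Assumption: with no extras the bare basename is used.
    stem = f"{BASENAME}_{suffix}" if suffix else BASENAME
    return f"{stem}.standard.safetensors", f"{stem}.module.safetensors"


standard, module = split_output_names(
    {"ref_visual_proj", "role_embedding", "ref_attn", "ref_adaln_proj"}
)
print(module)
# edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.module.safetensors
```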
### Canonical output names
```
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.standard.safetensors
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.module.safetensors
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.standard.safetensors
edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.module.safetensors
```
### Command
```bash
python3 /data/training/ltx-edit-trainer/scripts/split_editanything_lora.py \
<raw-checkpoint>.safetensors --output-dir <dir> [--overwrite]
```