 ---
 license: apache-2.0
+library_name: diffusers
+base_model: Lightricks/LTX-2.3
+tags:
+  - lora
+  - video
+  - video-editing
+  - ltxv
+  - ltx-2.3
 ---
+# Edit Anything — Experimental LTX-2 Video Editing LoRAs
+> **Heads up.** These LoRAs are research experiments. They are far from
+> production-ready and will fail on many inputs. They are released for the
+> community to play with and break, not as a finished tool.
+This repository hosts two unrelated training tracks built on top of
+**LTX-2.3 (22B)** for video editing:
+1. **Edit Anything v1.1 — motion transfer LoRA** (two ranks).
+2. **Reference video-to-video (Ref V2V) — experimental IC-LoRA + sidecar modules** (two builds).
+Inference is meant to run through the **BFSnodes** ComfyUI custom nodes —
+the Ref V2V build in particular needs them to load the sidecar modules and
+install the custom branches into the transformer.
+---
+## 1. Edit Anything v1.1 (motion transfer)
+Files:
+- `edit_anything_30k_v1.1_motion_transfer_r128.safetensors`
+- `edit_anything_30k_v1.1_motion_transfer_r256.safetensors`
+### What it is
+**v1.1 is not a direct continuation of v1.0.** It was trained from scratch
+in two stages:
+1. **Stage 1 — image-only pretraining.** ~30 000 image edit pairs. Training
+   a *video* model on still images is admittedly not ideal, but it was a way
+   to push the editing vocabulary beyond what a small video-only dataset can
+   teach.
+2. **Stage 2 — video fine-tune with `first_frame_conditioning > 0`.** This
+   restored the temporal prior and unlocked the motion-transfer behaviour
+   described below.
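
The card does not define what `first_frame_conditioning > 0` does during training. One common reading, assumed here and not confirmed by the source, is that the value is a per-sample probability. When it fires, the first latent frame is left clean and excluded from the loss, so the model learns to continue a given first frame. The NumPy sketch below illustrates that assumption. The function name and the rectified-flow noising are illustrative, not the trainer's actual code.

```python
import numpy as np

def noise_with_first_frame_conditioning(latents, noise, sigma, p, rng):
    """Noise one video latent of shape (C, F, H, W) at noise level sigma.

    With probability p (assumed meaning of `first_frame_conditioning`), latent
    frame 0 is kept clean and masked out of the loss, so it acts as a condition.
    Returns (noisy_latents, loss_mask) with loss_mask of shape (F,).
    """
    # Rectified-flow style interpolation between data and noise (assumed schedule).
    noisy = (1.0 - sigma) * latents + sigma * noise
    loss_mask = np.ones(latents.shape[1], dtype=np.float32)
    if p > 0 and rng.random() < p:
        noisy[:, 0] = latents[:, 0]  # first latent frame stays noise-free
        loss_mask[0] = 0.0           # and contributes nothing to the loss
    return noisy, loss_mask

rng = np.random.default_rng(0)
x0 = np.ones((4, 3, 2, 2))
eps = np.zeros_like(x0)
noisy, mask = noise_with_first_frame_conditioning(x0, eps, sigma=0.5, p=1.0, rng=rng)
```

With `p=0` the function reduces to ordinary noising of every frame, which matches a plain video fine-tune.
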
+In theory v1.1 can do the same edits as v1.0, but **temporal consistency may
+be weaker than v1.0** because so much of stage 1 happened on still images.
+Test against v1.0 case-by-case before assuming v1.1 wins on your task.
+### Motion transfer
+Because stage 2 included first-frame conditioning, you can drive the LoRA
+into a motion-transfer mode:
+1. Take a guide video.
+2. **Replace its first frame** with an edited still (insert a new subject,
+   swap an object, etc.). Use a strong image-editing model — Flux Kontext /
+   "Klein" or similar — to prepare it; the quality of this single frame
+   propagates through the whole clip.
+3. Feed the edited frame as the first frame of the input, and the original
+   guide video as the motion source.
+The model uses the new first frame as the appearance anchor and copies the
+motion from the rest of the guide.
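
Step 2 only changes the data, so it can be shown in isolation. The NumPy sketch below assumes the guide video is already decoded into a `(T, H, W, C)` frame array and that the edited still has the same resolution. `replace_first_frame` is a hypothetical helper. In practice the BFSnodes ComfyUI workflow handles this wiring.

```python
import numpy as np

def replace_first_frame(guide_frames, edited_first_frame):
    """Return a copy of guide_frames (T, H, W, C) with frame 0 swapped for the edited still.

    The original array is left untouched so it can still serve as the motion source.
    """
    if edited_first_frame.shape != guide_frames.shape[1:]:
        raise ValueError(
            f"edited frame has shape {edited_first_frame.shape}, "
            f"expected {guide_frames.shape[1:]} to match the guide video"
        )
    frames = guide_frames.copy()
    frames[0] = edited_first_frame
    return frames
```

The shape check is intentionally strict. An edited still at the wrong resolution is usually a sign it was not resized back to the guide's frame size.
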
+Limitations:
+- Fast or chaotic motion → fails.
+- Poor blending / artefacts in the first frame propagate everywhere.
+- Works best when the inserted subject roughly occupies the same region as
+  whatever it replaces.
+### Prompting
+Prompting is just as critical as in v1.0. **Describe both the object being
+replaced and the new one in detail**. Example: *"Replace the bronze statue on
+the left with a tall man wearing a navy raincoat and brown boots."* Vague
+prompts produce bad edits.
+### Which rank to use
+Both files come from the same training. v1.1 is the merge of the two-stage
+training (one LoRA per stage), re-extracted at two different ranks via
+truncated SVD, which is the Frobenius-optimal low-rank approximation:
+| File | Rank | Size | Frobenius retention |
+|---|---|---|---|
+| `edit_anything_30k_v1.1_motion_transfer_r128.safetensors` | 128 | 1.31 GB | ~99.4% |
+| `edit_anything_30k_v1.1_motion_transfer_r256.safetensors` | 256 | 2.62 GB | ~99.9% |
+r256 is closer to the merged source; r128 is usually indistinguishable from
+it in practice. Pick whichever fits your workflow.
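
The merge-then-truncate step can be sketched in NumPy for a single weight matrix. This sketch makes two assumptions:

- Each stage LoRA is stored as `(down, up, scale)`, with `delta = scale * up @ down`.
- "Frobenius retention" means the ratio of the Frobenius norms of the truncated and full merged deltas. The card does not define it, and an energy ratio (the squared version) is another plausible reading.

It illustrates the technique and is not the script used to build these files.

```python
import numpy as np

def merge_and_truncate(loras, rank):
    """Merge LoRA deltas and re-extract a rank-`rank` factorisation via truncated SVD.

    loras: list of (down, up, scale), down of shape (r_i, in_features),
           up of shape (out_features, r_i).
    Returns (new_down, new_up, retention), retention = ||dW_r||_F / ||dW||_F.
    """
    # Merged full-rank delta: sum of each stage's scaled low-rank product.
    delta = sum(scale * up @ down for down, up, scale in loras)
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    r = min(rank, s.size)
    # Split singular values evenly between the two factors.
    sqrt_s = np.sqrt(s[:r])
    new_up = u[:, :r] * sqrt_s            # (out_features, r)
    new_down = sqrt_s[:, None] * vt[:r]   # (r, in_features)
    # Frobenius norm of a matrix equals the 2-norm of its singular values.
    retention = np.linalg.norm(s[:r]) / np.linalg.norm(s)
    return new_down, new_up, retention
```

By the Eckart–Young theorem, keeping the top `r` singular values gives the smallest possible Frobenius error among all rank-`r` matrices. This is why doubling the rank from 128 to 256 buys only a small retention gain once most of the energy sits in the leading components.
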
+---
+## 2. Reference video-to-video (Ref V2V) — experimental
+Files (two builds of the same LoRA family — each ships as a `(.standard, .module)` pair):
+- `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.standard.safetensors`
+- `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding.module.safetensors`
+- `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.standard.safetensors`
+- `edit_anything_reference_v0.1_r128_ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj.module.safetensors`
+### What it is
+The goal is to **add / replace using a reference image**: the same idea as
+Edit Anything v1.0, but with an explicit image as the appearance source
+instead of relying only on the prompt.
+Trained on **~1600** Add / Replace video pairs. Reference-paired video
+datasets are basically nonexistent, so the dataset had to be built from
+scratch — that is why the sample count is small. **It often fails.** This
+is fully experimental; thousands of training runs went into landing on this
+LoRA layout, and it is still unclear how much it actually helps.
+### Architecture — why this LoRA has "modules"
+Trained as a conventional IC-LoRA, plus extra projection branches that try
+to make the reference signal survive across layers:
+- **`ref_visual_proj`** — projects the reference VAE latent into 32 visual
+  memory tokens.
+- **`ref_attn`** — a dedicated cross-attention branch inside each
+  transformer block, reading those tokens.
+- **`ref_adaln_proj`** — a global AdaLN bias derived from the reference
+  (palette / overall look).
+- **`role_embedding`** — an experimental token bias inspired by some of
+  Kijai's tests; whether it actually helps is still unclear.
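
To make the four branches concrete, here is a toy single-head NumPy sketch of how they could fit together. Everything in it is an assumption for illustration:

- the class and method names;
- pooling the reference latent into 32 memory tokens with learned queries;
- single-head attention;
- where the AdaLN bias and role vectors are added.

The real modules live inside LTX-2.3's transformer blocks and are installed by the BFSnodes sampler.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention: q (Lq, d), k and v (Lk, d)."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

class RefBranchesSketch:
    """Toy stand-ins for ref_visual_proj, ref_attn, ref_adaln_proj and role_embedding."""

    def __init__(self, latent_channels, d_model, num_memory_tokens=32, num_roles=2, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.02
        # ref_visual_proj: channel projection plus learned queries that pool to 32 tokens.
        self.visual_in = rng.standard_normal((latent_channels, d_model)) * s
        self.memory_queries = rng.standard_normal((num_memory_tokens, d_model)) * s
        # ref_attn: query/key/value/output projections of the extra cross-attention.
        self.wq, self.wk, self.wv, self.wo = (
            rng.standard_normal((d_model, d_model)) * s for _ in range(4)
        )
        self.adaln = rng.standard_normal((d_model, d_model)) * s   # ref_adaln_proj
        self.roles = rng.standard_normal((num_roles, d_model)) * s  # role_embedding

    def memory_tokens(self, ref_latent_tokens):
        """(N, latent_channels) flattened reference latent -> (num_memory_tokens, d_model)."""
        x = ref_latent_tokens @ self.visual_in
        return attention(self.memory_queries, x, x)

    def adaln_bias(self, memory):
        """Global bias from the pooled reference (palette / overall look)."""
        return memory.mean(axis=0) @ self.adaln

    def add_roles(self, hidden, role_ids):
        """Add a learned bias vector to each token according to its role id."""
        return hidden + self.roles[role_ids]

    def ref_cross_attention(self, hidden, memory, ref_context_scale=1.0):
        """Residual cross-attention from video tokens (L, d_model) to the memory tokens."""
        q, k, v = hidden @ self.wq, memory @ self.wk, memory @ self.wv
        return hidden + ref_context_scale * (attention(q, k, v) @ self.wo)
```

Because the cross-attention output is added residually, setting its scale to zero recovers the plain block. That makes the extra branch easy to ablate at inference.
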
+These extra weights are saved alongside the LoRA in a `.module.safetensors`
+sidecar because they are **not standard LoRA adapters** — the regular
+ComfyUI LoRA loader can't consume them, so they need a dedicated node.
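
The split between the two files can be pictured as partitioning one state dict by key name. The substrings below come from the branch names above. The exact key layout of the released files is an assumption, so inspect the real names first (for example with `safetensors.safe_open(path, framework="pt")` and its `.keys()`) before relying on anything like this.

```python
# Branch names from the card; the surrounding key structure is assumed.
MODULE_MARKERS = ("role_embedding", "ref_adaln_proj", "ref_visual_proj", "ref_attn")

def split_state_dict(state_dict):
    """Partition weights into (standard LoRA keys, sidecar module keys)."""
    standard, module = {}, {}
    for key, tensor in state_dict.items():
        # Any key mentioning a reference branch goes to the sidecar file.
        target = module if any(marker in key for marker in MODULE_MARKERS) else standard
        target[key] = tensor
    return standard, module
```

Each half could then be written with `safetensors.torch.save_file`. Only the standard half is something a regular LoRA loader knows how to apply.
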
+### How to load
+| File | What it is | Where it goes |
+|---|---|---|
+| `*.standard.safetensors` | LoRA on `attn1` / `attn2` / `ff` only | Standard ComfyUI LoRA loader |
+| `*.module.safetensors` | `role_embedding` and `ref_adaln_proj` LoRA adapters, plus `ref_visual_proj` and `ref_attn` in the larger build | `LTXVEditAnythingModuleLoader` (BFSnodes) |
+Both files of a pair must be loaded **together** — the LoRA was trained
+against the sidecar adapters and they only make sense as a unit. Do not mix
+`.standard` from one build with `.module` from another.
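
Here is a small guard against mixing builds. It assumes both files of a pair share the same name stem before `.standard.safetensors` / `.module.safetensors`, which holds for the four files listed above. `build_stem` is a hypothetical helper and is not part of BFSnodes.

```python
from pathlib import Path

STANDARD_SUFFIX = ".standard.safetensors"
MODULE_SUFFIX = ".module.safetensors"

def build_stem(standard_path, module_path):
    """Return the shared build name, or raise if the files are not a matching pair."""
    std, mod = Path(standard_path).name, Path(module_path).name
    if not std.endswith(STANDARD_SUFFIX) or not mod.endswith(MODULE_SUFFIX):
        raise ValueError(f"expected *{STANDARD_SUFFIX} and *{MODULE_SUFFIX}, got {std!r}, {mod!r}")
    std_stem = std[: -len(STANDARD_SUFFIX)]
    mod_stem = mod[: -len(MODULE_SUFFIX)]
    if std_stem != mod_stem:
        raise ValueError(f"mismatched builds: {std_stem!r} vs {mod_stem!r}")
    return std_stem
```
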
+The module file is consumed by the **`🅛🅣🅧 LTXV Edit Anything Looping
+Sampler`** node, which was written specifically to:
+1. Install the `ref_attn` cross-attention branch on every transformer block.
+2. Inject the AdaLN / role / visual cross-attention conditioning at the
+   correct points in the model.
+3. Sample long videos in overlapping chunks with the conditioning re-applied
+   per chunk.
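
The card does not document the node's actual chunk length, overlap, or blending. As a rough picture of step 3, assume fixed-length chunks, a fixed overlap, and a linear crossfade on the overlapping frames. Under those assumptions the window schedule could be computed as follows. `chunk_windows` and `crossfade_weights` are hypothetical names.

```python
def chunk_windows(num_frames, chunk_len, overlap):
    """(start, end) frame ranges, end exclusive, of overlapping fixed-length chunks."""
    if not 0 <= overlap < chunk_len:
        raise ValueError("need 0 <= overlap < chunk_len")
    if num_frames <= chunk_len:
        return [(0, num_frames)]
    stride = chunk_len - overlap
    windows, start = [], 0
    while True:
        end = min(start + chunk_len, num_frames)
        # Shift the final window back so it keeps the full chunk length.
        windows.append((end - chunk_len, end))
        if end == num_frames:
            return windows
        start += stride


def crossfade_weights(overlap):
    """Weight of the newer chunk on each overlapping frame (linear ramp, strictly between 0 and 1)."""
    return [(i + 1) / (overlap + 1) for i in range(overlap)]
```

With 100 frames, chunks of 40, and an overlap of 10, this yields the windows (0, 40), (30, 70), and (60, 100). The last window is shifted back so every chunk keeps its full length, at the cost of a larger final overlap. The reference conditioning would then be re-applied to each window before it is sampled.
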
+### Which build to use
+- **`ref_adaln_proj-role_embedding`** — the original training. Only ships
+  the two side-channel modules.
+- **`ref_adaln_proj-role_embedding-ref_attn-ref_visual_proj`** — the
+  continuation. Adds the visual cross-attention branch and its projector on
+  top.
+It is genuinely **not clear yet** whether the extra branches help over the
+plain LoRA. Both builds are honest experiments. Try both, decide for your
+own use case, and please share findings.
+### Reading the layers
+For anyone who wants to understand what each layer in the Ref V2V
+checkpoint does:
+- [`lora_layers_reference.md`](./lora_layers_reference.md) — full tensor
+  inventory of both builds.
+- [`lora_layers_impact.md`](./lora_layers_impact.md) — what each branch
+  contributes at inference and which inference knob (`adaln_scale`,
+  `ref_context_scale`, `ref_token_scale`, `ref_start_block`,
+  `ref_end_block`, etc.) maps back to which training default.
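
The knob names above come from the card, but their exact semantics are documented in `lora_layers_impact.md`, not here. Purely as an illustration, the sketch below assumes three things:

- the block range is end-exclusive;
- the `ref_*` scales are zeroed outside `[ref_start_block, ref_end_block)`;
- the global AdaLN bias applies to every block.

Under those assumptions, a per-block schedule could look like this:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RefKnobs:
    adaln_scale: float = 1.0
    ref_context_scale: float = 1.0
    ref_token_scale: float = 1.0
    ref_start_block: int = 0
    ref_end_block: Optional[int] = None  # None = through the last block (assumed end-exclusive)

def per_block_scales(knobs, num_blocks):
    """Scale applied to each reference branch in each transformer block (hypothetical wiring)."""
    end = num_blocks if knobs.ref_end_block is None else knobs.ref_end_block
    schedule = []
    for i in range(num_blocks):
        in_range = knobs.ref_start_block <= i < end
        schedule.append({
            "adaln": knobs.adaln_scale,  # global bias, assumed active in every block
            "ref_context": knobs.ref_context_scale if in_range else 0.0,
            "ref_token": knobs.ref_token_scale if in_range else 0.0,
        })
    return schedule
```
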
+---
+## Prompt examples
+The two LoRAs were trained on very different caption styles. Match the
+style of whichever LoRA you're using — straying outside the training
+distribution is the fastest way to get garbage out.
+### Edit Anything v1.1 — standard editing
+The stage-1 dataset uses short imperative captions describing one or two
+edits. Use the same shape at inference. Examples drawn from the training
+distribution:
+- *"Replace the stone statue of a man on the left with a young woman in a
+  green dress."*
+- *"Add a black labrador retriever sitting beside the woman on the bench."*
+- *"Remove the teacher from the classroom."*
+- *"Alter the cap's colour from modern black to deep maroon."*
+- *"Replace the fresh citrus-green background with a wooden desk."*
+- *"Add faint tire tracks across the snow behind the car."*
+- *"Add a black statue, a blue camera, a cyan towel, a red guitar and a
+  pink backpack to the lakeside pier."*
+Tips:
+- Imperative verbs: **Add / Replace / Remove / Alter / Change**.
+- When replacing, **describe both** the original and the new subject so the
+  model can localise the edit.
+- Keep captions short and concrete. Long flowery prose hurts.
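
The tips above can serve as a quick pre-flight check before queuing a render. The heuristics here are a simplification of those tips, not something the model enforces: the first word must be one of the listed verbs, and a *Replace* caption must contain "with".

```python
EDIT_VERBS = ("Add", "Replace", "Remove", "Alter", "Change")

def lint_edit_caption(caption):
    """Return a list of warnings for a v1.1-style edit caption (heuristics only)."""
    warnings = []
    words = caption.split()
    if not words or words[0] not in EDIT_VERBS:
        warnings.append("start with an imperative verb: " + " / ".join(EDIT_VERBS))
    if words and words[0] == "Replace" and " with " not in caption:
        warnings.append("Replace captions should name both the original and the new subject ('... with ...')")
    return warnings
```
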
+### Edit Anything v1.1 — motion transfer
+Workflow:
+1. Pick a guide video.
+2. Edit **only the first frame** externally (Flux Kontext / "Klein", InstructPix2Pix, etc.)
+   to introduce the new subject in the desired pose and position.
+3. Feed the edited frame as the first frame of the input and the original
+   guide as motion source.
+4. The prompt should describe **the inserted subject and the action being
+   preserved**.
+Examples:
+- *"Replace the standing man holding the umbrella with a woman in a red
+  coat holding the same umbrella, walking across the puddles."*
+- *"Add a tabby cat curled up in the armchair while the man in the
+  background keeps reading."*
+- *"Replace the runner in the blue jersey with a man wearing a white shirt
+  and grey shorts running along the same path."*
+Limits: fast or chaotic motion will fail; the inserted subject should
+occupy roughly the same region/scale as what it replaces.
+### Reference V2V (Ref V2V) — Add and Replace
+These captions are real samples from the ~1600-pair training set. They
+describe the **target scene after the edit** in detail. The reference
+image carries the *appearance* of the inserted subject; the caption
+carries *position, pose, action, and surrounding context*.
+**Add task** (the reference image holds the new subject):
+- *"Add a middle-aged man with curly grey hair, a beard and glasses,
+  wearing a blue quarter-zip sweater, on the right side of the frame,
+  standing in front of a raw cut of meat on a tray."*
+- *"Add a light-coloured small boat with dark seats and an outboard motor
+  floating in the water."*
+- *"Add an open book filled with colourful pencils in the woman's hands."*
+- *"Add a silver metallic bucket on the table in front of the blonde
+  character, with her hands stirring a mixture inside."*
+- *"Add two miniature dolls, one blonde and one brunette, dressed in
+  patterned clothing, sitting at a small table with teacups and small
+  white vases on the countertop."*
+**Replace task** (the reference image holds the new subject; the caption
+also describes what is being replaced):
+- *"Replace the standing kangaroo holding the bicycle handlebars with a
+  man wearing a white t-shirt, light brown shorts and a yellow cap,
+  holding the bicycle handlebars."*
+- *"Replace the stone statue of a man on the left side with a young woman
+  in a green dress."*
+- *"Replace the wooden barrel near the entrance with a large brown leather
+  suitcase."*
+Tips for Ref V2V:
+- **Describe the inserted subject in full**, even though the reference
+  image is the source of truth — the text path drives placement and pose.
+- For *Replace*, **also describe what is being replaced** so the model can
+  match the spatial region.
+- Keep the inserted subject roughly in the same scale and region as what
+  it replaces.
+- The training captions are typically ~25–40 words long, so aim for that
+  range. Bare captions like *"Add a man"* are far too sparse and will
+  fail.
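
Here is a simple length check for Ref V2V captions. It uses the ~25–40 word range quoted above as the assumed target, counts whitespace-separated words, and keeps the bounds adjustable.

```python
def check_ref_caption_length(caption, low=25, high=40):
    """Return (word_count, verdict) against the typical training-caption length."""
    n = len(caption.split())
    if n < low:
        return n, "too sparse"
    if n > high:
        return n, "longer than typical training captions"
    return n, "ok"
```
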
+---
+## ComfyUI nodes
+All recommended inference paths run through the **BFSnodes** custom node
+set. For now BFSnodes is the only place these nodes live; once they
+stabilise they may move elsewhere.
+Specific nodes used by these LoRAs:
+- `🅛🅣🅧 LTXV Edit Anything Looping Sampler` — sampler that injects role /
+  AdaLN / visual cross-attention and handles long videos in chunks.
+- `LTXVEditAnythingModuleLoader` — load the `*.module.safetensors` sidecar.
+---
+## Status
+Released as experimental research artefacts. Expect failures, do not
+deploy, and please report what works and what doesn't.
+---
+## Credits
+If you use these models — in a project, a demo, a paper, a video, a tweet,
+a workflow, anything — **please credit my work**. These checkpoints are the
+result of weeks of research, dataset building, and training runs, and that
+effort is what makes any of it usable. Crediting the source is the bare
+minimum that keeps open research like this sustainable.
+**Author:** Alisson Pereira dos Anjos ([@Alissonerdx](https://huggingface.co/Alissonerdx))
+Suggested attribution:
+> Edit Anything LoRAs by Alisson Pereira dos Anjos
+> ([huggingface.co/Alissonerdx/EditAnything](https://huggingface.co/Alissonerdx/EditAnything)).
+Links back to this repository are appreciated wherever you publish results.