---
license: apache-2.0
library_name: diffusers
base_model: Lightricks/LTX-2.3
tags:
- lora
- video
- video-editing
- ltx-2.3
---

# Edit Anything — Experimental LTX-2 Video Editing LoRAs

> **Heads up.** These LoRAs are research experiments. They are far from
> production-ready and will fail on many inputs. They are released for the
> community to play with and break, not as a finished tool.

This repository hosts three unrelated training tracks built on top of **LTX-2.3 (22B)** for video editing:

1. **Edit Anything v0.1 — motion transfer LoRA** (two ranks).
2. **Edit Anything — no-reference multitask LoRA** (rank 256, prompt-driven only).
3. **Reference video-to-video (Ref V2V) — experimental IC-LoRA + sidecar modules** (two builds).

Inference is meant to run through the **BFSnodes** ComfyUI custom nodes — the Ref V2V build in particular needs them to load the sidecar modules and install the custom branches into the transformer.

---

## 1. Edit Anything v0.1 (motion transfer)

Files:

- `edit_anything_30k_v0.1_motion_transfer_r128.safetensors`
- `edit_anything_30k_v0.1_motion_transfer_r256.safetensors`

### What it is

**v0.1 is not a direct continuation of v1.0.** It was trained from scratch in two stages:

1. **Stage 1 — image-only pretraining.** ~30 000 image edit pairs. Training a *video* model on still images is admittedly not ideal, but it was a way to push the editing vocabulary beyond what a small video-only dataset can teach.
2. **Stage 2 — video fine-tune with `first_frame_conditioning > 0`.** This restored the temporal prior and unlocked the motion-transfer behaviour described below.

In theory v0.1 can do the same edits as v1.0, but **temporal consistency may be weaker than v1.0** because so much of stage 1 happened on still images. Test against v1.0 case-by-case before assuming v0.1 wins on your task.

### Motion transfer

Because stage 2 included first-frame conditioning, you can drive the LoRA into a motion-transfer mode:

1. Take a guide video.
2. **Replace its first frame** with an edited still (insert a new subject, swap an object, etc.). Use a strong image-editing model — **Flux Klein** or similar — to prepare it; the quality of this single frame propagates through the whole clip.
3. Feed the edited frame as the first frame of the input, and the original guide video as the motion source.

The model uses the new first frame as the appearance anchor and copies the motion from the rest of the guide.

Limitations (these are real, not theoretical — expect them to bite):

- **Hard scene cuts break it.** The model assumes continuous motion from the first frame onwards. A cut to a different camera angle or location mid-clip will produce smearing, ghosting, or the inserted subject jumping to the wrong position. Use clips without cuts, or split at the cuts and process each segment separately.
- **Very fast motion fails.** Quick pans, fast subject movement, or high-velocity action confuse the motion-copy mechanism. Outputs degrade to blur or to the model "freezing" on the first-frame appearance and losing the motion entirely. Stick to moderate-speed clips.
- Poor blending / artefacts in the first frame propagate everywhere.
- Works best when the inserted subject roughly occupies the same region as whatever it replaces.

### Prompting

Prompt is just as critical as in v1.0. **Describe both the object being replaced and the new one in detail**. Example: *"Replace the bronze statue on the left with a tall man wearing a navy raincoat and brown boots."* Vague prompts produce bad edits.

### Which rank to use

The same training produced both files.
v0.1 is actually the merge of the two-stage training (one LoRA per stage), re-extracted at two different ranks via Frobenius-optimal truncated SVD:

| File | Rank | Size | Frobenius retention |
|---|---|---|---|
| `edit_anything_30k_v0.1_motion_transfer_r128.safetensors` | 128 | 1.31 GB | ~99.4% |
| `edit_anything_30k_v0.1_motion_transfer_r256.safetensors` | 256 | 2.62 GB | ~99.9% |

r256 is closer to the merged source. r128 is normally indistinguishable in practice. Pick whichever fits your workflow.

### How to wire the LoopingSampler

This is a **standard LoRA**, not a sidecar. Load it through the regular ComfyUI LoraLoader **before** the LoopingSampler. On the sampler itself:

- `editanything_module` → **leave disconnected**.
- `ref_image` → the edited first frame (for motion transfer) **or** the source frame you want preserved (for plain editing).
- `guide_frames` → the guide video.
- `enable_role_embedding`, `enable_adaln`, `enable_visual_crossattn` → all **off**. None of those branches were trained for v0.1; turning them on with no module connected does nothing anyway, but keeping them off silences the WARN logs.

---

## 2. Edit Anything — no-reference multitask LoRA

File:

- `edit_anything_v1.1_r256.safetensors`

### What it is

A **prompt-only** multitask editing LoRA. No reference image, no first-frame conditioning — the model is driven entirely by the text prompt and the guide video. Trained on a balanced mix of **Add, Remove, Replace, Style** edits.

### What's different about it (vs v0.1)

The task vocabulary overlaps heavily with v0.1 — both can do Add, Remove, Replace, Change, Convert. What changes here:

- **Two-stage training continuation**: the first stage gave the model its edit vocabulary; the second stage refined it on a larger, more balanced video pair set covering Add / Remove / Replace / Style.
- **Rank 256** (vs v0.1's effective rank from the merge), giving more capacity for the broader task mix.
- Trained directly on video pairs, so the temporal behaviour on these tasks tends to be steadier than on a model whose first stage was on still images.

### How to use it

**Standalone** — load it as a regular LoRA on vanilla LTX-2.3 through any ComfyUI LoRA loader. The file already carries everything it needs; no stacking with v0.1, no companion module.

### Limitations

- No reference image → identity is not anchored, so Add / Replace of a specific person or object will be wobblier than with the Ref V2V build.
- No motion transfer (that's v0.1 only).

### Prompting

Same imperative shape as v0.1, but the training set is split into four very distinct caption styles. Match the one that fits the edit you want — the distribution is narrow and the model expects the right shape.

The training set is roughly balanced across **Add, Remove, Replace and Style** buckets, with Style being the smallest of the four. Captions below are real examples drawn from those buckets.

#### Add — 15 to 30+ words, describe what to add and where

* `Add a smiling woman with brown hair, wearing a pink sleeveless top, sitting to the right of the man at the news desk.`
* `Add a person wearing a blue denim shirt over a white t-shirt to the right side of the frame, behind the person cooking.`
* `Add a decorated Christmas tree with red and white ornaments and lights to the right of the man.`
* `Add a blonde boy wearing a black t-shirt with a blue collar and blue patterned pants, sitting behind the other children in the upper center of the frame.`
* `Add two horizontal wooden strips to the front of the white range hood.`

Pattern: `Add , , .`

#### Remove — very short, 4 to 10 words

* `Remove the man drinking from a glass.`
* `Remove the disco ball.`
* `Remove the large tree on the right.`
* `Remove the squirrel in the foreground.`
* `Remove the man on the left.`

Pattern: `Remove the ` (+ optional position). Resist the urge to over-describe — long Remove prompts drift outside the training shape and often fail.
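
As an illustration of the Add and Remove shapes above, the following Python sketch builds a Remove prompt and flags prompts whose word count falls outside the ranges given in the headings. The helper names (`build_remove_prompt`, `check_prompt`) and the use of the heading ranges as thresholds are assumptions made for this example; they are not part of BFSnodes, ComfyUI, or the training code.

```python
# Word-count ranges taken from the Add and Remove headings above.
# "30+" is open-ended, so only the lower bound is checked for Add.
ADD_MIN_WORDS = 15
REMOVE_MIN_WORDS = 4
REMOVE_MAX_WORDS = 10


def word_count(prompt: str) -> int:
    """Count whitespace-separated words in a prompt."""
    return len(prompt.split())


def build_remove_prompt(target: str, position: str = "") -> str:
    """Hypothetical helper: 'Remove the <target> [<position>].'"""
    parts = ["Remove the", target.strip()]
    if position.strip():
        parts.append(position.strip())
    return " ".join(parts) + "."


def check_prompt(prompt: str) -> list:
    """Return warnings for Add/Remove prompts; an empty list means no warning.

    Prompts that start with any other verb are not checked by this sketch.
    """
    warnings = []
    n = word_count(prompt)
    if prompt.startswith("Add "):
        if n < ADD_MIN_WORDS:
            warnings.append(
                f"Add prompt has {n} words; the heading suggests {ADD_MIN_WORDS} or more."
            )
    elif prompt.startswith("Remove "):
        if not REMOVE_MIN_WORDS <= n <= REMOVE_MAX_WORDS:
            warnings.append(
                f"Remove prompt has {n} words; the heading suggests "
                f"{REMOVE_MIN_WORDS} to {REMOVE_MAX_WORDS}."
            )
    return warnings


if __name__ == "__main__":
    prompt = build_remove_prompt("large tree", "on the right")
    print(prompt, check_prompt(prompt))
```

The bounds are rough guides rather than hard limits: one of the Add captions quoted above has 13 words. Treat a warning as a hint to re-read the prompt, not as a rejection.
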
#### Replace — 20 to 35 words, describe both old and new

* `Replace the white panel door on the right side of the frame with a dark brown grandfather clock.`
* `Replace the light-colored cat lying on the mat on the floor with a young woman sitting on the mat.`
* `Replace the dark grey knitted sweater on the man's torso with a black and white patterned Christmas sweater.`
* `Replace the blue robot with a glowing blue face on the left with a smiling man wearing sunglasses and a blue shirt.`
* `Replace the sitting person wearing a black cape on the left with a black fabric draped over an object.`

Pattern: `Replace with .`

#### Style — fixed template, the style name is what changes

* `Convert the video into a Pencil Sketch style.`
* `Convert the video into a Watercolor Painting style.`
* `Convert the video into a Van Gogh style.`
* `Convert the video into a Play-Doh style.`
* `Convert the video into a Claymation style.`
* `Convert the video into a 3D Chibi style.`
* `Convert the video into a Ghibli style.`
* `Convert the video into a Pop Art style.`
* `Convert the video into an American Cartoon style.`
* `Convert the video into a Flat Vector Cartoon style.`

The training set covers **300+ distinct style names**. Many work; many do not. The list above is heavily represented in training. Use the exact phrase `Convert the video into a