	---
	license: apache-2.0
	base_model:
	- Lightricks/LTX-2.3
	tags:
	- video-generation
	- lora
	- ltx-video
	- dual-character
	- dialogue
	- cinematic
	- chinese-drama
	- image-to-video
	pipeline_tag: image-to-video
	language:
	- en
	- zh
	---

	# LTX-Video 2.3 — Dual-Character LoRA (English mirror)

	A field-tested image-to-video character-consistency LoRA for `Lightricks/LTX-2.3` (22B distilled), tuned for two-character dialogue scenes and multi-shot cinematic video generation.

	> ⚠️ Naming note (corrected 2026-05-21):
	> The original filename and ModelScope repo include the string "IC-LORA", but this is NOT an IC-LoRA in the strict technical sense (parallel-canvas / `video_conditioning` mechanism). An A/B/C test (same prompt + seed, three reference-channel variants) confirmed that the LoRA's actual conditioning mechanism is first-frame pixel pinning (the regular i2v path), not parallel-canvas attention. Earlier copy on this card incorrectly described it as IC-LoRA — that has been removed. Credit to ZKong for raising the discrepancy in the discussions tab.

	---

	## Example renders

	The example episode is an 8-shot Chinese palace drama (《玉佩定情》, "A Jade Pendant Pledge", and 《暗夜阴谋》, "Conspiracy in the Dark Night") with three characters: 沈月华 (Shen Yuehua, heroine), 萧云霄 (Xiao Yunxiao, prince), and 慕容静 (Murong Jing, antagonist). Render config: 1280×704, 121 frames @ 24 fps, ambient audio.

	### Single-character identity — Shen Yuehua walking in the garden, picks up a jade pendant
	<video controls autoplay muted loop src="https://huggingface.co/SyFeee/LTX2.3-Dual-Character-en/resolve/main/examples/E1S1_garden_walk_single_character.mp4"></video>

	### Dual-character dialogue — Shen + Xiao meet (the LoRA's signature use case)
	<video controls autoplay muted loop src="https://huggingface.co/SyFeee/LTX2.3-Dual-Character-en/resolve/main/examples/E1S2_prince_meets_dual_character.mp4"></video>

	### Cross-scene identity — Murong Jing in a different location (palace night chamber)
	<video controls autoplay muted loop src="https://huggingface.co/SyFeee/LTX2.3-Dual-Character-en/resolve/main/examples/E2S1_murong_plots_cross_scene.mp4"></video>

	### Three-character composition — the LoRA's upper limit
	<video controls autoplay muted loop src="https://huggingface.co/SyFeee/LTX2.3-Dual-Character-en/resolve/main/examples/E2S4_three_character_confrontation.mp4"></video>

	---

	## What this LoRA does

	Fine-tuned on `Lightricks/LTX-2.3` (22B distilled), specifically for:

	1. Two-character dialogue scenes — significantly reduces character drift when two people appear in the same frame
	2. Cinematic shot composition — reinforced for dialogue-driven framing (close-up ↔ medium ↔ wide)
	3. Multi-shot narrative continuity — better understanding of multi-segment prompts (storyboard-style descriptions)
	4. Style compatibility — works well across 古风仙侠 (ancient Chinese fantasy), 现代都市 (modern urban), and 3D 动漫 (3D anime) styles

	The reference image is consumed via a first-frame pixel pin (standard i2v conditioning), not via the parallel-canvas / `video_conditioning` channel.

	---

	## How to use (correct pattern)

	### Single-character shot

	```python
	# Upstream LTX-2.3 distilled pipeline: single reference as first-frame pin
	from ltx_pipelines.distilled import DistilledPipeline
	from ltx_pipelines.utils.args import ImageConditioningInput
	from ltx_core.loader import LoraPathStrengthAndSDOps, sd_ops as _sd_ops_mod
	import torch

	lora = LoraPathStrengthAndSDOps(
	    "LTX2.3-IC-LORA-Dual-Character.safetensors",
	    0.8,  # strength (standalone)
	    _sd_ops_mod.LTXV_LORA_COMFY_RENAMING_MAP,
	)

	pipe = DistilledPipeline(
	    distilled_checkpoint_path="ltx-2.3-22b-distilled-1.1.safetensors",
	    spatial_upsampler_path="ltx-2.3-spatial-upscaler-x2-1.1.safetensors",
	    gemma_root="google/gemma-3-12b-it-qat-q4_0-unquantized",
	    loras=[lora],
	    device=torch.device("cuda:0"),
	)

	video, audio = pipe(
	    prompt="...",
	    seed=42,
	    height=704, width=1280,
	    num_frames=121,  # 5 s @ 24 fps, satisfies 8k+1
	    frame_rate=24,
	    images=[ImageConditioningInput(  # first-frame pin = THE reference mechanism
	        path="character_ref.png",
	        frame_idx=0,
	        strength=0.9,
	    )],
	    enhance_prompt=False,
	)
	```

	### Dual-character shot

	LTX's i2v conditioning rejects two pins at the same `frame_idx`, so two references cannot both be pinned at frame 0. There are two workable patterns:

	Pattern A (recommended): composite reference image. Build one image with character A on the left and character B on the right (e.g., via PIL `Image.paste` or any image editor), pin THAT at `frame_idx=0`. Both identities transfer in one pin.

	Pattern B: stagger the pins. Pin character A at frame 0, character B at a later latent boundary (e.g., frame 64 — must be a multiple of 8 per the VAE's temporal compression). Only works if B doesn't need to be visible from the very first frame.
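
Pattern A can be sketched with Pillow as follows. This is a minimal sketch, not part of the upstream pipeline: the helper names (`side_by_side_boxes`, `build_composite`), the file paths, and the choice to center-crop each reference into one half of a 1280 × 704 canvas are illustrative assumptions.

```python
def side_by_side_boxes(canvas_w, canvas_h):
    """Split the canvas into a left and a right box: (left, top, right, bottom)."""
    half = canvas_w // 2
    return (0, 0, half, canvas_h), (half, 0, canvas_w, canvas_h)


def build_composite(ref_a_path, ref_b_path, out_path, size=(1280, 704)):
    """Paste character A on the left and character B on the right of one image."""
    # Pillow is imported here so the layout helper above works without it.
    from PIL import Image, ImageOps

    canvas = Image.new("RGB", size)
    boxes = side_by_side_boxes(*size)
    for path, (left, top, right, bottom) in zip((ref_a_path, ref_b_path), boxes):
        with Image.open(path) as ref:
            # Center-crop and resize the reference to fill its half of the canvas.
            tile = ImageOps.fit(ref.convert("RGB"), (right - left, bottom - top))
        canvas.paste(tile, (left, top))
    canvas.save(out_path)
    return out_path
```

The saved file is then pinned at `frame_idx=0` exactly like the single-character reference, for example after `build_composite("shen_ref.png", "xiao_ref.png", "pair_ref.png")` (hypothetical file names).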

	### Recommended parameters

	| Setting | Value |
	|---|---|
	| Resolution | 1280 × 704 (approx. 16:9, native LTX-2.3 distilled training resolution) |
	| Faster preview | 960 × 544 (~40% faster, slightly less detail) |
	| Frames | Must satisfy 8k+1, e.g. 121 (5 s), 193 (8 s), 241 (10 s), 361 (15 s) at 24 fps |
	| LoRA strength | Standalone 0.7-0.9 · stacked with style LoRAs 0.3-0.5 |
	| Pin strength | 0.85-0.95 for tight identity, 0.7 for looser "inspired-by" |
	| Trigger word | None |
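
The 8k+1 frame rule can be applied programmatically. The helpers below are a small sketch with hypothetical names; they only do the arithmetic and assume nothing about the pipeline beyond the rule stated in the table.

```python
def snap_num_frames(seconds, fps=24):
    """Return the frame count of the form 8k+1 closest to seconds * fps."""
    target = seconds * fps
    k = max(0, round((target - 1) / 8))
    return 8 * k + 1


def is_valid_num_frames(num_frames):
    """True when num_frames can be written as 8k+1 for some integer k >= 0."""
    return num_frames >= 1 and (num_frames - 1) % 8 == 0


for seconds in (5, 8, 10, 15):
    print(seconds, "s ->", snap_num_frames(seconds), "frames")
```

For 5, 8, 10, and 15 seconds at 24 fps this reproduces the frame counts listed in the table (121, 193, 241, 361).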

	---

	## Field-tested production tips

	These are quirks of this LoRA combined with the LTX-2.3 distilled backbone. They are not covered in the original card but matter in practice.

	### 1. Repeat color tokens for dark-clothed characters

	This LoRA is biased toward light-colored wuxia robes, so dark outfits drift toward white at low pin strength. Repeat the color token directly before each clothing noun:

	```text
	BAD: black fedora and black suit
	GOOD: BLACK fedora, white shirt, BLACK suit jacket, BLACK trousers,
	... BLACK suit, BLACK trousers throughout
	```

	Also bump pin strength to ~0.95 for color fidelity on dark outfits.
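
When prompts are assembled in code, the repetition can be automated. This is a small sketch with a hypothetical helper name; the upper-casing simply mirrors the example above and is not a documented requirement of the model.

```python
def repeat_color(color, garments):
    """Prefix every garment noun with the upper-cased color token."""
    token = color.upper()
    return ", ".join(f"{token} {garment}" for garment in garments)


outfit = repeat_color("black", ["fedora", "suit jacket", "trousers"])
prompt = f"He wears a {outfit}."
```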

	### 2. Never use quoted dialogue in prompts

	This LoRA was trained on Chinese drama clips with burned-in Chinese subtitles. Any quoted dialogue (`「…」` or `"…"`) in the prompt causes the LoRA to hallucinate subtitle characters at the bottom of the frame. Single biggest gotcha.

	```text
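	BAD: 低声警告「此茶不可饮！」 ← fake on-screen subtitles
	     (English: whispers a warning, 「Do not drink this tea!」)
	GOOD: 低声急切警告她茶水有毒 ← clean output, indirect narration
	     (English: urgently whispers to warn her that the tea is poisoned)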
	```
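
A prompt can be checked for quoted dialogue before it is sent to the renderer. The following is a minimal sketch: the function name is hypothetical, and the extra quote styles (`『…』` and curly quotes) are assumptions beyond the two styles named above.

```python
import re

# Quote pairs that tend to wrap spoken lines in prompts.
QUOTED_DIALOGUE = re.compile(r'「[^」]*」|『[^』]*』|“[^”]*”|"[^"]*"')


def find_quoted_dialogue(prompt):
    """Return every quoted span found in the prompt (empty list if clean)."""
    return QUOTED_DIALOGUE.findall(prompt)
```

A non-empty result is a signal to rewrite the line as indirect narration before rendering.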

	If your app needs subtitles, burn them post-hoc via `ffmpeg drawtext`.
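
One way to drive `ffmpeg drawtext` from Python is sketched below. The function name, file names, font size, and placement are assumptions. The sketch reads the subtitle from a UTF-8 text file via `textfile` so the text itself needs no filter escaping, and it assumes the paths contain no `:` or `'` characters and that the font file covers the required glyphs.

```python
def drawtext_command(src, dst, textfile, fontfile, start, end):
    """Build an ffmpeg command that overlays one subtitle line between start and end (seconds)."""
    drawtext = (
        f"drawtext=textfile={textfile}:fontfile={fontfile}"
        ":fontsize=40:fontcolor=white:borderw=2:bordercolor=black"
        ":x=(w-text_w)/2:y=h-text_h-48"  # centered, near the bottom edge
        f":enable='between(t,{start},{end})'"  # only visible in this time window
    )
    # Video is re-encoded by the filter; the audio track is copied unchanged.
    return ["ffmpeg", "-y", "-i", src, "-vf", drawtext, "-c:a", "copy", dst]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; several subtitle lines would need one `drawtext` instance each, chained with commas in the filter string.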

	### 3. Avoid "object detaches" prompts during action

	At high motion intensity, the model loses object tracking. A directive like "fedora flies off mid-spin and tumbles to the floor" produces broken output — the hat dematerialises. Either:
	- Keep the object attached and say so explicitly ("the fedora STAYS ON his head throughout the spin")
	- Or render attach + detach as two clips and concat
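
For the two-clip route, the clips can be joined with ffmpeg's concat demuxer. This is a sketch under assumptions: the helper and file names are hypothetical, both clips were rendered with the same resolution, frame rate, and codec settings (so stream copy is possible), and the clip names contain no single quotes.

```python
def concat_list_text(clips):
    """Return the list-file content read by ffmpeg's concat demuxer, one clip per line."""
    return "".join(f"file '{clip}'\n" for clip in clips)


def concat_command(list_path, dst):
    """Build the ffmpeg command that joins the listed clips without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", dst]
```

Write the text from `concat_list_text(["spin_hat_on.mp4", "hat_on_floor.mp4"])` to a file such as `clips.txt`, then run `concat_command("clips.txt", "spin_full.mp4")` with `subprocess.run`. Relative clip paths are resolved against the list file's directory.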

	### 4. Cross-shot identity drift

	In multi-shot dialogue scenes, character identity drifts across cuts. Workaround: re-pin the reference image at frame 0 of every shot. (A fixed seed, the same first-frame pin, and the same prompt scaffolding together give good repeatability.)
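
The re-pin workaround amounts to building every shot from the same seed, scaffold, and reference. The sketch below uses plain dictionaries and a hypothetical helper name; mapping each `pin` entry onto the pipeline's own conditioning input is left to the caller.

```python
def shot_kwargs(shot_prompts, scaffold, ref_path, seed=42, pin_strength=0.9):
    """Return one keyword-argument dict per shot, all sharing seed, scaffold, and reference."""
    return [
        {
            "prompt": f"{scaffold} {shot}",
            "seed": seed,
            # Same reference image pinned at frame 0 of every shot.
            "pin": {"path": ref_path, "frame_idx": 0, "strength": pin_strength},
        }
        for shot in shot_prompts
    ]
```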

	### Render performance

	- Resolution: 1280 × 704, 121 frames @ 24 fps (~5 s output)
	- Hardware: NVIDIA A800 80 GB → ~70 s per shot
	- Output: mp4 with ambient audio track (no TTS)

	On consumer hardware (RTX 4090 24 GB), expect ~3-4 minutes per shot.

	---

	## Limitations

	1. Subtitle hallucination with quoted dialogue (see tip #2)
	2. Complex physical interactions (wrestling, hugging, intricate hand-on-hand) can deform
	3. Tail-frame artifact of LTX-2.3 — last 6-8 frames may smear; trim post-hoc if needed
	4. Action complexity ceiling — the 8-step distilled budget caps motion complexity at action peaks
	5. Portrait orientation degrades identity (LoRA trained on landscape only)
	6. Dual-character via two separate refs is awkward (see "How to use" above) — composite-image pin is the cleanest workaround
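
The tail-frame trim from limitation 3 can be done with ffmpeg after rendering. This is a sketch, not a tested production recipe: the helper name, the default of 8 dropped frames, and the `libx264` / `aac` encoders are assumptions, and the clip is re-encoded so the cut lands on a frame boundary.

```python
def trim_tail_command(src, dst, num_frames, fps=24, drop=8):
    """Build an ffmpeg command that keeps all but the last `drop` frames."""
    keep = num_frames - drop
    if keep <= 0:
        raise ValueError("drop must be smaller than num_frames")
    # Cut half a frame after the last kept frame starts, so rounding of the
    # duration string cannot pull in the next frame.
    duration = (keep - 0.5) / fps
    return ["ffmpeg", "-y", "-i", src, "-t", f"{duration:.3f}",
            "-c:v", "libx264", "-c:a", "aac", dst]
```

For a 121-frame shot, `trim_tail_command("shot.mp4", "shot_trimmed.mp4", 121)` keeps the first 113 frames and shortens the audio to match.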

	---

	## Original Chinese README (preserved)

	The original Chinese model card from ModelScope is reproduced below for users who want the unmodified original documentation. (Note: the original card uses the "IC-LoRA" label — the term has been kept here for fidelity, even though the A/B/C test described above shows the conditioning mechanism is first-frame i2v pinning rather than parallel-canvas IC-LoRA.)

	<details>
	<summary>点击展开原版中文模型卡片 (click to expand original Chinese README)</summary>

	### LTX-Video (2.3) IC-LoRA: 双人分镜头对话增强模型

	本模型是基于 Lightricks LTX-2.3 底模训练的 IC-LoRA，专为双人同框对话、角色互动及分镜头视频生成场景深度优化。

	一、模型核心提升

	1. 角色参考稳定性：显著提升双人同框时的人物特征一致性，减少角色漂移。
	2. 分镜构图稳定性：针对影视化对话构图进行了加固，支持更精准的镜头控制。
	3. 叙事连贯性：增强了对多段描述的理解力，使分镜间的过渡衔接更自然。
	4. 风格兼容性：完美支持古风仙侠、现代都市、3D 动漫等主流视觉风格。

	二、模型基本信息

	1. 基础模型：Lightricks/LTX-2.3
	2. 许可证：Apache-2.0
	3. 管道标签：image-to-video, text-to-video
	4. 模型用途：仅供学习交流使用
	5. 开发者：麻雀 AI

	三、运行指南

	1. 推荐平台：ComfyUI
	2. 支持工作流：ComfyUI 官方 LTX 工作流、KJ-LTX 插件工作流
	3. 生成模式：文生视频 (T2V) 与图生视频 (I2V) 均支持
	4. 硬件参考：RTX 5090 显卡在 720P 分辨率下，单条视频生成耗时约 2 分钟

	四、推荐参数配置

	1. 分辨率：建议使用 16:9 (如 1280x720)
	2. 时长与帧率：建议时长 ≥10 秒，帧率设定为 24 FPS
	3. LoRA 权重设定：
	- 独立使用建议：0.6 - 1.0
	- 叠加其他 LoRA 使用时建议：0.3 - 0.5

	五、Prompt 编写规范

	1. 编写逻辑：需包含完整的场景描述 + 角色设定 + 分镜设计 + 镜头语言，强化双人对话互动逻辑。
	2. 触发词说明：无需特定触发词。

	六、效果说明与局限性

	1. 优势风格：在古风、现代、3D 动漫类双人对话场景中表现最佳。
	2. 已知限制：受限于 LTX-2.3 底模性能，极其复杂的双人肢体互动（如缠绕、打斗）可能出现形变。
	3. 运动幅度：建议以对话和微动作为主，大动态动作的连贯性仍有提升空间。

	</details>

	---

	## Hardware requirements

	| GPU | VRAM | Works? |
	|---|---|---|
	| A100 / A800 80 GB | 80 GB | ✅ ~70 s per 5 s shot |
	| RTX 4090 / 3090 | 24 GB | ✅ ~3-4 min per 5 s shot |
	| RTX 4080 / 4070 Ti Super | 16 GB | ❌ won't fit 22B in bf16 |
	| Any GPU with less than 24 GB | < 24 GB | ❌ no |

	---

	## Acknowledgements

	- 麻雀 AI (Maque AI) — original author of this LoRA, [original ModelScope repository](https://www.modelscope.cn/models/fxj1131/LTX2.3-IC-LORA-Dual-Character)
	- [Lightricks](https://www.lightricks.com/) — for the LTX-Video 2.3 base model
	- ZKong — for catching the IC-LoRA labeling discrepancy in the discussion thread; the empirical A/B/C test run in response settled it

	---

	## Source attribution

	> This is an English-language mirror of [fxj1131's LTX2.3 Dual-Character LoRA on ModelScope](https://www.modelscope.cn/models/fxj1131/LTX2.3-IC-LORA-Dual-Character).
	> All credit for the model weights belongs to the original author, 麻雀 AI (Maque AI).
	> This mirror exists to make the model + documentation accessible to HuggingFace users who cannot easily access ModelScope, and to share field-tested usage notes from a production deployment.
	> The `.safetensors` weights file is unmodified and byte-identical to the ModelScope upload.

	---

	## License

	Apache License 2.0 — same as the original. See `LICENSE` and `NOTICE`.