LTX-2.3 POP!

Prompt
POP! The girl in her blue stretching overalls inflates and explodes into a shower of confetti-like fabric scraps, leaving the surroundings covered in colorful remnants. There is nothing left of her popped body, the camera shows now empty background.
Prompt
POP! The girl inflates and explodes into a shower of confetti-like fabric scraps, leaving the surroundings covered in colorful remnants. There is nothing left of her popped body, the camera shows now empty background.
Prompt
POP! The both girls inflate and explode into a shower of confetti-like fabric scraps, leaving the surroundings covered in colorful remnants. There is nothing left of their popped bodies, the camera shows now empty background.
Prompt
A portrait of an anime girl with modest chest is shown. Then suddenly the anime girl pulls out a violin from her bag she is holding, accurately puts the bag back on the floor, places the violin on her shoulder, puts the fiddlestick on top of it and starts to move it back and forth with her hands, playing a melody. At the same time, air blows inside the girl and inflates her body like a balloon, her body expanding uniformly from head to toe. Her clothes strain at the seams, and her hair stands on end as she grows larger and larger. The girl continues to play melody on the violin like a bellows, pushing the fiddlestick, giggling mischievously. Finally, the girl reaches her maximum size – a gigantic, wobbly perfect sphere. She smiles in satisfaction. And POP! The girl explodes into a shower of confetti-like fabric scraps, leaving the surroundings covered in colorful remnants. There is nothing left of her popped body, the camera shows now empty background.
Prompt
A portrait of an anime girl with modest chest is shown. Then suddenly the anime girl pulls out a TNT-detonator looking like manual cube-shaped air pump from her pocket, places it on the floor, puts the pipe into her mouth and starts to press and press the long vertical piston with her hands. The pump blows air inside and inflates her body like a balloon, her body expanding uniformly from head to toe. Her clothes strain at the seams, and her hair stands on end as she grows larger and larger. The girl continues to blow into herself like a bellows, pushing the piston, giggling mischievously. Finally, the girl reaches her maximum size – a gigantic, wobbly perfect sphere. She smiles in satisfaction. And POP! The girl explodes into a shower of confetti-like fabric scraps, leaving the surroundings covered in colorful remnants. There is nothing left of her popped body, the camera shows now empty background.

Model description

This LoRA lets you comically inflate cartoon and anime characters and pop them into a cloud of cute confetti and colorful scraps.

The workflow is embedded in the example videos' metadata and is also available in `workflow-pop.json`. It is the default workflow, but with 30 steps, the distill LoRA set to 0.6, and optional KJ nodes plus VHS for resizing and metadata saving.
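To look at the embedded metadata without opening ComfyUI, one option is to dump the container tags with `ffprobe`. This is only a sketch: it assumes `ffprobe` is installed and that the downloaded video still carries the tags written at save time. The function name `show_video_tags` and the file name `example.mp4` are hypothetical.

```shell
# Print all container-level metadata tags of a video as JSON.
# Assumption: the workflow was embedded as a tag when the video was saved.
show_video_tags() {
  ffprobe -v error -show_entries format_tags -of json "$1"
}

# Usage (hypothetical file name):
#   show_video_tags example.mp4 > tags.json
```

If the tags come back empty, the metadata was probably stripped somewhere along the way, and `workflow-pop.json` is the safer source.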

Trigger words

The trigger phrase is:

POP! The girl inflates and explodes into a shower of confetti-like fabric scraps, leaving the surroundings covered in colorful remnants. There is nothing left of her popped body, the camera shows now empty background.
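For scripted generation it can help to keep the trigger phrase in one place and put any scene description in front of it. The helper below is a minimal sketch; `build_prompt` is a hypothetical name, and whether a lead-in improves results for a given image is not something the card states.

```shell
# The trigger phrase, copied verbatim from the card.
TRIGGER='POP! The girl inflates and explodes into a shower of confetti-like fabric scraps, leaving the surroundings covered in colorful remnants. There is nothing left of her popped body, the camera shows now empty background.'

# Build a prompt: optional lead-in text first, trigger phrase last.
build_prompt() {
  if [ -n "${1:-}" ]; then
    printf '%s %s\n' "$1" "$TRIGGER"
  else
    printf '%s\n' "$TRIGGER"
  fi
}

# Usage:
#   build_prompt                      # trigger phrase only
#   build_prompt 'Some scene setup.'  # lead-in followed by the trigger phrase
```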

Download model

Download the weights from the Files & versions tab.

Technical description

This is my first LTX-2.3 LoRA, and my 8th LTX-2.+ LoRA overall.

It was trained for 7 hours on a 4090 in background mode using musubi-tuner. As always, DINO/CREPA is enabled. The first-frame conditioning probability was raised to ~0.3 (0.34 in the config below), because this is an image-to-video-focused concept.

The dataset's composition was improved: I synthesized over 60 videos (several different inflations and pops per subject) using Wan2.2 + my Eat Style cartoon LoRA (highly recommended for weirdness) at high CFG (8.0), with TCFG enabled for prompt following. Then I duplicated the non-special-action videos (those without detonators, violins, etc.) and replaced the duplicates' captions with just the trigger phrase shown above. This seems to have both helped the model learn and simplified prompting for the end user. The training FPS for the dataset is 24. (Note that the examples are at 50 FPS.)
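The duplication-with-caption-dropout step can be sketched as a small shell function. The card does not describe the tooling actually used, so everything here is an assumption: captions are taken to be sidecar `.txt` files next to each `.mp4`, the special-action clips are listed by name in a separate file, and `duplicate_with_trigger_caption` and the `_dropout` suffix are hypothetical.

```shell
# Hypothetical layout: $1/NAME.mp4 with its caption in $1/NAME.txt.
# $2: text file listing the NAMEs of special-action clips, one per line.
# $3: the caption to give every duplicate (e.g. the trigger phrase).
duplicate_with_trigger_caption() {
  src=$1
  special=$2
  caption=$3
  for video in "$src"/*.mp4; do
    [ -e "$video" ] || continue
    name=$(basename "$video" .mp4)
    # Do not duplicate earlier duplicates if the function is run twice.
    case $name in *_dropout) continue ;; esac
    # Special-action clips keep only their detailed caption.
    if grep -qxF -e "$name" "$special"; then continue; fi
    cp "$video" "$src/${name}_dropout.mp4"
    printf '%s\n' "$caption" > "$src/${name}_dropout.txt"
  done
}

# Usage (hypothetical paths):
#   duplicate_with_trigger_caption ./dataset ./special.list "$(cat trigger.txt)"
```

The original clips and their full captions are left untouched, so the model sees each ordinary clip once with a detailed caption and once with the bare trigger phrase.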

After combining with the dropout items, the final number of dataset videos is 112.

Both the CREPA loss and the diffusion loss converged well, and the LoRA is stable and generalizes to more complicated concepts, such as inflating two characters at once.

Here is the training config; I won't gatekeep it. The batch size is 2.

d=$(pwd)
cd /media/kabachuha/xiangliu/ltx-2.3-train/musubi-tuner/ && python ltx2_train_network.py \
  --mixed_precision bf16 \
  --dataset_manifest /media/kabachuha/xiangliu/ltx-2.3-train/datasets/POP-config_manifest.json \
  --ltx2_checkpoint /media/kabachuha/xiangliu/LTX2/ltx-2.3-22b-dev.safetensors \
  --ltx_version 2.3 \
  --ltx2_mode v \
  --lora_target_preset v2v \
  --blocks_to_swap 20 \
  --use_pinned_memory_for_block_swap \
  --gradient_checkpointing \
  --max_data_loader_n_workers 1 \
  --persistent_data_loader_workers \
  --learning_rate 8e-5 \
  --optimizer_type AdamW8bit \
  --lr_scheduler cosine \
  --lr_warmup_steps 10 \
  --flash_attn \
  --fp8_base \
  --fp8_scaled \
  --network_module networks.lora_ltx2 \
  --network_dim 64 \
  --network_alpha 64 \
  --loss_type huber --huber_delta 1.1 \
  --timestep_sampling shifted_logit_normal \
  --ltx2_first_frame_conditioning_p 0.34 \
  --sample_with_offloading \
  --sample_tiled_vae \
  --sample_vae_tile_size 512 \
  --sample_vae_temporal_tile_size 48 \
  --output_dir /media/kabachuha/xiangliu/ltx-2.3-train/outputs-lora/POP-lora-night/ \
  --output_name ltx23-POP-night \
  --log_config \
  --log_with tensorboard \
  --logging_dir /media/kabachuha/xiangliu/ltx-2.3-train/outputs-logs/ltx23-POP-night \
  --max_train_epochs 200 \
  --save_every_n_epochs 5 \
  --save_state \
  --crepa \
  --crepa_args mode=dino dino_model=dinov2_vitg14 student_block_idx=16 teacher_block_idx=32 lambda_crepa=0.002 tau=1.0 num_neighbors=2 schedule=constant warmup_steps=0 normalize=true
cd "$d"
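As a rough sanity check on the schedule, the numbers stated above give an upper bound on the step and checkpoint counts. This is back-of-the-envelope arithmetic, assuming one sample per video per epoch with no dataset repeats, bucketing effects, or gradient accumulation; the card does not say whether the 7-hour run actually reached the final epoch.

```shell
videos=112        # final dataset size stated above
batch_size=2      # stated above
max_epochs=200    # --max_train_epochs
save_every=5      # --save_every_n_epochs

# Assumption: every video contributes exactly one sample per epoch.
steps_per_epoch=$(( (videos + batch_size - 1) / batch_size ))
max_steps=$(( steps_per_epoch * max_epochs ))
max_checkpoints=$(( max_epochs / save_every ))

echo "steps/epoch=$steps_per_epoch max_steps=$max_steps max_checkpoints=$max_checkpoints"
# prints: steps/epoch=56 max_steps=11200 max_checkpoints=40
```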

Final words

If you liked this LoRA, please consider leaving a like. If you have any questions, please post them in the discussions tab.
