---
license: apache-2.0
tags:
  - lora
  - video-generation
  - wan
  - wan-2.1
  - wan-2.2
  - training
  - text-to-video
  - image-to-video
  - diffusion-transformer
---

# Flimmer

Video LoRA training toolkit for diffusion transformer models. Built by [Alvdansen Labs](https://github.com/alvdansen).

Full pipeline from raw footage to trained LoRA checkpoint — scene detection, captioning, dataset validation, latent pre-encoding, and training. Currently supports WAN 2.1 and WAN 2.2 (T2V and I2V).

Early release. Building in the open.

## What it covers

- **Video ingestion** — scene detection, clip splitting, fps/resolution normalization
- **Captioning** — Gemini and Replicate backends
- **CLIP-based triage** — find clips matching a reference person or concept in large footage sets
- **Dataset validation** — catch missing captions, resolution mismatches, and format issues before spending GPU time
- **Latent pre-encoding** — VAE and T5 outputs cached to disk so training doesn't repeat encoding every epoch
- **Training** — LoRA training with checkpoint resume, W&B logging, and in-training video sampling

## Phased training

The standout feature. Break a training run into sequential stages — each with its own learning rate, epoch budget, and dataset — while the LoRA checkpoint carries forward automatically between phases.

Use it for curriculum training (simple compositions before complex motion) or for WAN 2.2's dual-expert MoE architecture, where the high-noise and low-noise experts can be trained with specialized hyperparameters after a shared base phase. MoE expert specialization is experimental — hyperparameters are still being validated.

## Standalone data tools

The data preparation tools output standard formats compatible with any trainer — kohya, ai-toolkit, or anything else. You don't need to use Flimmer's training loop to benefit from the captioning, triage, and validation tooling.
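As a rough illustration of the phased-training idea — sequential stages, each with its own learning rate, epoch budget, and dataset, with the LoRA state carried forward — here is a minimal sketch. The `Phase` structure, field names, and `run_phases` helper are hypothetical, not Flimmer's actual config format:

```python
from dataclasses import dataclass

@dataclass
class Phase:
    """One stage of a phased run (hypothetical field names)."""
    name: str
    dataset: str
    learning_rate: float
    epochs: int

# Curriculum example: simple compositions first, then complex motion.
phases = [
    Phase("base", "data/simple_clips", 1e-4, 10),
    Phase("motion", "data/complex_motion", 5e-5, 5),
]

def run_phases(phases, train_fn):
    """Run phases sequentially; each phase resumes from the
    LoRA state produced by the previous one."""
    state = None  # no checkpoint yet — phase 1 starts fresh
    for p in phases:
        state = train_fn(p, state)
    return state
```

The same shape extends to WAN 2.2's dual-expert case: a shared base phase followed by one phase per expert with specialized hyperparameters.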
## Model support

| Model | T2V | I2V |
|---|---|---|
| WAN 2.1 | ✅ | ✅ |
| WAN 2.2 | ✅ | ✅ |
| LTX | 🔜 | 🔜 |

Image training is out of scope — ai-toolkit handles it thoroughly, and there's no point duplicating it. Flimmer is video-native.

## Installation & docs

Full installation instructions, config reference, and guides are on GitHub:

**[github.com/alvdansen/flimmer-trainer](https://github.com/alvdansen/flimmer-trainer)**

Supports RunPod and local GPUs (tested on an A6000 with 48 GB VRAM).