---
license: apache-2.0
tags:
- lora
- video-generation
- wan
- wan-2.1
- wan-2.2
- training
- text-to-video
- image-to-video
- diffusion-transformer
---

# Flimmer

Video LoRA training toolkit for diffusion transformer models. Built by [Alvdansen Labs](https://github.com/alvdansen).

Full pipeline from raw footage to trained LoRA checkpoint: scene detection, captioning, dataset validation, latent pre-encoding, and training. Currently supports WAN 2.1 and WAN 2.2 (T2V and I2V).

Early release. Building in the open.
|
## What it covers

- **Video ingestion**: scene detection, clip splitting, fps/resolution normalization
- **Captioning**: Gemini and Replicate backends
- **CLIP-based triage**: find clips matching a reference person or concept in large footage sets
- **Dataset validation**: catch missing captions, resolution mismatches, and format issues before spending GPU time
- **Latent pre-encoding**: VAE and T5 outputs cached to disk so training doesn't repeat encoding every epoch
- **Training**: LoRA training with checkpoint resume, W&B logging, and in-training video sampling
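The latent pre-encoding step can be pictured as a content-addressed cache. The sketch below is illustrative only, not Flimmer's actual API (`cache_key` and `encode_or_load` are invented names, and pickle stands in for whatever serialization the tool really uses): each clip is encoded once, the result is written to disk keyed by a hash of the file, and later epochs load the cached latents instead of re-running the VAE and T5 encoders.

```python
# Hypothetical sketch of the pre-encoding idea (not Flimmer's real interface):
# encode each clip once, cache the latents to disk keyed by file hash,
# and reuse the cache on every subsequent epoch.
import hashlib
import pickle
from pathlib import Path

def cache_key(clip_path: Path) -> str:
    # Content hash, so re-exported or renamed clips don't produce stale hits.
    return hashlib.sha256(clip_path.read_bytes()).hexdigest()

def encode_or_load(clip_path: Path, cache_dir: Path, encode_fn):
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{cache_key(clip_path)}.pkl"
    if cached.exists():
        return pickle.loads(cached.read_bytes())  # cache hit: skip encoding
    latents = encode_fn(clip_path)  # the expensive VAE/T5 forward pass
    cached.write_bytes(pickle.dumps(latents))
    return latents
```

The payoff is that the encoder forward pass runs once per clip rather than once per clip per epoch.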
|
## Phased training

The standout feature. Break a training run into sequential stages, each with its own learning rate, epoch budget, and dataset, while the LoRA checkpoint carries forward automatically between phases.

Use it for curriculum training (simple compositions before complex motion) or for WAN 2.2's dual-expert MoE architecture, where the high-noise and low-noise experts can be trained with specialized hyperparameters after a shared base phase. MoE expert specialization is experimental; hyperparameters are still being validated.
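The phase mechanics can be sketched as a simple loop in which each stage seeds the next with its output checkpoint. This is a conceptual illustration only; `Phase` and `run_phases` are invented names for this example, and the real trainer is driven by Flimmer's config files documented on GitHub.

```python
# Conceptual sketch of phased training (illustrative, not Flimmer's API).
# Each phase has its own hyperparameters; the LoRA state produced by one
# phase is passed in as the starting point of the next.
from dataclasses import dataclass

@dataclass
class Phase:
    name: str
    lr: float
    epochs: int
    dataset: str

def run_phases(phases, train_fn, initial_state=None):
    state = initial_state  # None means start from a fresh LoRA
    history = []
    for phase in phases:
        # The checkpoint from the previous phase carries forward automatically.
        state = train_fn(state, phase)
        history.append((phase.name, state))
    return state, history
```

A curriculum run would then be two `Phase` entries (simple compositions, then complex motion); a WAN 2.2 MoE run would be a shared base phase followed by per-expert phases with specialized learning rates.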
|
## Standalone data tools

The data preparation tools output standard formats compatible with any trainer: kohya, ai-toolkit, or anything else. You don't need to use Flimmer's training loop to benefit from the captioning, triage, and validation tooling.
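A minimal example of the kind of pre-flight check the validator performs. `find_missing_captions` is a hypothetical helper written for this sketch (the real tool also checks resolutions, fps, and formats), and it assumes the common convention of a same-named `.txt` caption file next to each clip:

```python
# Hedged sketch of one validation check: pair each clip with a caption
# file of the same stem and report any clips that would waste GPU time.
from pathlib import Path

def find_missing_captions(dataset_dir: Path, video_ext=".mp4", caption_ext=".txt"):
    missing = []
    for clip in sorted(dataset_dir.glob(f"*{video_ext}")):
        if not clip.with_suffix(caption_ext).exists():
            missing.append(clip.name)
    return missing
```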
|
## Model support

| Model | T2V | I2V |
|---|---|---|
| WAN 2.1 | Supported | Supported |
| WAN 2.2 | Supported | Supported |
| LTX | Planned | Planned |
|
Image training is out of scope; ai-toolkit handles it thoroughly and there's no point duplicating it. Flimmer is video-native.
|
## Installation & docs

Full installation instructions, config reference, and guides are on GitHub:

**[github.com/alvdansen/flimmer-trainer](https://github.com/alvdansen/flimmer-trainer)**

Supports RunPod and local GPU (tested on an A6000, 48 GB).