---
license: apache-2.0
tags:
- lora
- video-generation
- wan
- wan-2.1
- wan-2.2
- training
- text-to-video
- image-to-video
- diffusion-transformer
---

# Flimmer

Video LoRA training toolkit for diffusion transformer models. Built by [Alvdansen Labs](https://github.com/alvdansen).

Full pipeline from raw footage to trained LoRA checkpoint: scene detection, captioning, dataset validation, latent pre-encoding, and training. Currently supports WAN 2.1 and WAN 2.2 (T2V and I2V).

Early release. Building in the open.
|
## What it covers

- **Video ingestion**: scene detection, clip splitting, fps/resolution normalization
- **Captioning**: Gemini and Replicate backends
- **CLIP-based triage**: find clips matching a reference person or concept in large footage sets
- **Dataset validation**: catch missing captions, resolution mismatches, and format issues before spending GPU time
- **Latent pre-encoding**: VAE and T5 outputs cached to disk so training doesn't repeat encoding every epoch
- **Training**: LoRA training with checkpoint resume, W&B logging, and in-training video sampling
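The latent pre-encoding step can be pictured as a content-addressed cache. The sketch below is illustrative only, not Flimmer's actual API (`cache_key` and `encode_or_load` are invented names, and pickle stands in for whatever serialization the tool really uses): each clip is encoded once, the result is written to disk keyed by a hash of the file, and later epochs load the cached latents instead of re-running the VAE and T5 encoders.

```python
# Hypothetical sketch of the pre-encoding idea (not Flimmer's real interface):
# encode each clip once, cache the latents to disk keyed by file hash,
# and reuse the cache on every subsequent epoch.
import hashlib
import pickle
from pathlib import Path

def cache_key(clip_path: Path) -> str:
    # Content hash, so re-exported or renamed clips don't produce stale hits.
    return hashlib.sha256(clip_path.read_bytes()).hexdigest()

def encode_or_load(clip_path: Path, cache_dir: Path, encode_fn):
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{cache_key(clip_path)}.pkl"
    if cached.exists():
        return pickle.loads(cached.read_bytes())  # cache hit: skip encoding
    latents = encode_fn(clip_path)  # the expensive VAE/T5 forward pass
    cached.write_bytes(pickle.dumps(latents))
    return latents
```

The payoff is that the encoder forward pass runs once per clip rather than once per clip per epoch.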
|
## Phased training

The standout feature. Break a training run into sequential stages, each with its own learning rate, epoch budget, and dataset, while the LoRA checkpoint carries forward automatically between phases.

Use it for curriculum training (simple compositions before complex motion) or for WAN 2.2's dual-expert MoE architecture, where the high-noise and low-noise experts can be trained with specialized hyperparameters after a shared base phase. MoE expert specialization is experimental; hyperparameters are still being validated.
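The phase mechanics can be sketched as a simple loop in which each stage seeds the next with its output checkpoint. This is a conceptual illustration only; `Phase` and `run_phases` are invented names for this example, and the real trainer is driven by Flimmer's config files documented on GitHub.

```python
# Conceptual sketch of phased training (illustrative, not Flimmer's API).
# Each phase has its own hyperparameters; the LoRA state produced by one
# phase is passed in as the starting point of the next.
from dataclasses import dataclass

@dataclass
class Phase:
    name: str
    lr: float
    epochs: int
    dataset: str

def run_phases(phases, train_fn, initial_state=None):
    state = initial_state  # None means start from a fresh LoRA
    history = []
    for phase in phases:
        # The checkpoint from the previous phase carries forward automatically.
        state = train_fn(state, phase)
        history.append((phase.name, state))
    return state, history
```

A curriculum run would then be two `Phase` entries (simple compositions, then complex motion); a WAN 2.2 MoE run would be a shared base phase followed by per-expert phases with specialized learning rates.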
|
## Standalone data tools

The data preparation tools output standard formats compatible with any trainer: kohya, ai-toolkit, or anything else. You don't need to use Flimmer's training loop to benefit from the captioning, triage, and validation tooling.
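A minimal example of the kind of pre-flight check the validator performs. `find_missing_captions` is a hypothetical helper written for this sketch (the real tool also checks resolutions, fps, and formats), and it assumes the common convention of a same-named `.txt` caption file next to each clip:

```python
# Hedged sketch of one validation check: pair each clip with a caption
# file of the same stem and report any clips that would waste GPU time.
from pathlib import Path

def find_missing_captions(dataset_dir: Path, video_ext=".mp4", caption_ext=".txt"):
    missing = []
    for clip in sorted(dataset_dir.glob(f"*{video_ext}")):
        if not clip.with_suffix(caption_ext).exists():
            missing.append(clip.name)
    return missing
```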
|
## Model support

| Model | T2V | I2V |
|---|---|---|
| WAN 2.1 | Supported | Supported |
| WAN 2.2 | Supported | Supported |
| LTX | Planned | Planned |
|
Image training is out of scope; ai-toolkit handles it thoroughly and there's no point duplicating it. Flimmer is video-native.
|
## Installation & docs

Full installation instructions, config reference, and guides are on GitHub:

**[github.com/alvdansen/flimmer-trainer](https://github.com/alvdansen/flimmer-trainer)**

Supports RunPod and local GPU (tested on an A6000, 48 GB).