---
license: apache-2.0
tags:
- lora
- video-generation
- wan
- wan-2.1
- wan-2.2
- training
- text-to-video
- image-to-video
- diffusion-transformer
---

# Flimmer

Video LoRA training toolkit for diffusion transformer models. Built by [Alvdansen Labs](https://github.com/alvdansen).

Full pipeline from raw footage to trained LoRA checkpoint: scene detection, captioning, dataset validation, latent pre-encoding, and training. Currently supports WAN 2.1 and WAN 2.2 (T2V and I2V).

Early release. Building in the open.

## What it covers

- **Video ingestion** – scene detection, clip splitting, fps/resolution normalization
- **Captioning** – Gemini and Replicate backends
- **CLIP-based triage** – find clips matching a reference person or concept in large footage sets
- **Dataset validation** – catch missing captions, resolution mismatches, and format issues before spending GPU time
- **Latent pre-encoding** – VAE + T5 outputs cached to disk so training doesn't repeat encoding every epoch
- **Training** – LoRA training with checkpoint resume, W&B logging, and in-training video sampling
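The latent pre-encoding step follows a standard encode-once, cache-to-disk pattern. A minimal sketch of that pattern is below; the function and parameter names are illustrative, not Flimmer's actual API, and a placeholder `encoder` callable stands in for the real VAE/T5 pass.

```python
import hashlib
import pickle
from pathlib import Path


def cache_key(clip_path: str, resolution: tuple[int, int]) -> str:
    """Key the cache on clip path plus encode settings, so changing
    the target resolution invalidates stale latents."""
    raw = f"{clip_path}:{resolution[0]}x{resolution[1]}".encode()
    return hashlib.sha256(raw).hexdigest()


def encode_once(clip_path: str, resolution: tuple[int, int],
                encoder, cache_dir: Path):
    """Run the expensive encoder for a clip at most once; later
    epochs load the cached latents from disk instead."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{cache_key(clip_path, resolution)}.pkl"
    if cached.exists():
        return pickle.loads(cached.read_bytes())
    latents = encoder(clip_path, resolution)  # the costly VAE/T5 pass
    cached.write_bytes(pickle.dumps(latents))
    return latents
```

The payoff is that the encoder runs once per clip per setting, no matter how many epochs the training loop makes over the dataset.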
## Phased training

The standout feature. Break a training run into sequential stages – each with its own learning rate, epoch budget, and dataset – while the LoRA checkpoint carries forward automatically between phases.

Use it for curriculum training (simple compositions before complex motion) or for WAN 2.2's dual-expert MoE architecture, where the high-noise and low-noise experts can be trained with specialized hyperparameters after a shared base phase. MoE expert specialization is experimental; hyperparameters are still being validated.
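The phase mechanics can be sketched as a list of stage definitions threaded through one loop, where each stage resumes from the previous stage's LoRA weights. The `Phase` fields, schedule names, and values below are hypothetical, chosen only to mirror the description above; they are not Flimmer's config schema.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    dataset: str        # path to this phase's dataset
    learning_rate: float
    epochs: int


def run_phases(phases, train_one_phase, lora_state=None):
    """Run phases in order; each phase starts from the LoRA state
    the previous phase produced, rather than from scratch."""
    for phase in phases:
        lora_state = train_one_phase(phase, lora_state)
    return lora_state


# Illustrative schedule: a shared base phase, then per-expert
# fine-tuning with smaller budgets and a lower learning rate.
schedule = [
    Phase("base", "data/all", 1e-4, 10),
    Phase("high_noise_expert", "data/motion", 5e-5, 4),
    Phase("low_noise_expert", "data/detail", 5e-5, 4),
]
```

Because the checkpoint hand-off lives in the loop rather than in each stage, adding or reordering phases is just editing the schedule list.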
## Standalone data tools

The data preparation tools output standard formats compatible with any trainer – kohya, ai-toolkit, or anything else. You don't need to use Flimmer's training loop to benefit from the captioning, triage, and validation tooling.
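To show the kind of pre-flight check the validation tooling performs, here is a minimal sketch. It assumes the common clip-plus-sidecar-caption layout (`clip.mp4` next to `clip.txt`); that layout, the extension list, and the function name are assumptions for illustration, not Flimmer's actual checks.

```python
from pathlib import Path

# Hypothetical set of accepted container formats.
VIDEO_EXTS = {".mp4", ".mov", ".webm"}


def validate_dataset(root: str) -> list[str]:
    """Return a list of problems found; an empty list means the
    dataset passes these basic checks before any GPU time is spent."""
    problems = []
    clips = [p for p in Path(root).rglob("*")
             if p.suffix.lower() in VIDEO_EXTS]
    if not clips:
        problems.append(f"no video clips found under {root}")
    for clip in clips:
        caption = clip.with_suffix(".txt")  # sidecar caption file
        if not caption.exists():
            problems.append(f"missing caption: {caption.name}")
        elif not caption.read_text().strip():
            problems.append(f"empty caption: {caption.name}")
    return problems
```

Collecting problems into a list instead of raising on the first one lets a run report every issue at once, which matters when fixing a large footage set.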
## Model support

| Model | T2V | I2V |
|---|---|---|
| WAN 2.1 | ✅ | ✅ |
| WAN 2.2 | ✅ | ✅ |
| LTX | 🚧 | 🚧 |

Image training is out of scope; ai-toolkit handles it thoroughly, and there's no point duplicating it. Flimmer is video-native.
## Installation & docs

Full installation instructions, config reference, and guides are on GitHub:

**[github.com/alvdansen/flimmer-trainer](https://github.com/alvdansen/flimmer-trainer)**

Supports RunPod and local GPU (tested on A6000/48GB).
|