---
license: apache-2.0
tags:
- lora
- video-generation
- wan
- wan-2.1
- wan-2.2
- training
- text-to-video
- image-to-video
- diffusion-transformer
---

# Flimmer

Video LoRA training toolkit for diffusion transformer models. Built by [Alvdansen Labs](https://github.com/alvdansen).

Full pipeline from raw footage to trained LoRA checkpoint: scene detection, captioning, dataset validation, latent pre-encoding, and training. Currently supports WAN 2.1 and WAN 2.2 (T2V and I2V).

Early release. Building in the open.

## What it covers

- **Video ingestion** – scene detection, clip splitting, fps/resolution normalization
- **Captioning** – Gemini and Replicate backends
- **CLIP-based triage** – find clips matching a reference person or concept in large footage sets
- **Dataset validation** – catch missing captions, resolution mismatches, and format issues before spending GPU time
- **Latent pre-encoding** – VAE + T5 outputs cached to disk so training doesn't repeat encoding every epoch
- **Training** – LoRA training with checkpoint resume, W&B logging, and in-training video sampling
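The latent pre-encoding step follows a standard encode-once, cache-to-disk pattern. A minimal sketch of that pattern is below; the function and parameter names are illustrative, not Flimmer's actual API, and a placeholder `encoder` callable stands in for the real VAE/T5 pass.

```python
import hashlib
import pickle
from pathlib import Path


def cache_key(clip_path: str, resolution: tuple[int, int]) -> str:
    """Key the cache on clip path plus encode settings, so changing
    the target resolution invalidates stale latents."""
    raw = f"{clip_path}:{resolution[0]}x{resolution[1]}".encode()
    return hashlib.sha256(raw).hexdigest()


def encode_once(clip_path: str, resolution: tuple[int, int],
                encoder, cache_dir: Path):
    """Run the expensive encoder for a clip at most once; later
    epochs load the cached latents from disk instead."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{cache_key(clip_path, resolution)}.pkl"
    if cached.exists():
        return pickle.loads(cached.read_bytes())
    latents = encoder(clip_path, resolution)  # the costly VAE/T5 pass
    cached.write_bytes(pickle.dumps(latents))
    return latents
```

The payoff is that the encoder runs once per clip per setting, no matter how many epochs the training loop makes over the dataset.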
## Phased training

The standout feature. Break a training run into sequential stages – each with its own learning rate, epoch budget, and dataset – while the LoRA checkpoint carries forward automatically between phases.

Use it for curriculum training (simple compositions before complex motion) or for WAN 2.2's dual-expert MoE architecture, where the high-noise and low-noise experts can be trained with specialized hyperparameters after a shared base phase. MoE expert specialization is experimental; hyperparameters are still being validated.
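The phase mechanics can be sketched as a list of stage definitions threaded through one loop, where each stage resumes from the previous stage's LoRA weights. The `Phase` fields, schedule names, and values below are hypothetical, chosen only to mirror the description above; they are not Flimmer's config schema.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    dataset: str        # path to this phase's dataset
    learning_rate: float
    epochs: int


def run_phases(phases, train_one_phase, lora_state=None):
    """Run phases in order; each phase starts from the LoRA state
    the previous phase produced, rather than from scratch."""
    for phase in phases:
        lora_state = train_one_phase(phase, lora_state)
    return lora_state


# Illustrative schedule: a shared base phase, then per-expert
# fine-tuning with smaller budgets and a lower learning rate.
schedule = [
    Phase("base", "data/all", 1e-4, 10),
    Phase("high_noise_expert", "data/motion", 5e-5, 4),
    Phase("low_noise_expert", "data/detail", 5e-5, 4),
]
```

Because the checkpoint hand-off lives in the loop rather than in each stage, adding or reordering phases is just editing the schedule list.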
## Standalone data tools

The data preparation tools output standard formats compatible with any trainer – kohya, ai-toolkit, or anything else. You don't need to use Flimmer's training loop to benefit from the captioning, triage, and validation tooling.
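To show the kind of pre-flight check the validation tooling performs, here is a minimal sketch. It assumes the common clip-plus-sidecar-caption layout (`clip.mp4` next to `clip.txt`); that layout, the extension list, and the function name are assumptions for illustration, not Flimmer's actual checks.

```python
from pathlib import Path

# Hypothetical set of accepted container formats.
VIDEO_EXTS = {".mp4", ".mov", ".webm"}


def validate_dataset(root: str) -> list[str]:
    """Return a list of problems found; an empty list means the
    dataset passes these basic checks before any GPU time is spent."""
    problems = []
    clips = [p for p in Path(root).rglob("*")
             if p.suffix.lower() in VIDEO_EXTS]
    if not clips:
        problems.append(f"no video clips found under {root}")
    for clip in clips:
        caption = clip.with_suffix(".txt")  # sidecar caption file
        if not caption.exists():
            problems.append(f"missing caption: {caption.name}")
        elif not caption.read_text().strip():
            problems.append(f"empty caption: {caption.name}")
    return problems
```

Collecting problems into a list instead of raising on the first one lets a run report every issue at once, which matters when fixing a large footage set.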
## Model support

| Model | T2V | I2V |
|---|---|---|
| WAN 2.1 | ✅ | ✅ |
| WAN 2.2 | ✅ | ✅ |
| LTX | 🚧 | 🚧 |

Image training is out of scope; ai-toolkit handles it thoroughly, and there's no point duplicating it. Flimmer is video-native.
## Installation & docs

Full installation instructions, config reference, and guides are on GitHub:

**[github.com/alvdansen/flimmer-trainer](https://github.com/alvdansen/flimmer-trainer)**

Supports RunPod and local GPU (tested on A6000/48GB).
|