We surveyed six long-video generation approaches and shipped one. Goal: ≥15s coherent video on a single GPU, under a minute wallclock. Wan2.2 is solid at 3–5s; 10s+ is where it gets interesting.
![Cat Adventure clip 1+3, 15s SVI on TurboWan, 33s single GPU]
Survey: TTT, LoL, Self Forcing, Self Forcing++, Infinite Talk, Helios. Each hit a wall, respectively: training cost, static-only demos, VRAM saturation at 10s, no released weights, a narrow A2V lane, full 14B retrain.
Three buckets fell out: extend attention (Type A, hits VRAM), compress history (Type B, costs retrain), stateful rolling (Type C, LoRA-only). We shipped Type C — SVI (Stable Video Infinity) on TurboWan.
Each 5s clip is conditioned on a global identity anchor + a motion bridge from the prior clip. The trick: train the LoRA on its own errors so it learns to cope with noisy historical context. Production: 15s output in 33s on a single GPU, 64% pass rate on a 14-case test set.
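The rolling loop can be sketched as follows. All names (`generate_clip`, `extract_bridge`, the anchor/bridge shapes) are illustrative stand-ins, not the actual SVI/TurboWan API; the real model calls a diffusion rollout where this stub returns tokens:

```python
# Hypothetical sketch of the Type C "stateful rolling" loop: each clip is
# conditioned on a fixed identity anchor plus a bridge carried forward from
# the previous clip's tail. Stand-in functions, not the SVI/TurboWan API.

def generate_clip(prompt, anchor, bridge, seed):
    # Stand-in for one 5s rollout conditioned on (anchor, bridge).
    # Returns a deterministic token list purely for illustration.
    return [f"{prompt}:{anchor}:{bridge}:{seed}:{i}" for i in range(4)]

def extract_bridge(clip):
    # The motion bridge is taken from the tail of the prior clip.
    return clip[-1]

def rolling_generate(prompt, anchor, n_clips=3, seed=0):
    clips, bridge = [], None
    for k in range(n_clips):
        clip = generate_clip(prompt, anchor, bridge, seed + k)
        bridge = extract_bridge(clip)  # state handed to the next clip
        clips.append(clip)
    return clips

video = rolling_generate("cat adventure", anchor="cat_id")
```

The key property of this shape: memory stays flat per clip (no growing attention window), and the anchor keeps identity stable while the bridge keeps motion continuous.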
Full breakdown with attention diagrams, VRAM math, per-route table:
https://www.reddit.com/r/AtlasCloudAI/comments/1t64dy9/