Data

The extended VidProM prompt file was downloaded with:

huggingface-cli download gdhe17/Self-Forcing vidprom_filtered_extended.txt --local-dir prompts
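Once downloaded, the prompt file can be loaded for inspection or batching. The helper below is a minimal sketch that assumes the file is plain UTF-8 text with one prompt per line; `load_prompts` is a hypothetical name, not a function from the Self-Forcing codebase.

```python
from pathlib import Path


def load_prompts(path: str) -> list[str]:
    """Read one prompt per line, stripping whitespace and dropping blank lines.

    Assumption: the file is UTF-8 text with exactly one prompt per line.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

For example, `load_prompts("prompts/vidprom_filtered_extended.txt")` would return the non-empty prompts from the file fetched by the command above.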

LMDB files

The LMDB files are stored in sharded mode. The ODE pairs were generated with a guidance scale of 6.0 from vidprom_filtered_extended_16k.txt.
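The guidance scale of 6.0 presumably refers to classifier-free guidance applied in the teacher when producing the ODE targets. A minimal NumPy sketch of the standard combination, assuming the teacher returns separate conditional and unconditional predictions (the same linear formula applies whether the network predicts noise or flow velocity); `cfg_combine` is a hypothetical name:

```python
import numpy as np


def cfg_combine(pred_cond: np.ndarray, pred_uncond: np.ndarray,
                guidance_scale: float = 6.0) -> np.ndarray:
    """Classifier-free guidance: extrapolate from the unconditional prediction
    toward the conditional one by `guidance_scale`."""
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)
```

With `guidance_scale=1.0` this reduces to the plain conditional prediction; larger values push samples harder toward the prompt at the cost of diversity.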

ODE training

The ODE training configuration file is provided.
The best checkpoint, with the lowest loss value, was observed at around 2500 steps and is provided here.
The loss showed convergent behavior.
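Picking the lowest-loss checkpoint from raw logged values can be sensitive to step-to-step noise. One hedged way to make the selection more robust is to compare a trailing mean of the loss before each saved checkpoint. The helper below is a hypothetical sketch; the `window` default is an arbitrary illustration, not the setting used for this run.

```python
def best_checkpoint(step_losses: dict[int, float], ckpt_steps: list[int],
                    window: int = 50) -> int:
    """Return the saved checkpoint step whose trailing-mean loss is lowest.

    step_losses maps training step -> logged loss. For each checkpoint, the
    last `window` logged values at or before that step are averaged
    (window=1 reduces to picking the raw minimum at the checkpoint).
    """
    steps = sorted(step_losses)
    best_step, best_val = None, float("inf")
    for ck in ckpt_steps:
        recent = [step_losses[s] for s in steps if s <= ck][-window:]
        if not recent:
            continue  # no logged loss before this checkpoint
        mean = sum(recent) / len(recent)
        if mean < best_val:
            best_step, best_val = ck, mean
    if best_step is None:
        raise ValueError("no logged losses at or before any checkpoint")
    return best_step
```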

ODE Generator Loss

ODE Generator Grad Norm
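A grad-norm curve is typically the global L2 norm over all parameter gradients, treated as one flattened vector; this is also the value `torch.nn.utils.clip_grad_norm_` returns (computed before clipping). A NumPy sketch of that quantity and of clipping by it, assuming the logged curve is the pre-clipping norm (not confirmed here); `max_norm` is a placeholder:

```python
import math

import numpy as np


def global_grad_norm(grads: list[np.ndarray]) -> float:
    """L2 norm of all gradients concatenated into a single vector."""
    return math.sqrt(sum(float(np.sum(g.astype(np.float64) ** 2)) for g in grads))


def clip_by_global_norm(grads: list[np.ndarray], max_norm: float):
    """Scale all gradients by one common factor so the global norm is at most
    max_norm. Returns (clipped_grads, norm_before_clipping)."""
    total = global_grad_norm(grads)
    scale = min(1.0, max_norm / (total + 1e-6))  # same epsilon idea as PyTorch
    return [g * scale for g in grads], total
```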

DMD training

Videos generated after DMD training still show temporal flickering and other artifacts (work in progress). The tested DMD config is provided here. It was also inspired by the LongLive settings: it uses an attention sink, and the guidance scale was adjusted to fine-tune performance.
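The attention-sink idea borrowed from LongLive can be illustrated with a rolling cache that pins the earliest entries and keeps only a recent window after them. This is a hypothetical sketch of the eviction policy only: in the actual model each entry would be per-frame (or per-chunk) key/value tensors inside every attention layer, and `num_sink` / `window` are illustrative parameters, not values from the tested config.

```python
from collections import deque


class SinkKVCache:
    """Rolling KV cache that always keeps the first `num_sink` entries.

    Later entries go into a fixed-size window; once it is full, the oldest
    non-sink entry is evicted while the sink entries are never dropped.
    """

    def __init__(self, num_sink: int, window: int):
        self.num_sink = num_sink
        self.sink: list = []
        self.recent: deque = deque(maxlen=window)

    def append(self, kv) -> None:
        if len(self.sink) < self.num_sink:
            self.sink.append(kv)   # pin the earliest entries as sinks
        else:
            self.recent.append(kv)  # deque drops the oldest when full

    def visible(self) -> list:
        """Entries the next query attends to: sinks first, then the window."""
        return self.sink + list(self.recent)
```

For example, with `num_sink=2` and `window=3`, appending entries 0 through 9 leaves the cache attending to `[0, 1, 7, 8, 9]`.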

Code for WAN2.1 1.3B is available here: SiFRiA. Environment used: 8-4 H200 GPUs.

DMD Generator Loss

From a game-theoretic point of view, this appears to be a good Nash equilibrium, but it was difficult to get the loss to go down. In all such cases, the model learned some inaccuracies and was unable to recover from them.
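To make the loss discussion concrete, DMD-style generator losses are commonly implemented as a surrogate whose gradient equals the normalized difference between the "fake" (critic) and "real" (teacher) clean-sample predictions. The NumPy sketch below follows the pattern of the public DMD2 reference code, which is assumed (not confirmed) to match this setup; all names are hypothetical. With this formulation the logged value equals `0.5 * mean(grad**2)`, so it tracks teacher/critic disagreement rather than distance to a fixed target, which may help explain why it does not decrease steadily.

```python
import numpy as np


def dmd_grad(x: np.ndarray, pred_real_x0: np.ndarray,
             pred_fake_x0: np.ndarray) -> np.ndarray:
    """Normalized DMD update direction for one generated sample x.

    pred_real_x0 / pred_fake_x0: clean-sample predictions of the frozen
    teacher and the trained critic at the same noised version of x.
    """
    p_real = x - pred_real_x0
    p_fake = x - pred_fake_x0
    # Normalize by the mean absolute teacher residual (DMD reference pattern).
    return (p_real - p_fake) / np.mean(np.abs(p_real))


def dmd_surrogate_loss(x: np.ndarray, grad: np.ndarray) -> float:
    """0.5 * mean((x - stopgrad(x - grad))**2).

    The target is treated as a constant, so d(loss)/dx = grad / x.size and
    gradient descent moves x toward the teacher's prediction.
    """
    target = x - grad  # stop-gradient target
    return 0.5 * float(np.mean((x - target) ** 2))
```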

DMD Grad Norm

The grad norm also behaves strangely: the cause of the double peaks is unclear, but their amplitude is very low and the plateau appears stable.
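Double peaks and plateau stability are easier to judge on a smoothed curve. A minimal sketch, assuming the grad-norm values are available as a plain list of floats: an exponential moving average plus a crude plateau test that checks whether the last `last_k` smoothed points stay within a relative band. `alpha`, `last_k` and `rel_tol` are arbitrary illustrative values.

```python
def ema(values: list[float], alpha: float = 0.1) -> list[float]:
    """Exponential moving average; alpha is the weight of the newest sample."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: list[float] = []
    for v in values:
        smoothed.append(v if not smoothed else alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


def is_plateau(values: list[float], last_k: int = 100,
               rel_tol: float = 0.05, alpha: float = 0.1) -> bool:
    """True if the last `last_k` smoothed values vary by at most rel_tol * |mean|."""
    tail = ema(values, alpha)[-last_k:]
    if len(tail) < last_k:
        return False  # not enough history to judge
    mean = sum(tail) / len(tail)
    return (max(tail) - min(tail)) <= rel_tol * abs(mean)
```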

Sample video

After 300 iterations:

Prompt: "A Porsche, sleek and black, races forward swiftly along the asphalt. It weaves through the landscape against a backdrop of destroyed houses and skyscrapers cloaked in moss. As dawn breaks, the crimson sun ascends into the sky."