Data
VidProM extended prompt files were downloaded with:
huggingface-cli download gdhe17/Self-Forcing vidprom_filtered_extended.txt --local-dir prompts
LMDB files:
They are stored in sharded mode, and the ODE pairs were created with a guidance scale of 6.0 from vidprom_filtered_extended_16k.txt.
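The guidance-scale combination used when creating the ODE pairs can be sketched as standard classifier-free guidance. This is a minimal illustration, not the repository's actual pipeline code: the function name is hypothetical and plain floats stand in for per-frame latent tensors.

```python
def apply_cfg(uncond_pred: float, cond_pred: float,
              guidance_scale: float = 6.0) -> float:
    """Classifier-free guidance: move the model output away from the
    unconditional prediction toward the conditional one.

    Floats stand in for latent tensors here; the default
    guidance_scale = 6.0 matches the setting used for the ODE pairs.
    """
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)
```

With guidance_scale = 1.0 this reduces to the plain conditional prediction; larger scales amplify the conditional direction.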
ODE training
- The ODE training configuration file is provided.
- The best checkpoint (lowest observed loss, around 2,500 steps) is provided here.
- Convergence behavior was observed.
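Conceptually, this stage trains the generator to regress onto the precomputed ODE targets. A minimal sketch of such a regression loss, assuming a simple per-element MSE (the function name is illustrative, and plain Python lists stand in for video latents):

```python
def ode_regression_loss(pred: list[float], target: list[float]) -> float:
    """Mean-squared error between the generator's prediction and the
    precomputed ODE-solver target for the same noise/prompt pair.

    Lists of floats stand in for latent tensors; real training would
    compute this with batched tensors on GPU.
    """
    assert len(pred) == len(target), "prediction/target shape mismatch"
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

The "lowest loss around 2,500 steps" observation above refers to this regression objective averaged over the training batches.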
ODE Generator Loss
ODE Generator Grad Norm
DMD training:
DMD videos show temporal flickering (WIP) and other artifacts. The tested DMD config is provided here. It was also inspired by Long Live's settings: an attention sink and an adjusted guidance scale were used to fine-tune performance.
Code for WAN2.1 1.3B is available here: SiFRiA. Used environment: 4-8 H200 GPUs.
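In distribution matching distillation, the generator's update direction is (up to weighting) the difference between the fake-score and real-score models' predictions on the generator's samples. A minimal sketch under that assumption; the function name is hypothetical, lists stand in for latents, and the normalization terms used in practice are omitted:

```python
def dmd_update_direction(real_pred: list[float],
                         fake_pred: list[float]) -> list[float]:
    """Sketch of the DMD generator gradient: per-element difference
    between the fake critic's and the real critic's denoised
    predictions of the generator's sample.

    Practical implementations add a per-sample normalization weight;
    it is dropped here for clarity.
    """
    return [f - r for r, f in zip(real_pred, fake_pred)]
```

When the two critics agree, the direction is zero, which is the equilibrium discussed below.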
DMD Generator Loss
From a game-theoretic point of view this appears to be a good Nash equilibrium, but it was difficult to drive the loss down. In all such cases, the model learned some inaccuracies and was unable to recover from them.
DMD Grad Norm
The grad norm also shows strange behavior; the double peaks are unexplained, but the amplitude is very low and the plateau appears stable.
Sample video
At the end of 300 iterations:
Prompt: "A Porsche, sleek and black, races forward swiftly along the asphalt. It weaves through the landscape against a backdrop of destroyed houses and skyscrapers cloaked in moss. As dawn breaks, the crimson sun ascends into the sky."



