# Data
The VidProM extended prompt file was downloaded with:
`huggingface-cli download gdhe17/Self-Forcing vidprom_filtered_extended.txt --local-dir prompts`
# LMDB files
The LMDB files are stored in sharded mode. The ODE pairs were generated from `vidprom_filtered_extended_16k.txt` with a guidance scale of 6.0.
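For readers unfamiliar with the term, the guidance scale is assumed here to be the usual classifier-free guidance weight. The sketch below illustrates that standard formula on plain Python lists; the function name `apply_cfg` and the toy inputs are hypothetical and are not taken from this repository's code.

```python
from typing import List, Sequence


def apply_cfg(
    cond: Sequence[float],
    uncond: Sequence[float],
    guidance_scale: float = 6.0,
) -> List[float]:
    """Combine conditional and unconditional predictions element-wise.

    Standard classifier-free guidance: uncond + scale * (cond - uncond).
    A scale of 1.0 returns the conditional prediction unchanged.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy values, for illustration only.
print(apply_cfg([1.0, 0.5], [0.0, 0.5]))  # [6.0, 0.5]
```

With a scale of 6.0, the difference between the two predictions is amplified six times, which is why higher scales follow the prompt more closely.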
# ODE training
* The [ODE training configuration file](https://huggingface.co/ik6626/self_forcing_trial/blob/main/ode_config.yaml) is provided.
* The best checkpoint, with the lowest loss value, was observed at around 2500 steps and is provided [here](https://huggingface.co/ik6626/self_forcing_trial/blob/main/expt_2_1_1_3b_ode/checkpoint_model_002500/model.pt).
* Training converged; see the loss and gradient-norm plots below.
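One possible way to fetch and inspect the checkpoint linked above is sketched here, assuming `huggingface_hub` and `torch` are installed. The helper names are hypothetical, the zero-padded path pattern is inferred only from the single checkpoint URL above, and the layout of the keys inside `model.pt` is not documented in this README.

```python
REPO_ID = "ik6626/self_forcing_trial"
RUN_DIR = "expt_2_1_1_3b_ode"


def checkpoint_filename(step: int) -> str:
    """Build the in-repo path for a step (pattern assumed from the 2500-step link)."""
    return f"{RUN_DIR}/checkpoint_model_{step:06d}/model.pt"


def load_checkpoint(step: int = 2500):
    """Download the checkpoint from the Hub and load it onto the CPU."""
    # Imported lazily so the path helper works without these packages.
    from huggingface_hub import hf_hub_download
    import torch

    local_path = hf_hub_download(repo_id=REPO_ID, filename=checkpoint_filename(step))
    # weights_only=True limits unpickling to tensors and basic containers;
    # if the file stores other objects, this call may need to be adjusted.
    return torch.load(local_path, map_location="cpu", weights_only=True)


# Example (requires network access):
# state = load_checkpoint(2500)
# print(list(state.keys())[:5])
```

Printing the top-level keys first is a cheap way to find out how the state dict is nested before trying to load it into a model.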
## ODE Generator Loss
![image](https://cdn-uploads.huggingface.co/production/uploads/68ff432cbdde156e4c4c75b3/x3HV2iZAG5Y29WOBeECja.png)
## ODE Generator Grad Norm
![image](https://cdn-uploads.huggingface.co/production/uploads/68ff432cbdde156e4c4c75b3/XeKEBh2qE9vS8jEYm13j1.png)
# DMD training
* DMD-generated videos still show temporal flickering and other artifacts; this is a work in progress.
* The tested DMD config is provided [here](https://huggingface.co/ik6626/self_forcing_trial/blob/main/dmd_config.yaml). It also borrows from the LongLive settings: it uses an attention sink and a tuned guidance scale to improve performance.
* Code for WAN2.1 1.3B is available in [SiFRiA](https://github.com/moonmath-ai/SiFRiA/tree/wan2.1_1.3bAR).
* Environment used: 4 to 8 H200 GPUs.
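As a rough illustration of the attention-sink idea mentioned above: during autoregressive generation, the model keeps attending to the first few cached frames (the sink) in addition to a sliding window of recent frames. The sketch below is a generic, minimal version of that selection rule; the function and parameter names are hypothetical and do not correspond to fields in `dmd_config.yaml` or to the SiFRiA implementation.

```python
from typing import List


def frames_to_attend(num_cached: int, sink_size: int, window_size: int) -> List[int]:
    """Return indices of cached frames that stay visible to attention.

    Keeps the first `sink_size` frames plus the most recent `window_size`
    frames, without listing any frame twice.
    """
    sink = list(range(min(sink_size, num_cached)))
    # Start the recent window after the sink so the two ranges never overlap.
    recent_start = max(num_cached - window_size, len(sink))
    recent = list(range(recent_start, num_cached))
    return sink + recent


print(frames_to_attend(num_cached=10, sink_size=1, window_size=3))  # [0, 7, 8, 9]
```

Frames between the sink and the recent window are dropped from the cache, which bounds memory while keeping a stable anchor from the start of the video.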
## DMD Generator Loss
From a game-theoretic point of view, this appears to be a good Nash equilibrium, but it was difficult to get the loss to decrease.
In all such cases, the model learned some inaccuracies and was unable to recover from them.
![image](https://cdn-uploads.huggingface.co/production/uploads/68ff432cbdde156e4c4c75b3/qryj4amF9XYw_MzXOTHcq.png)
## DMD Grad Norm
The gradient norm also behaves strangely. The cause of the double peaks is unclear, but the amplitude is very low and the plateau appears stable.
![image](https://cdn-uploads.huggingface.co/production/uploads/68ff432cbdde156e4c4c75b3/hc1FhJAsc2X8uSpEGhabt.png)
## Sample video
After 300 iterations:

Prompt: "A Porsche, sleek and black, races forward swiftly along the asphalt. It weaves through the landscape against a backdrop of destroyed houses and skyscrapers cloaked in moss. As dawn breaks, the crimson sun ascends into the sky."

<video src="https://huggingface.co/ik6626/self_forcing_trial/resolve/main/iter_000300.mp4" controls width="100%"></video>