Standalone Source Map

Important files:

  • fastvideo/models/dits/wanvideo.py
    • Defines WanTransformerBlock_VSA.
    • Adds to_gate_compress.
    • Passes gate_compress into self-attention.
  • fastvideo/attention/backends/sparse_fp4_compress_attn.py
    • Backend name: SPARSE_FP4_COMPRESS_ATTN.
    • Sparse FP4 main branch plus high-precision block-mean compress branch.
    • Prints FASTVIDEO_BACKEND_CONFIRM: SPARSE_FP4_COMPRESS_ATTN is running.
  • fastvideo/attention/backends/sparse_fp4_attn.py
    • Base sparse FP4 attention without compress branch.
  • fastvideo/layers/nvfp4_fake_quant_linear.py
    • NVFP4FakeQuantReplicatedLinear.
    • Wan replacement helpers for normal fake-quant QAT and SVD-LoRA variants.
  • fastvideo/train/models/wan/wan.py
    • Training-time switches for enabling fake-quant linear and gate quantization.
  • fastvideo/pipelines/basic/wan/wan_dmd_pipeline.py
    • Deployment/validation pipeline used by this checkpoint.
    • It constructs FlowMatchEulerDiscreteScheduler from flow_shift.
  • fastvideo/pipelines/stages/denoising.py
    • Contains DmdDenoisingStage, the actual solver path for online validation.
    • Uses fixed DMD timesteps, predicts clean video, and re-noises to the next timestep instead of running a normal UniPC/Euler scheduler.step loop.
  • fastvideo/models/schedulers/scheduling_flow_match_euler_discrete.py
    • Flow-match scheduler implementation used for DMD conversion and re-noising.
  • fastvideo/platforms/interface.py, fastvideo/platforms/cuda.py
    • Registers SPARSE_FP4_ATTN and SPARSE_FP4_COMPRESS_ATTN.
  • fastvideo-kernel/python/fastvideo_kernel/...
    • Block-sparse attention and quantization kernel source used by the backend.
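
The flow_shift value that wan_dmd_pipeline.py passes to FlowMatchEulerDiscreteScheduler can be illustrated with the static time-shift formula used by the diffusers scheduler of the same name. This is a minimal sketch: the function name shift_sigma is hypothetical, and it assumes the FastVideo copy of the scheduler applies the same formula.

```python
def shift_sigma(sigma: float, shift: float = 8.0) -> float:
    """Static flow-match time shift (assumed to match the scheduler).

    Maps a noise level sigma in [0, 1] to a shifted noise level. A shift
    above 1 pushes intermediate sigmas toward 1, so more of the schedule
    is spent at high noise. The endpoints 0 and 1 are unchanged.
    """
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


if __name__ == "__main__":
    # Print a few unshifted sigmas next to their shifted values.
    for s in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"sigma={s:.2f} -> shifted={shift_sigma(s):.4f}")
```

With shift=8.0, a mid-schedule sigma of 0.5 maps to roughly 0.89, which is why a short fixed timestep list can stay in the high-noise range.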

The full source snapshot in ../repo_source/ is preferred when running the included DCP export script. This directory is meant for quick inspection and porting into another inference stack.

Solver reproduction summary:

  • Use WanDMDPipeline. Do not rely on the vanilla diffusers scheduler referenced in model_index.json.
  • The copied scheduler/scheduler_config.json is base-model metadata (UniPCMultistepScheduler, solver_type=bh2) and is not the solver used by online validation.
  • Matching solver: DmdDenoisingStage with FlowMatchEulerDiscreteScheduler, configured as follows:
    • shift=8.0
    • DMD timesteps [1000, 757, 522]
    • num_inference_steps=3
    • guidance_scale=6.0
    • attention backend SPARSE_FP4_COMPRESS_ATTN
    • sparsity 0.8
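
The solver path described above (fixed timesteps, predict clean video, re-noise to the next timestep) can be sketched as a toy loop over a flat list of floats. This is not the DmdDenoisingStage code. The names dmd_sample and predict_velocity are hypothetical, the model is assumed to output a flow-match velocity (noise minus clean), and sigma is assumed to equal t / 1000, whereas the real stage looks sigma up in the scheduler's table. Guidance and the attention backend are left out.

```python
import random


def dmd_sample(predict_velocity, noise, timesteps=(1000, 757, 522),
               num_train_timesteps=1000, seed=0):
    """Toy few-step DMD sampling loop (illustrative sketch only).

    predict_velocity(x_t, t) is a hypothetical stand-in for the transformer.
    """
    rng = random.Random(seed)
    latents = list(noise)
    for i, t in enumerate(timesteps):
        sigma = t / num_train_timesteps  # assumption: sigma = t / 1000
        velocity = predict_velocity(latents, t)
        # Flow matching: x_t = (1 - sigma) * x0 + sigma * eps and v = eps - x0,
        # so the clean estimate is x0 = x_t - sigma * v.
        clean = [x - sigma * v for x, v in zip(latents, velocity)]
        if i + 1 < len(timesteps):
            # Re-noise the clean estimate to the next fixed timestep with
            # fresh Gaussian noise instead of calling scheduler.step.
            next_sigma = timesteps[i + 1] / num_train_timesteps
            fresh = [rng.gauss(0.0, 1.0) for _ in clean]
            latents = [(1.0 - next_sigma) * c + next_sigma * e
                       for c, e in zip(clean, fresh)]
        else:
            # Last step: the clean estimate is the output.
            latents = clean
    return latents
```

For a quick sanity check, pass an oracle predict_velocity that returns (x_t - target) / sigma for a known target. The loop should then return the target up to floating-point error.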