
FastVideo 5090 Test Checkpoint

This repository is an HF/diffusers-style export of the FastVideo DMD2 student checkpoint at step 2950.

The model is packaged as safetensors for easier loading. It is not a vanilla diffusers Wan checkpoint: the transformer uses a custom Wan block with to_gate_compress, SPARSE_FP4_COMPRESS_ATTN, and NVFP4 fake-quant linear wrappers. Use the matching FastVideo code or the included custom_code/ files.
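As a sanity check before wiring up the full FastVideo stack, the exported transformer weights can be inspected directly with safetensors. A minimal sketch, assuming this repository is the working directory; actually instantiating the model requires the custom classes from custom_code/:

```python
# Minimal inspection sketch: load the raw student transformer weights.
# Instantiating the model itself requires the custom Wan block from custom_code/,
# since the state dict contains keys (e.g. to_gate_compress) that a vanilla
# diffusers Wan transformer does not have.
from safetensors.torch import load_file

state_dict = load_file("transformer/diffusion_pytorch_model.safetensors")
gate_keys = [k for k in state_dict if "to_gate_compress" in k]
print(f"{len(state_dict)} tensors total, {len(gate_keys)} gate-compress tensors")
```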

Files

  • model_index.json, scheduler/, text_encoder/, tokenizer/, vae/: copied from FastWan2.1-T2V-1.3B-Diffusers.
  • transformer/diffusion_pytorch_model.safetensors: exported student transformer weights from checkpoint-2950.
  • custom_code/: standalone source for the attention backend, gate model structure, NVFP4 fake quant linear, and required registry points.
  • training_metadata/: original checkpoint metadata and launch scripts.

Training/Export Settings

  • Source checkpoint: dmd2_fastwan_student_sfp4_compress_s08_nvfp4_qat_gatefp4_linear_criticwarm0_32g_batch_short_array/checkpoint-2950
  • Student init: models/FastWan2.1-T2V-1.3B-Diffusers
  • Student attention: SPARSE_FP4_COMPRESS_ATTN
  • Sparsity: 0.8
  • Student linear: NVFP4 fake-quant QAT
  • Gate-compress projection: enabled and NVFP4 fake-quant QAT
  • LoRA: disabled
  • Teacher/critic: dense bf16 linear with FLASH_ATTN config
  • DMD2 timesteps: [1000, 757, 522]
  • Validation setting: 3-step DMD, guidance scale 6.0, flow shift 8
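For reference, the flow-match noise levels these timesteps correspond to under flow shift 8 can be computed with the standard shift mapping used by diffusers-style flow-match schedulers. A sketch, assuming num_train_timesteps=1000 and that FastVideo's scheduler fork keeps the same mapping:

```python
# Sketch: flow-match sigmas implied by the DMD timesteps under flow shift 8,
# using the sigma' = shift * sigma / (1 + (shift - 1) * sigma) mapping from
# diffusers' FlowMatchEulerDiscreteScheduler (assumed to match FastVideo's fork).
shift = 8.0
for t in (1000, 757, 522):
    sigma = t / 1000.0                                  # base sigma in [0, 1]
    shifted = shift * sigma / (1 + (shift - 1) * sigma)
    print(f"t={t:4d}  sigma={shifted:.4f}")
# t=1000 -> sigma=1.0000, t=757 -> sigma=0.9614, t=522 -> sigma=0.8973
```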

Solver for Reproduction

Do not try to reproduce this checkpoint's outputs with the vanilla scheduler/ entry in model_index.json. That scheduler, copied from the base model, is UniPCMultistepScheduler with solver_type=bh2; it is not the solver used for online validation of this checkpoint.

The matching deployment path is:

  • pipeline class: fastvideo.pipelines.basic.wan.wan_dmd_pipeline.WanDMDPipeline
  • denoising stage: fastvideo.pipelines.stages.denoising.DmdDenoisingStage
  • scheduler class: FlowMatchEulerDiscreteScheduler
  • solver mode: DMD few-step pred-video + re-noise loop
  • scheduler shift: 8.0
  • pipeline flow_shift: 8
  • DMD timesteps: [1000, 757, 522]
  • number of denoising steps: 3
  • guidance scale: 6.0
  • target dtype inside DMD denoising: bfloat16
  • attention backend: SPARSE_FP4_COMPRESS_ATTN
  • attention sparsity: 0.8
  • high precision O backward flag used in training/eval: FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O=1
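Put together, a construction sketch of the matching scheduler; the import path matches the source shipped under custom_code/, while the shift keyword mirrors the diffusers scheduler of the same name and is assumed to carry over to FastVideo's fork:

```python
# Sketch: build the validation solver instead of the copied UniPC scheduler.
# The import path matches the source under custom_code/; the `shift` kwarg
# name follows diffusers' FlowMatchEulerDiscreteScheduler and is assumed to
# be unchanged in FastVideo's fork.
from fastvideo.models.schedulers.scheduling_flow_match_euler_discrete import (
    FlowMatchEulerDiscreteScheduler,
)

scheduler = FlowMatchEulerDiscreteScheduler(shift=8.0)
dmd_timesteps = [1000, 757, 522]  # fixed 3-step schedule
guidance_scale = 6.0
```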

The DMD step does not call scheduler.step(...) like a normal Euler/UniPC diffusion loop. For each fixed timestep it predicts noise, converts it to a predicted clean video with pred_noise_to_pred_video(...), and, except on the last step, re-noises that prediction to the next fixed timestep with FlowMatchEulerDiscreteScheduler.add_noise(...).
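A sketch of that loop; pred_noise_to_pred_video and add_noise are the calls named above, while the transformer call signature, latent shape, and prompt embeddings are placeholders:

```python
import torch

# Sketch of the DMD few-step loop described above. `transformer`, `prompt_embeds`,
# and `latent_shape` are placeholders; the transformer call signature and the
# pred_noise_to_pred_video argument order are assumptions, not the exact FastVideo API.
dmd_timesteps = [1000, 757, 522]
latents = torch.randn(latent_shape, dtype=torch.bfloat16, device="cuda")

for i, t in enumerate(dmd_timesteps):
    timestep = torch.tensor([t], device=latents.device)
    pred_noise = transformer(latents, timestep, encoder_hidden_states=prompt_embeds)
    # Convert the noise prediction at this fixed timestep into a clean-video estimate.
    pred_video = pred_noise_to_pred_video(pred_noise, latents, timestep, scheduler)
    if i < len(dmd_timesteps) - 1:
        # Re-noise the clean estimate to the next fixed timestep; there is no
        # scheduler.step(...) call anywhere in this loop.
        next_t = torch.tensor([dmd_timesteps[i + 1]], device=latents.device)
        latents = scheduler.add_noise(pred_video, torch.randn_like(pred_video), next_t)

video_latents = pred_video  # final clean prediction from the last step
```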

The exact deployment config is in training_metadata/deployment_repro_config.json. The relevant solver source is also included under custom_code/fastvideo/pipelines/basic/wan/wan_dmd_pipeline.py, custom_code/fastvideo/pipelines/stages/denoising.py, and custom_code/fastvideo/models/schedulers/scheduling_flow_match_euler_discrete.py.

Custom Architecture

The Wan self-attention block adds:

```python
# Extra projection alongside to_q/to_k/to_v: produces the per-head gate for the
# compressed attention branch (NVFP4 fake-quant QAT, like the other linears).
self.to_gate_compress = ReplicatedLinear(dim, dim, bias=True, ...)
```

Forward computes the gate from the same normalized hidden states as Q/K/V:

```python
# The gate shares the normalized hidden states with Q/K/V, then is reshaped
# to per-head form before being passed into the attention backend.
gate_compress, _ = self.to_gate_compress(norm_hidden_states)
gate_compress = gate_compress.squeeze(1).unflatten(2, (num_heads, -1))
attn_output, _ = self.attn1(..., gate_compress=gate_compress)
```

SPARSE_FP4_COMPRESS_ATTN computes:

```python
# Two-branch combination: a block-sparse FP4 main branch plus a dense
# block-mean (compressed) branch, scaled by the learned gate.
output = sparse_fp4_main_branch(q, k, v) + dense_block_mean_branch(q, k, v) * gate_compress
```

Routing and dropped-tile compensation use the high-precision Q/K/V from before quantization; the selected sparse attention path runs on the fake-quantized FP4 Q/K/V.
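A dense-math sketch of that combination, with exact attention standing in for both fused kernels; the (B, H, S, D) layout, the block size, and the block-mean pooling details are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def block_mean(x: torch.Tensor, block: int = 64) -> torch.Tensor:
    """Mean-pool K/V over fixed-size blocks along the sequence axis (assumed B, H, S, D layout)."""
    b, h, s, d = x.shape
    assert s % block == 0, "sketch assumes block divides the sequence length"
    return x.view(b, h, s // block, block, d).mean(dim=3)

def gated_compress_attention(q, k, v, gate_compress, block: int = 64):
    # Stand-in for the fused backend: the real main branch is a block-sparse
    # FP4 kernel, with routing/compensation driven by pre-quantization Q/K/V.
    main = F.scaled_dot_product_attention(q, k, v)
    # Compressed branch: attention against block-mean K/V, scaled by the
    # learned per-head gate from to_gate_compress.
    compress = F.scaled_dot_product_attention(q, block_mean(k, block), block_mean(v, block))
    return main + compress * gate_compress
```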

Consumer GPU Notes

The NVFP4 linear path is fake-quant PyTorch code, not a fused deployment kernel. The sparse FP4 attention backend depends on the included Triton/block-sparse kernel source. If that backend cannot be built in the target 5090 environment, load the custom transformer and run with a dense attention fallback while keeping the fake-quant linear wrappers.
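One way to express that fallback, assuming this FastVideo fork honors the usual FASTVIDEO_ATTENTION_BACKEND selector for the custom block (an assumption worth verifying against custom_code/):

```python
import os

# Fallback sketch for an environment where the Triton block-sparse kernel
# will not build. FASTVIDEO_ATTENTION_BACKEND is FastVideo's backend selector;
# whether this fork routes the custom block through it is an assumption.
os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "FLASH_ATTN"  # dense attention fallback
# The high-precision-O flag only affects the sparse FP4 path, so it can be dropped.
os.environ.pop("FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O", None)
# The NVFP4 fake-quant linear wrappers are plain PyTorch and stay active regardless.
```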
