
FastVideo 5090 Test Checkpoint

This repository is an HF/diffusers-style export of the FastVideo DMD2 student checkpoint at step 2950.

The model is packaged as safetensors for easier loading. It is not a vanilla diffusers Wan checkpoint: the transformer uses a custom Wan block with to_gate_compress, SPARSE_FP4_COMPRESS_ATTN, and NVFP4 fake-quant linear wrappers. Use the matching FastVideo code or the included custom_code/ files.
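As a sanity check before wiring up the full FastVideo stack, the exported transformer weights can be inspected directly with safetensors. A minimal sketch, assuming this repository is the working directory; actually instantiating the model requires the custom classes from custom_code/:

```python
# Minimal inspection sketch: load the raw student transformer weights.
# Instantiating the model itself requires the custom Wan block from custom_code/,
# since the state dict contains keys (e.g. to_gate_compress) that a vanilla
# diffusers Wan transformer does not have.
from safetensors.torch import load_file

state_dict = load_file("transformer/diffusion_pytorch_model.safetensors")
gate_keys = [k for k in state_dict if "to_gate_compress" in k]
print(f"{len(state_dict)} tensors total, {len(gate_keys)} gate-compress tensors")
```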

Files

  • model_index.json, scheduler/, text_encoder/, tokenizer/, vae/: copied from FastWan2.1-T2V-1.3B-Diffusers.
  • transformer/diffusion_pytorch_model.safetensors: exported student transformer weights from checkpoint-2950.
  • custom_code/: standalone source for the attention backend, gate model structure, NVFP4 fake quant linear, and required registry points.
  • training_metadata/: original checkpoint metadata and launch scripts.

Training/Export Settings

  • Source checkpoint: dmd2_fastwan_student_sfp4_compress_s08_nvfp4_qat_gatefp4_linear_criticwarm0_32g_batch_short_array/checkpoint-2950
  • Student init: models/FastWan2.1-T2V-1.3B-Diffusers
  • Student attention: SPARSE_FP4_COMPRESS_ATTN
  • Sparsity: 0.8
  • Student linear: NVFP4 fake-quant QAT
  • Gate-compress projection: enabled and NVFP4 fake-quant QAT
  • LoRA: disabled
  • Teacher/critic: dense bf16 linear with FLASH_ATTN config
  • DMD2 timesteps: [1000, 757, 522]
  • Validation setting: 3-step DMD, guidance scale 6.0, flow shift 8
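For reference, the flow-match noise levels these timesteps correspond to under flow shift 8 can be computed with the standard shift mapping used by diffusers-style flow-match schedulers. A sketch, assuming num_train_timesteps=1000 and that FastVideo's scheduler fork keeps the same mapping:

```python
# Sketch: flow-match sigmas implied by the DMD timesteps under flow shift 8,
# using the sigma' = shift * sigma / (1 + (shift - 1) * sigma) mapping from
# diffusers' FlowMatchEulerDiscreteScheduler (assumed to match FastVideo's fork).
shift = 8.0
for t in (1000, 757, 522):
    sigma = t / 1000.0                                  # base sigma in [0, 1]
    shifted = shift * sigma / (1 + (shift - 1) * sigma)
    print(f"t={t:4d}  sigma={shifted:.4f}")
# t=1000 -> sigma=1.0000, t=757 -> sigma=0.9614, t=522 -> sigma=0.8973
```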

Solver for Reproduction

Do not try to reproduce this checkpoint's outputs with the vanilla scheduler/ entry in model_index.json. That scheduler, copied from the base model, is UniPCMultistepScheduler with solver_type=bh2; it is not the solver used for online validation of this checkpoint.

The matching deployment path is:

  • pipeline class: fastvideo.pipelines.basic.wan.wan_dmd_pipeline.WanDMDPipeline
  • denoising stage: fastvideo.pipelines.stages.denoising.DmdDenoisingStage
  • scheduler class: FlowMatchEulerDiscreteScheduler
  • solver mode: DMD few-step pred-video + re-noise loop
  • scheduler shift: 8.0
  • pipeline flow_shift: 8
  • DMD timesteps: [1000, 757, 522]
  • number of denoising steps: 3
  • guidance scale: 6.0
  • target dtype inside DMD denoising: bfloat16
  • attention backend: SPARSE_FP4_COMPRESS_ATTN
  • attention sparsity: 0.8
  • high precision O backward flag used in training/eval: FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O=1
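Put together, a construction sketch of the matching scheduler; the import path matches the source shipped under custom_code/, while the shift keyword mirrors the diffusers scheduler of the same name and is assumed to carry over to FastVideo's fork:

```python
# Sketch: build the validation solver instead of the copied UniPC scheduler.
# The import path matches the source under custom_code/; the `shift` kwarg
# name follows diffusers' FlowMatchEulerDiscreteScheduler and is assumed to
# be unchanged in FastVideo's fork.
from fastvideo.models.schedulers.scheduling_flow_match_euler_discrete import (
    FlowMatchEulerDiscreteScheduler,
)

scheduler = FlowMatchEulerDiscreteScheduler(shift=8.0)
dmd_timesteps = [1000, 757, 522]  # fixed 3-step schedule
guidance_scale = 6.0
```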

The DMD step does not call scheduler.step(...) like a normal Euler/UniPC diffusion loop. For each fixed timestep it predicts noise, converts it to a predicted clean video with pred_noise_to_pred_video(...), and, except on the last step, re-noises that prediction to the next fixed timestep with FlowMatchEulerDiscreteScheduler.add_noise(...).
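A sketch of that loop; pred_noise_to_pred_video and add_noise are the calls named above, while the transformer call signature, latent shape, and prompt embeddings are placeholders:

```python
import torch

# Sketch of the DMD few-step loop described above. `transformer`, `prompt_embeds`,
# and `latent_shape` are placeholders; the transformer call signature and the
# pred_noise_to_pred_video argument order are assumptions, not the exact FastVideo API.
dmd_timesteps = [1000, 757, 522]
latents = torch.randn(latent_shape, dtype=torch.bfloat16, device="cuda")

for i, t in enumerate(dmd_timesteps):
    timestep = torch.tensor([t], device=latents.device)
    pred_noise = transformer(latents, timestep, encoder_hidden_states=prompt_embeds)
    # Convert the noise prediction at this fixed timestep into a clean-video estimate.
    pred_video = pred_noise_to_pred_video(pred_noise, latents, timestep, scheduler)
    if i < len(dmd_timesteps) - 1:
        # Re-noise the clean estimate to the next fixed timestep; there is no
        # scheduler.step(...) call anywhere in this loop.
        next_t = torch.tensor([dmd_timesteps[i + 1]], device=latents.device)
        latents = scheduler.add_noise(pred_video, torch.randn_like(pred_video), next_t)

video_latents = pred_video  # final clean prediction from the last step
```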

The exact deployment config is in training_metadata/deployment_repro_config.json. The relevant solver source is also included under custom_code/fastvideo/pipelines/basic/wan/wan_dmd_pipeline.py, custom_code/fastvideo/pipelines/stages/denoising.py, and custom_code/fastvideo/models/schedulers/scheduling_flow_match_euler_discrete.py.

Custom Architecture

The Wan self-attention block adds:

```python
# Extra projection alongside to_q/to_k/to_v: produces the per-head gate for the
# compressed attention branch (NVFP4 fake-quant QAT, like the other linears).
self.to_gate_compress = ReplicatedLinear(dim, dim, bias=True, ...)
```

Forward computes the gate from the same normalized hidden states as Q/K/V:

```python
# The gate shares the normalized hidden states with Q/K/V, then is reshaped
# to per-head form before being passed into the attention backend.
gate_compress, _ = self.to_gate_compress(norm_hidden_states)
gate_compress = gate_compress.squeeze(1).unflatten(2, (num_heads, -1))
attn_output, _ = self.attn1(..., gate_compress=gate_compress)
```

SPARSE_FP4_COMPRESS_ATTN computes:

```python
# Two-branch combination: a block-sparse FP4 main branch plus a dense
# block-mean (compressed) branch, scaled by the learned gate.
output = sparse_fp4_main_branch(q, k, v) + dense_block_mean_branch(q, k, v) * gate_compress
```

Routing and dropped-tile compensation use the high-precision Q/K/V from before quantization; the selected sparse attention path runs on the fake-quantized FP4 Q/K/V.
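A dense-math sketch of that combination, with exact attention standing in for both fused kernels; the (B, H, S, D) layout, the block size, and the block-mean pooling details are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def block_mean(x: torch.Tensor, block: int = 64) -> torch.Tensor:
    """Mean-pool K/V over fixed-size blocks along the sequence axis (assumed B, H, S, D layout)."""
    b, h, s, d = x.shape
    assert s % block == 0, "sketch assumes block divides the sequence length"
    return x.view(b, h, s // block, block, d).mean(dim=3)

def gated_compress_attention(q, k, v, gate_compress, block: int = 64):
    # Stand-in for the fused backend: the real main branch is a block-sparse
    # FP4 kernel, with routing/compensation driven by pre-quantization Q/K/V.
    main = F.scaled_dot_product_attention(q, k, v)
    # Compressed branch: attention against block-mean K/V, scaled by the
    # learned per-head gate from to_gate_compress.
    compress = F.scaled_dot_product_attention(q, block_mean(k, block), block_mean(v, block))
    return main + compress * gate_compress
```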

Consumer GPU Notes

The NVFP4 linear path is fake-quant PyTorch code, not a fused deployment kernel. The sparse FP4 attention backend depends on the included Triton/block-sparse kernel source. If that backend cannot be built in the target 5090 environment, load the custom transformer and run with a dense attention fallback while keeping the fake-quant linear wrappers.
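One way to express that fallback, assuming this FastVideo fork honors the usual FASTVIDEO_ATTENTION_BACKEND selector for the custom block (an assumption worth verifying against custom_code/):

```python
import os

# Fallback sketch for an environment where the Triton block-sparse kernel
# will not build. FASTVIDEO_ATTENTION_BACKEND is FastVideo's backend selector;
# whether this fork routes the custom block through it is an assumption.
os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "FLASH_ATTN"  # dense attention fallback
# The high-precision-O flag only affects the sparse FP4 path, so it can be dropped.
os.environ.pop("FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O", None)
# The NVFP4 fake-quant linear wrappers are plain PyTorch and stay active regardless.
```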
