# FastVideo 5090 Test Checkpoint

This repository is an HF/diffusers-style export of the FastVideo DMD2 student checkpoint at step 2950. The model is packaged as safetensors for easier loading. It is not a vanilla diffusers Wan checkpoint: the transformer uses a custom Wan block with `to_gate_compress`, the `SPARSE_FP4_COMPRESS_ATTN` attention backend, and NVFP4 fake-quant linear wrappers. Use the matching FastVideo code or the included `custom_code/` files.
## Files

- `model_index.json`, `scheduler/`, `text_encoder/`, `tokenizer/`, `vae/`: copied from `FastWan2.1-T2V-1.3B-Diffusers`.
- `transformer/diffusion_pytorch_model.safetensors`: exported student transformer weights from `checkpoint-2950`.
- `custom_code/`: standalone source for the attention backend, gate model structure, NVFP4 fake-quant linear, and required registry points.
- `training_metadata/`: original checkpoint metadata and launch scripts.
## Training/Export Settings

- Source checkpoint: `dmd2_fastwan_student_sfp4_compress_s08_nvfp4_qat_gatefp4_linear_criticwarm0_32g_batch_short_array/checkpoint-2950`
- Student init: `models/FastWan2.1-T2V-1.3B-Diffusers`
- Student attention: `SPARSE_FP4_COMPRESS_ATTN`
- Sparsity: `0.8`
- Student linear: NVFP4 fake-quant QAT (see the sketch after this list)
- Gate-compress projection: enabled, NVFP4 fake-quant QAT
- LoRA: disabled
- Teacher/critic: dense bf16 linear with a `FLASH_ATTN` config
- DMD2 timesteps: `[1000, 757, 522]`
- Validation setting: 3-step DMD, guidance scale `6.0`, flow shift `8`
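As a concrete illustration of the fake-quant QAT idea, here is a minimal sketch. It is not the FastVideo NVFP4 wrapper: real NVFP4 uses E2M1 values with per-16-element FP8 block scales, while this toy version snaps weights to the signed E2M1 grid with a single per-tensor scale and a straight-through estimator.

```python
# Minimal fake-quant linear sketch (NOT the FastVideo NVFP4 wrapper).
# Real NVFP4 uses per-16-element FP8 block scales; this toy version uses
# one per-tensor scale and a straight-through estimator.
import torch
import torch.nn as nn
import torch.nn.functional as F

# positive representable E2M1 (FP4) magnitudes
_E2M1 = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_fp4(w: torch.Tensor) -> torch.Tensor:
    scale = w.abs().amax().clamp(min=1e-8) / 6.0       # map |w|_max to FP4 max
    grid = torch.cat([-_E2M1.flip(0), _E2M1]).to(w)    # signed FP4 grid
    x = (w / scale).clamp(-6.0, 6.0)
    idx = (x.unsqueeze(-1) - grid).abs().argmin(dim=-1)  # nearest grid point
    w_q = grid[idx] * scale
    return w + (w_q - w).detach()  # straight-through: forward w_q, backward w

class FakeQuantLinear(nn.Linear):
    """nn.Linear whose weight is quantize-dequantized on every forward pass."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, fake_quant_fp4(self.weight), self.bias)
```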
## Solver for Reproduction

Do not reproduce this checkpoint with the vanilla `scheduler/` entry in `model_index.json`. That copied base-model scheduler is `UniPCMultistepScheduler` with `solver_type=bh2`, but it is not the online validation solver used for this checkpoint.
The matching deployment path is:

- Pipeline class: `fastvideo.pipelines.basic.wan.wan_dmd_pipeline.WanDMDPipeline`
- Denoising stage: `fastvideo.pipelines.stages.denoising.DmdDenoisingStage`
- Scheduler class: `FlowMatchEulerDiscreteScheduler`
- Solver mode: DMD few-step pred-video + re-noise loop
- Scheduler shift: `8.0`
- Pipeline `flow_shift`: `8`
- DMD timesteps: `[1000, 757, 522]`
- Number of denoising steps: `3`
- Guidance scale: `6.0`
- Target dtype inside DMD denoising: `bfloat16`
- Attention backend: `SPARSE_FP4_COMPRESS_ATTN`
- Attention sparsity: `0.8`
- High-precision O backward flag used in training/eval: `FASTVIDEO_SPARSE_FP4_USE_HIGH_PREC_O=1`
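A minimal sketch of the matching scheduler setup, assuming the FastVideo copy of `FlowMatchEulerDiscreteScheduler` accepts the usual diffusers-style `shift` argument; the import path mirrors the `custom_code/` layout above and assumes it is on `sys.path`:

```python
# Use the flow-match scheduler with shift=8.0, NOT the copied UniPC entry
# from model_index.json.
from fastvideo.models.schedulers.scheduling_flow_match_euler_discrete import (
    FlowMatchEulerDiscreteScheduler,
)

scheduler = FlowMatchEulerDiscreteScheduler(shift=8.0)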
The DMD step does not call `scheduler.step(...)` like a normal Euler/UniPC diffusion loop. For each fixed timestep it predicts noise, converts it to a predicted clean video with `pred_noise_to_pred_video(...)`, and, except for the last step, re-noises that predicted video to the next fixed timestep with `FlowMatchEulerDiscreteScheduler.add_noise(...)`.
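A minimal sketch of that loop, using the names from the text (`pred_noise_to_pred_video` comes from the included denoising-stage source); the transformer call signature and argument order here are illustrative, not the exact FastVideo API:

```python
import torch

timesteps = [1000, 757, 522]  # fixed DMD timesteps, matching the config above

@torch.no_grad()
def dmd_denoise(transformer, scheduler, latents, prompt_embeds):
    # latents: initial Gaussian noise in the Wan latent space (bfloat16)
    for i, t in enumerate(timesteps):
        t_tensor = torch.tensor([t], device=latents.device)
        # one forward pass per fixed timestep; signature is illustrative
        pred_noise = transformer(latents, t_tensor, prompt_embeds)
        # convert predicted noise to a predicted clean video
        pred_video = pred_noise_to_pred_video(pred_noise, latents, t_tensor, scheduler)
        if i < len(timesteps) - 1:
            # re-noise the prediction to the next fixed timestep
            next_t = torch.tensor([timesteps[i + 1]], device=latents.device)
            noise = torch.randn_like(pred_video)
            latents = scheduler.add_noise(pred_video, noise, next_t)
        else:
            latents = pred_video  # final prediction goes to the VAE decoder
    return latents
```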
The exact deployment config is in `training_metadata/deployment_repro_config.json`. The relevant solver source is also included under `custom_code/fastvideo/pipelines/basic/wan/wan_dmd_pipeline.py`, `custom_code/fastvideo/pipelines/stages/denoising.py`, and `custom_code/fastvideo/models/schedulers/scheduling_flow_match_euler_discrete.py`.
## Custom Architecture

The Wan self-attention block adds:

```python
self.to_gate_compress = ReplicatedLinear(dim, dim, bias=True, ...)
```

The forward pass computes the gate from the same normalized hidden states as Q/K/V:

```python
gate_compress, _ = self.to_gate_compress(norm_hidden_states)
gate_compress = gate_compress.squeeze(1).unflatten(2, (num_heads, -1))
attn_output, _ = self.attn1(..., gate_compress=gate_compress)
```
`SPARSE_FP4_COMPRESS_ATTN` computes:

```
output = sparse_fp4_main_branch(q, k, v) + dense_block_mean_branch(q, k, v) * gate_compress
```
Routing and dropped-tile compensation use high-precision pre-quantized Q/K/V. The selected sparse attention path uses fake-quantized FP4 Q/K/V.
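A dense PyTorch sketch of that two-branch combination; the real backend uses fused sparse-FP4 and block-mean kernels, so the mask handling, `block_size`, and pooling here are illustrative only:

```python
import torch.nn.functional as F

def gated_compress_attention(q, k, v, gate_compress, sparse_mask, block_size=64):
    # q, k, v: (batch, heads, seq, head_dim); gate_compress broadcastable
    # to the output shape; sparse_mask: boolean (..., seq, seq)

    # main branch: attention restricted to the selected tiles, approximated
    # here with a boolean mask; the real path runs fake-quant FP4 Q/K/V
    main = F.scaled_dot_product_attention(q, k, v, attn_mask=sparse_mask)

    # compensation branch: dense attention against block-mean-pooled K/V
    # (sequence length assumed divisible by block_size)
    k_blk = k.unflatten(-2, (-1, block_size)).mean(dim=-2)
    v_blk = v.unflatten(-2, (-1, block_size)).mean(dim=-2)
    comp = F.scaled_dot_product_attention(q, k_blk, v_blk)

    # learned gate scales the compensation branch
    return main + comp * gate_compress
```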
## Consumer GPU Notes

The NVFP4 linear path is fake-quant PyTorch code, not a fused deployment kernel. The sparse FP4 attention backend depends on the included Triton block-sparse kernel source. If that backend cannot be built in the target 5090 environment, load the custom transformer and run with a dense attention fallback while keeping the fake-quant linear wrappers, as sketched below.
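A hedged loading sketch for that fallback. The transformer class name and import are hypothetical placeholders (use whatever `custom_code/` actually exports), and the attention-backend environment variable is an assumption about the FastVideo runtime, not a confirmed API:

```python
import os
from safetensors.torch import load_file

# assumption: the runtime selects its attention backend from this env var;
# set it before constructing the model to request a dense fallback
os.environ["FASTVIDEO_ATTENTION_BACKEND"] = "FLASH_ATTN"

# hypothetical import; substitute the Wan transformer variant defined in
# custom_code/ (with to_gate_compress and the NVFP4 fake-quant wrappers)
from custom_code_transformer import CustomWanTransformer3DModel

state_dict = load_file("transformer/diffusion_pytorch_model.safetensors")
transformer = CustomWanTransformer3DModel.from_config(
    CustomWanTransformer3DModel.load_config("transformer")
)
transformer.load_state_dict(state_dict)
```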