# sfp4_v4_sparse09_hpo_on_ours_p_init2050 checkpoint-700
This upload contains the consolidated `WanTransformer3DModel` transformer weights from:
`checkpoints/sfp4_v4_sparse09_hpo_on_ours_p_init2050_1n_interactive/checkpoint-700`
Contents:
- `transformer/config.json`
- `transformer/diffusion_pytorch_model.safetensors`
- `backend_snapshot/`
- `standalone_inference/`
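
For quick verification, the transformer weights can be loaded directly from this layout with diffusers. A minimal sketch; it assumes a recent diffusers release that ships `WanTransformer3DModel`, and the local path is illustrative, not part of this upload:

```python
import torch
from diffusers import WanTransformer3DModel

# Load the consolidated weights from the layout listed above.
# "./sfp4_checkpoint_700" is a placeholder for wherever this upload lives.
transformer = WanTransformer3DModel.from_pretrained(
    "./sfp4_checkpoint_700",
    subfolder="transformer",   # contains config.json + diffusion_pytorch_model.safetensors
    torch_dtype=torch.bfloat16,
)
transformer.eval()
```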
Training run:
- run name: `sfp4_v4_sparse09_hpo_on_ours_p_init2050_1n_interactive`
- source init: `sfp4_v4_sparse06_hpo_on_ours_p_1n_interactive_v2 checkpoint-2050`
- attention backend: `SPARSE_FP4_OURS_P_ATTN`
- high-precision output for backward: enabled
- VSA sparsity: `0.9`
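
For intuition, VSA sparsity `0.9` means roughly 90% of the 64-token key/value tiles are dropped for each query block. A toy estimate, assuming non-overlapping tiles; the real selection rule lives in the `backend_snapshot/` kernels:

```python
import math

def kept_tiles(seq_len: int, tile: int = 64, sparsity: float = 0.9) -> int:
    """Toy estimate of how many KV tiles survive at a given VSA sparsity.

    Assumes non-overlapping 64-token tiles and that `sparsity` is the
    fraction of tiles dropped per query block.
    """
    total = math.ceil(seq_len / tile)
    return max(1, round(total * (1.0 - sparsity)))

print(kept_tiles(32768))  # 512 tiles -> ~51 kept at sparsity 0.9
```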
This package does not include the distributed optimizer/training-state
checkpoint. Use the original `distributed_checkpoint/` directory if exact
training resume state is required.
`backend_snapshot/` contains the local FastVideo backend code used by this
checkpoint, including `SPARSE_FP4_OURS_P_ATTN`, its Triton forward/backward
kernel, FP4 quant helpers, VSA metadata helper, backend wiring, and the exact
SFT launch scripts.
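
For orientation, an FP4 quant helper of this kind typically fake-quantizes values onto the signed E2M1 grid with a per-group scale. The sketch below illustrates that idea; the grid, group size, and scale format are assumptions and may not match the snapshot's helpers:

```python
import torch

# The 15-level signed E2M1 FP4 grid (a common choice; the snapshot's helpers
# may use a different grid or scale encoding).
FP4_GRID = torch.tensor(
    [-6.0, -4.0, -3.0, -2.0, -1.5, -1.0, -0.5, 0.0,
      0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
)

def quantize_fp4_groupwise(x: torch.Tensor, group: int = 16) -> torch.Tensor:
    """Fake-quantize in groups along a flattened view: scale each group so
    its abs-max maps to 6.0, snap to the FP4 grid, then rescale. Returns the
    dequantized tensor (simulation only, not packed 4-bit storage).
    Assumes x.numel() is divisible by `group`."""
    orig_shape = x.shape
    g = x.reshape(-1, group)
    scale = g.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / 6.0
    # Snap each scaled value to the nearest representable FP4 value.
    idx = (g / scale).unsqueeze(-1).sub(FP4_GRID).abs().argmin(dim=-1)
    return (FP4_GRID[idx] * scale).reshape(orig_shape)
```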
The upload also includes the inference entrypoint snapshot, an example launch script, and related assets:
- `backend_snapshot/scripts/inference/run_sfp4_ours_p_checkpoint_700.sh`
- `backend_snapshot/training_attention_settings.json`
- `standalone_inference/`
Attention setup for this checkpoint:
- self-attention: `SPARSE_FP4_OURS_P_ATTN`, FP4 Q/K/V, sparse 64-token VSA
tiles, group-local P quant, dropped-tile mean compensation
- cross-attention: dense SDPA fallback (no FP4 quantization or sparsity)
- force-dense paths: dense SDPA
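
To make the self-attention bullets concrete, here is a toy dense-tensor reading of tile dropping with mean compensation: each dropped 64-token tile is summarized by the mean of its keys and values rather than skipped outright. This is an interpretation of the feature name, not the snapshot's Triton kernel:

```python
import torch
import torch.nn.functional as F

def attn_with_mean_compensation(q, k, v, keep_mask, tile=64):
    """q: (Lq, D); k, v: (L, D) with L divisible by `tile`;
    keep_mask: (L // tile,) bool. Single-head toy sketch.
    Kept tiles contribute their full keys/values; each dropped tile is
    replaced by one mean key/value. A faithful version would also up-weight
    each summary token's logit by log(tile); omitted here for brevity."""
    kt = k.reshape(-1, tile, k.shape[-1])
    vt = v.reshape(-1, tile, v.shape[-1])
    kept_k = kt[keep_mask].reshape(-1, k.shape[-1])
    kept_v = vt[keep_mask].reshape(-1, v.shape[-1])
    mean_k = kt[~keep_mask].mean(dim=1)   # one summary key per dropped tile
    mean_v = vt[~keep_mask].mean(dim=1)
    k_eff = torch.cat([kept_k, mean_k], dim=0)
    v_eff = torch.cat([kept_v, mean_v], dim=0)
    # Dense SDPA over the reduced KV set stands in for the sparse kernel.
    return F.scaled_dot_product_attention(
        q.unsqueeze(0), k_eff.unsqueeze(0), v_eff.unsqueeze(0)
    ).squeeze(0)
```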
`standalone_inference/` is a portable helper for running inference outside the
training environment. It contains an overlay installer, a runner that downloads
and loads the checkpoint-700 transformer weights, and the sparse FP4 backend
files this checkpoint requires.
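
The runner's download step can also be reproduced by hand; a sketch using `huggingface_hub` (the API call is real, the repo id is a placeholder not confirmed by this README):

```python
from huggingface_hub import snapshot_download

# Fetch only the transformer weights from the Hub (repo_id is hypothetical).
local_dir = snapshot_download(
    repo_id="<user>/sfp4_v4_sparse09_hpo_on_ours_p_init2050",
    allow_patterns=["transformer/*"],
)
print(local_dir)  # pass this directory to the loading sketch above
```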