Add FP8 weight quantization guide to README

#9
by gkalstn0 - opened
Motif Technologies org

Summary

  • Add torchao Float8WeightOnlyConfig instructions to Memory-efficient Inference section
  • Reduces peak VRAM from ~19 GB to ~15 GB when used together with enable_model_cpu_offload()
  • Stores transformer weights in FP8 while keeping all computation in BF16 precision
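
The change summarized above can be sketched roughly as follows. This is a minimal illustration, not the exact README snippet: the helper name `apply_fp8_weight_only` is hypothetical, `pipe` is assumed to be a diffusers-style pipeline already loaded in BF16 that exposes a `transformer` attribute, and calling `enable_model_cpu_offload()` after quantization is an assumed ordering. It relies on torchao's `quantize_` and `Float8WeightOnlyConfig`, which require a recent torchao release.

```python
def apply_fp8_weight_only(pipe):
    """Quantize pipe.transformer weights to FP8 in place, then enable CPU offload.

    `pipe` is assumed to be a diffusers-style pipeline loaded in BF16
    with a `transformer` attribute (hypothetical helper, not from the README).
    """
    # Imported lazily so this file can be loaded without torchao installed.
    from torchao.quantization import Float8WeightOnlyConfig, quantize_

    # Weight-only quantization: weights are stored in FP8, and per the PR
    # description all computation stays in BF16.
    quantize_(pipe.transformer, Float8WeightOnlyConfig())

    # Assumed ordering: quantize first, then let the pipeline move
    # components between CPU and GPU on demand.
    pipe.enable_model_cpu_offload()
    return pipe
```

The function would be called once on the pipeline built with the README's loading recipe, before running inference. The model identifier and generation arguments are not stated in this PR, so they are left out here.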

Test plan

  • Fresh venv with README pip install recipe + torchao
  • 720p, 121 frames, 50 steps: peak VRAM confirmed at ~15 GB (vs. ~19 GB baseline)
  • Video output quality verified
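
One way to reproduce a peak-VRAM check like the one in the test plan is sketched below. The PR does not say how the ~15 GB figure was measured, so this is an assumption: it uses PyTorch's CUDA allocator statistics, which can differ from what `nvidia-smi` reports. The helper names `bytes_to_gb` and `measure_peak_vram_gb` are hypothetical.

```python
def bytes_to_gb(num_bytes):
    """Convert a byte count to decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_bytes / 1e9


def measure_peak_vram_gb(run_inference):
    """Run `run_inference()` and return the peak CUDA memory allocated, in GB.

    `run_inference` is any zero-argument callable, e.g. a lambda that
    calls the pipeline with the 720p / 121 frames / 50 steps settings.
    """
    # Imported lazily so the conversion helper works without PyTorch.
    import torch

    torch.cuda.reset_peak_memory_stats()  # start from a clean peak counter
    run_inference()
    torch.cuda.synchronize()              # wait for queued GPU work to finish
    return bytes_to_gb(torch.cuda.max_memory_allocated())
```

Running this once with and once without FP8 weight quantization, in separate processes, gives a like-for-like comparison of the two configurations.
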
gkalstn0 changed pull request status to open
gkalstn0 changed pull request status to merged
