Lower VRAM Mode or CPU Support for Step-Video-T2V? #3
Opened by HassanStar
Hey StepFun team,
Step-Video-T2V looks incredible, but the high VRAM requirements (77GB for 204 frames) make it difficult for many users to run.
Are there any plans for a low-VRAM mode, quantized version, or even a CPU-compatible variant for research and experimentation?
Would love to hear if optimizations like Mixture of Experts (MoE), FP8 compression, or distillation techniques are being explored.
Thanks for your amazing work!
Also, will future versions support videos longer than 204 frames?
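For a rough sense of why lower-precision weights help with the 77 GB figure, here is a back-of-the-envelope sketch that computes weight memory as parameters times bytes per parameter. It assumes about 30 billion parameters for Step-Video-T2V (the figure StepFun reports; treat it as an assumption here) and counts the weights only. Activations, which grow with frame count, plus the VAE and text encoders, are not modeled. The function name is hypothetical.

```python
# Back-of-the-envelope weight memory: parameters x bytes per parameter.
# Assumption: ~30e9 parameters; activations, VAE, text encoders and
# framework overhead are NOT included in this estimate.

BYTES_PER_PARAM = {
    "fp32": 4,
    "bf16": 2,
    "fp8": 1,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 30e9  # assumed parameter count
    for dtype in ("bf16", "fp8", "int4"):
        print(f"{dtype}: {weight_memory_gb(n, dtype):.0f} GB")
```

Under that assumption the bf16 weights alone come to about 60 GB and FP8 halves that to about 30 GB; the rest of the reported peak is presumably activations and the other components, which this estimate leaves out.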
The ModelScope community has developed an FP8 inference framework. It is still at an early stage, but you can give it a try.
See it here: step-video-t2v-in-fp8.
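For a sense of what FP8 compression does to the weights, here is a minimal pure-Python sketch of generic per-tensor FP8 (e4m3) quantization. Each tensor gets one scale so that its largest magnitude maps to 448, the largest finite e4m3fn value, and every scaled value is rounded to 3 mantissa bits. This is an illustration under those assumptions, not the ModelScope framework's actual method; the function names are hypothetical, and real inference stacks store weights in hardware FP8 dtypes (for example PyTorch's `torch.float8_e4m3fn`) rather than Python lists.

```python
import math

FP8_E4M3_MAX = 448.0   # largest finite value in the e4m3fn format
FP8_E4M3_MIN_EXP = -6  # exponent of the smallest normal value (2^-6)
MANTISSA_BITS = 3      # explicit mantissa bits in e4m3


def round_to_e4m3(v: float) -> float:
    """Round v to the nearest FP8 e4m3fn value (ties to even), saturating at +/-448."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    a = min(abs(v), FP8_E4M3_MAX)
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so floor(log2(a)) == e - 1.
    # Clamping at -6 makes subnormals use the fixed 2^-9 spacing.
    exp = max(math.frexp(a)[1] - 1, FP8_E4M3_MIN_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing between adjacent FP8 values
    q = round(a / step) * step           # Python's round() breaks ties to even
    return sign * min(q, FP8_E4M3_MAX)


def quantize_per_tensor(weights):
    """Scale so the largest |w| maps to 448, then round each value to e4m3."""
    amax = max(abs(w) for w in weights)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(w / scale) for w in weights], scale


def dequantize(q, scale):
    """Map FP8 values back to the original range using the stored scale."""
    return [x * scale for x in q]


if __name__ == "__main__":
    w = [0.5, -1.0, 2.0, 0.013, -0.0007]
    q, s = quantize_per_tensor(w)
    for orig, rec in zip(w, dequantize(q, s)):
        print(f"{orig:+.6f} -> {rec:+.6f}")
```

Each weight then needs one byte instead of two in bf16, at the cost of keeping only 3 mantissa bits; the per-tensor scale is what keeps tensors of very different magnitudes inside FP8's narrow range.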