# Memory-efficient Inference
> See the main [README](../README.md) for `FlowDPMSolver` and `guider` setup.
By default, `pipe.to("cuda")` loads all components onto the GPU simultaneously, requiring **~30 GB VRAM**.
For GPUs with 24 GB of VRAM or less (e.g., RTX 4090 or RTX 3090), use `enable_model_cpu_offload()` together with the `expandable_segments` allocator setting:
```bash
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```
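If you prefer to set this from Python rather than the shell, the variable must be set before CUDA is first initialized; a minimal sketch:

```python
import os

# Must be set before the first CUDA allocation to take effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # import (and use) CUDA only after setting the variable
```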
```python
import torch
from diffusers.utils import export_to_video
# MotifVideoPipeline, FlowDPMSolver, and the guider come from the main README setup.

pipe = MotifVideoPipeline.from_pretrained(
    "Motif-Technologies/Motif-Video-2B",
    revision="diffusers-integration",
    torch_dtype=torch.bfloat16,
    guider=guider,  # see the T2V example in the main README
)
pipe.scheduler = FlowDPMSolver(
    num_train_timesteps=pipe.scheduler.config.get("num_train_timesteps", 1000),
    algorithm_type="dpmsolver++",
    solver_order=2,
    prediction_type="flow_prediction",
    use_flow_sigmas=True,
    flow_shift=15.0,
)

pipe.enable_model_cpu_offload()  # replaces pipe.to("cuda")

output = pipe(
    prompt="...",
    negative_prompt="...",
    height=736,
    width=1280,
    num_frames=121,
    num_inference_steps=50,
    frame_rate=24,
    use_linear_quadratic_schedule=False,
)
export_to_video(output.frames[0], "output.mp4", fps=24)
```
With offloading enabled, each component (text encoder → transformer → VAE) is moved to the GPU only while it is needed. The `expandable_segments` setting lets the CUDA caching allocator reuse memory released by earlier components, avoiding fragmentation-related OOM errors.
| Mode | Peak VRAM | Speed | Recommended GPU |
|------|-----------|-------|-----------------|
| `pipe.to("cuda")` | ~30 GB | Fastest | A100, H100, H200 |
| `enable_model_cpu_offload()` | ~19 GB | Similar | RTX 4090, RTX 3090 |
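To verify these numbers on your own hardware, you can read PyTorch's allocator high-water mark around a generation call; a minimal sketch using standard `torch.cuda` memory APIs:

```python
import torch

# Reset the high-water mark, run one generation, then read the peak.
torch.cuda.reset_peak_memory_stats()
output = pipe(prompt="...", height=736, width=1280, num_frames=121)
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**3:.1f} GiB")
print(f"peak reserved:  {torch.cuda.max_memory_reserved() / 1024**3:.1f} GiB")
```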
## FP8 Weight Quantization (Optional)
For further VRAM reduction, you can quantize the transformer weights to FP8 using [torchao](https://github.com/pytorch/ao):
```bash
pip install torchao
```
```python
import torch
from diffusers.utils import export_to_video
from torchao.quantization import quantize_, Float8WeightOnlyConfig
# MotifVideoPipeline, FlowDPMSolver, and the guider come from the main README setup.

pipe = MotifVideoPipeline.from_pretrained(
    "Motif-Technologies/Motif-Video-2B",
    revision="diffusers-integration",
    torch_dtype=torch.bfloat16,
    guider=guider,  # see the T2V example in the main README
)
pipe.scheduler = FlowDPMSolver(
    num_train_timesteps=pipe.scheduler.config.get("num_train_timesteps", 1000),
    algorithm_type="dpmsolver++",
    solver_order=2,
    prediction_type="flow_prediction",
    use_flow_sigmas=True,
    flow_shift=15.0,
)

# Quantize only the transformer weights to FP8; compute stays in BF16.
quantize_(pipe.transformer, Float8WeightOnlyConfig())
pipe.enable_model_cpu_offload()

output = pipe(
    prompt="...",
    negative_prompt="...",
    height=736,
    width=1280,
    num_frames=121,
    num_inference_steps=50,
    frame_rate=24,
    use_linear_quadratic_schedule=False,
)
export_to_video(output.frames[0], "output.mp4", fps=24)
```
This stores the transformer weights in FP8 (8-bit) instead of BF16 (16-bit), reducing peak VRAM from ~19 GB to ~15 GB while keeping all computation in BF16 precision.
| Mode | Peak VRAM | Notes |
|------|-----------|-------|
| `enable_model_cpu_offload()` | ~19 GB | BF16 baseline |
| `+ Float8WeightOnlyConfig` | ~15 GB | FP8 weights, BF16 compute |
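Conceptually, weight-only FP8 quantization stores each weight matrix in an 8-bit floating-point format and converts it back to BF16 on the fly for each matmul. A toy illustration of that storage/compute split (not torchao's actual implementation, which also tracks quantization scales):

```python
import torch

w_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)
w_fp8 = w_bf16.to(torch.float8_e4m3fn)  # 1 byte/element vs. 2 for bf16

def fp8_weight_only_linear(x: torch.Tensor) -> torch.Tensor:
    # Dequantize just-in-time, so the matmul itself runs in bf16.
    return x @ w_fp8.to(torch.bfloat16).T

print(w_bf16.element_size(), w_fp8.element_size())  # 2 1
```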