docs: add RTX 4090 benchmark + GPU arch list for SageAttention build
#22
by gkalstn0 - opened
- README.md +1 -0
- docs/gguf-sageattention.md +36 -2
README.md
CHANGED
@@ -64,6 +64,7 @@ widget:

 ## 🔥 News

+- **[2026-04-29]** **RTX 4090 benchmarks** added: SageAttention achieves ~3.16× speedup, and all GGUF variants fit in 24 GB. See [GGUF + SageAttention](docs/gguf-sageattention.md#benchmark).
 - **[2026-04-28]** **ComfyUI custom nodes** released: [ComfyUI-MotifVideo2B](https://github.com/MotifTechnologies/ComfyUI-MotifVideo2B). GGUF workflow support coming soon.
 - **[2026-04-28]** **GGUF quantized weights** now available at [Motif-Video-2B-GGUF](https://huggingface.co/Motif-Technologies/Motif-Video-2B-GGUF): up to 2.7 GB VRAM savings with no speed penalty. **SageAttention** support for ~2× faster inference. See [GGUF + SageAttention](#🔧-gguf--sageattention) below.
 - **[2026-04-14]** We release **Motif-Video 2B**, our 2B-parameter text-to-video and image-to-video diffusion transformer, together with the full [technical report](https://arxiv.org/abs/2604.16503).
docs/gguf-sageattention.md
CHANGED
@@ -90,7 +90,12 @@ Same prompt and seed, 1280x736, 121 frames, 50 steps. Left = SDPA, Right = SageA
 **Install** (build from source; PyPI only has 1.x, need 2.x):

 ```bash
-# Set TORCH_CUDA_ARCH_LIST to match your GPU:
+# Set TORCH_CUDA_ARCH_LIST to match your GPU:
+# "8.0" for A100/A30
+# "8.6" for RTX 3090/3080/A40
+# "8.9" for RTX 4090/4080/4070 Ti/L40/L40S (Ada Lovelace)
+# "12.0" for RTX 5090/5080/5070 Ti (Blackwell)
+# "9.0" for H100/H200
 TORCH_CUDA_ARCH_LIST="9.0" pip install git+https://github.com/thu-ml/SageAttention.git --no-build-isolation
 ```

@@ -108,7 +113,7 @@ python inference.py --use-sage-attention --prompt "..."
 - Set `TORCH_CUDA_ARCH_LIST` to match your GPU when building (e.g., `"8.6"` for RTX 3090, `"8.9"` for RTX 4090)
 - No quality degradation observed across all GGUF variants

-## Benchmark
+## Benchmark (H200)

 Measured on NVIDIA H200, 1280x736, 121 frames, 50 steps, DPMSolver++ (order=2, flow_shift=15.0):

@@ -130,3 +135,32 @@ Peak alloc/rsv columns show SDPA / Sage values. Sage adds ~0.3 GB alloc overhead
 - **~1.59x faster with SageAttention**: consistent across all quantization levels
 - **VRAM unchanged**: Sage overhead is negligible (~0.3 GB alloc)
 - **GGUF + Sage stacks**: Q4_K_M + Sage achieves 14.59 s/it at 12.53 GB alloc (vs BF16 SDPA: 23.36 s/it at 14.78 GB)
+
+---
+
+## Benchmark (RTX 4090)
+
+Measured on NVIDIA RTX 4090 (24 GB), 1280x736, 121 frames, 50 steps, DPMSolver++ (order=2, flow_shift=15.0):
+
+**Environment:** NGC `nvcr.io/nvidia/pytorch:26.01-py3`, Python 3.12.3, PyTorch 2.11.0+cu130, CUDA 13.0.
+SageAttention built from source with `TORCH_CUDA_ARCH_LIST="8.9"`.
+
+| Variant | SDPA (s/it) | Sage (s/it) | Speedup | Peak alloc (GB) | Total SDPA (s) | Total Sage (s) |
+|---------|------------|------------|---------|-----------------|----------------|----------------|
+| BF16 | 92.54 | 29.17 | 3.17x | 14.73 | 4665 | 1492 |
+| Q8_0 | 92.51 | 29.18 | 3.17x | 13.02 | 4658 | 1493 |
+| Q6_K | 92.81 | 29.41 | 3.16x | 12.58 | 4673 | 1504 |
+| Q5_K_M | 92.79 | 29.43 | 3.15x | 12.36 | 4672 | 1505 |
+| Q5_1 | 92.67 | 29.34 | 3.16x | 12.45 | 4667 | 1501 |
+| Q5_0 | 92.64 | 29.34 | 3.16x | 12.34 | 4664 | 1500 |
+| Q4_K_M | 92.62 | 29.29 | 3.16x | 12.16 | 4665 | 1502 |
+| Q4_1 | 92.60 | 29.32 | 3.16x | 12.22 | 4668 | 1499 |
+| Q4_0 | 92.64 | 29.32 | 3.16x | 12.11 | 4684 | 1500 |
+
+Peak alloc is identical for SDPA/Sage (SageAttention adds no extra alloc overhead on RTX 4090). Peak reserved is ~14 GB with SDPA and ~16 GB with Sage.
+
+**Key findings (RTX 4090):**
+- **~3.16x faster with SageAttention**: SM89 FP16 kernels deliver a larger relative speedup than H200's FP8 kernels (3.16x vs 1.59x) because SDPA is slower on the 4090 while Sage remains fast
+- **All variants fit in 24 GB**: Q4_0 + Sage peaks at 12.11 GB alloc (~16 GB reserved)
+- **GGUF + Sage stacks**: Q4_K_M + Sage: 29.29 s/it at 12.16 GB (vs BF16 SDPA: 92.54 s/it at 14.73 GB)
+- **No quality degradation**: identical to SDPA outputs across all variants
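
The Speedup column in the RTX 4090 table can be re-derived from the two s/it columns. The following is a minimal sketch of that arithmetic, assuming speedup is simply SDPA s/it divided by Sage s/it; the names `RTX4090_S_PER_IT`, `speedups`, and `mean_speedup` are hypothetical and not part of the repository.

```python
# (variant, SDPA s/it, Sage s/it) copied from the RTX 4090 table in this PR.
RTX4090_S_PER_IT = [
    ("BF16", 92.54, 29.17),
    ("Q8_0", 92.51, 29.18),
    ("Q6_K", 92.81, 29.41),
    ("Q5_K_M", 92.79, 29.43),
    ("Q5_1", 92.67, 29.34),
    ("Q5_0", 92.64, 29.34),
    ("Q4_K_M", 92.62, 29.29),
    ("Q4_1", 92.60, 29.32),
    ("Q4_0", 92.64, 29.32),
]


def speedups(rows):
    """Per-variant speedup, assumed to be SDPA seconds/iteration over Sage seconds/iteration."""
    return {name: sdpa / sage for name, sdpa, sage in rows}


def mean_speedup(rows):
    """Unweighted mean of the per-variant speedups."""
    values = list(speedups(rows).values())
    return sum(values) / len(values)


if __name__ == "__main__":
    for name, ratio in speedups(RTX4090_S_PER_IT).items():
        print(f"{name:8s} {ratio:.2f}x")
    print(f"{'mean':8s} {mean_speedup(RTX4090_S_PER_IT):.2f}x")
```

Rounded to two decimals, the per-variant ratios agree with the table, and their mean rounds to 3.16, which is consistent with the "~3.16x" headline figure.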
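
The install hunk maps GPU models to `TORCH_CUDA_ARCH_LIST` values by hand. As an alternative, the value can usually be read from the local GPU. The sketch below assumes PyTorch with CUDA support is already installed and uses `torch.cuda.get_device_capability`; the helper names `format_arch` and `detect_arch` are hypothetical and not part of this repository or of SageAttention.

```python
from typing import Optional


def format_arch(major: int, minor: int) -> str:
    """Format a compute capability pair as "major.minor", e.g. (8, 9) becomes "8.9"."""
    return f"{major}.{minor}"


def detect_arch(device_index: int = 0) -> Optional[str]:
    """Return the arch string for a local CUDA GPU, or None if it cannot be detected."""
    try:
        import torch  # imported lazily so the helper still works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(device_index)
    return format_arch(major, minor)


if __name__ == "__main__":
    arch = detect_arch()
    if arch is None:
        print("No CUDA GPU detected; set TORCH_CUDA_ARCH_LIST by hand.")
    else:
        print(f'TORCH_CUDA_ARCH_LIST="{arch}"')
```

The printed assignment can be pasted in front of the `pip install` command from the install section. Treat it as a convenience check rather than a replacement for the documented list, since the build machine and the target GPU may differ.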