---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
- optical-flow prediction
- motion prediction
- diffusion
---
# FOFPred: Language-Driven Future Optical Flow Prediction
**FOFPred** is a diffusion-based model that predicts future optical flow from a single image, guided by a natural language instruction. Given an input image and a text prompt describing a desired action (e.g., *"Moving the water bottle from right to left"*), FOFPred generates four sequential optical flow frames showing how objects would move.
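An optical flow field assigns each pixel a displacement `(dx, dy)`. The following is a minimal sketch of one common way to visualize such a vector, mapping direction to hue and magnitude to saturation. It assumes the flow is already available as plain `(dx, dy)` values; how FOFPred encodes flow in its output channels is not stated here, and `flow_to_rgb` and `max_magnitude` are hypothetical names used only for illustration.

```python
import colorsys
import math


def flow_to_rgb(dx, dy, max_magnitude=1.0):
    """Map one flow vector to an (R, G, B) tuple of 0-255 ints.

    Illustrative helper, not part of the FOFPred package.
    """
    magnitude = math.hypot(dx, dy)
    # atan2 returns an angle in [-pi, pi]; shift and scale it to a hue in [0, 1).
    hue = (math.atan2(dy, dx) + math.pi) / (2 * math.pi) % 1.0
    # Magnitude becomes saturation, clipped to 1 so large vectors do not overflow.
    saturation = min(magnitude / max_magnitude, 1.0) if max_magnitude > 0 else 0.0
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return tuple(round(c * 255) for c in (r, g, b))


# A stationary pixel renders white; unit motion along -x renders pure red.
still = flow_to_rgb(0, 0)    # (255, 255, 255)
leftward = flow_to_rgb(-1, 0)  # (255, 0, 0)
```

With this mapping, regions without motion stay white and moving regions take on a color that depends on their direction.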
## Usage
```python
import torch
from PIL import Image

from fofpred.pipelines.fofpred.pipeline_fofpred import FOFPredPipeline
from fofpred.schedulers.scheduling_flow_match_euler_discrete import FlowMatchEulerDiscreteScheduler

# Load the pretrained pipeline in bfloat16 and move it to the GPU.
pipeline = FOFPredPipeline.from_pretrained(
    "Salesforce/FOFPred",
    torch_dtype=torch.bfloat16,
).to("cuda")
pipeline.scheduler = FlowMatchEulerDiscreteScheduler()

# Predict future optical flow for one input image and one instruction.
results = pipeline(
    prompt="Moving the water bottle from right to left.",
    input_images=[Image.open("your_image.jpg")],
    width=256,
    height=256,
    num_inference_steps=1,
    num_images_per_prompt=4,
    frame_count=4,
    generator=torch.Generator(device="cuda").manual_seed(42),  # fixed seed for reproducibility
    output_type="pt",  # return PyTorch tensors
)
flow_frames = results.images  # [B, F, C, H, W]
```
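The output above is a single array laid out as `[B, F, C, H, W]`. As a rough sketch of how the individual frames might be walked, the helper below yields one `[C, H, W]` frame at a time. `iter_flow_frames` is a hypothetical name, not part of the FOFPred package, and it only assumes an array-like object with a `.shape` attribute and tuple indexing, which a PyTorch tensor provides.

```python
def iter_flow_frames(flow_frames):
    """Yield (batch_index, frame_index, frame) from a [B, F, C, H, W] array.

    Illustrative helper; works with any array-like that exposes `.shape`
    and supports tuple indexing, such as a PyTorch tensor.
    """
    if len(flow_frames.shape) != 5:
        raise ValueError(
            f"expected 5 dimensions [B, F, C, H, W], got {tuple(flow_frames.shape)}"
        )
    batch_size, frame_count = flow_frames.shape[0], flow_frames.shape[1]
    for b in range(batch_size):
        for f in range(frame_count):
            # Indexing the first two axes leaves a single [C, H, W] frame.
            yield b, f, flow_frames[b, f]
```

It could be used as `for b, f, frame in iter_flow_frames(results.images): ...`, for example to save or plot each predicted frame separately.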
## Architecture
| Component | Model |
|-----------|-------|
| **V-LLM** | Qwen2.5-VL-3B-Instruct |
| **DiT** | OmniGen2Transformer3DModel |
| **VAE** | FLUX.1-dev AutoencoderKL |
| **Scheduler** | FlowMatchEulerDiscreteScheduler |
## Acknowledgements
- [OmniGen2](https://github.com/VectorSpaceLab/OmniGen2)
- [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL)
- [Flux VAE](https://huggingface.co/black-forest-labs/FLUX.1-dev)
## License
Apache 2.0 — Copyright (c) 2025 Salesforce, Inc.