---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
- optical-flow prediction
- motion prediction
- diffusion
---

# FOFPred: Language-Driven Future Optical Flow Prediction

**FOFPred** is a diffusion-based model that predicts future optical flow from a single image guided by natural language instructions. Given an input image and a text prompt describing a desired action (e.g., *"Moving the water bottle from right to left"*), FOFPred generates 4 sequential optical flow frames showing how objects would move.

## Usage

```python
import torch
from fofpred.pipelines.fofpred.pipeline_fofpred import FOFPredPipeline
from fofpred.schedulers.scheduling_flow_match_euler_discrete import FlowMatchEulerDiscreteScheduler
from PIL import Image

# Load the pretrained pipeline in bfloat16 and move it to the GPU.
pipeline = FOFPredPipeline.from_pretrained(
    "Salesforce/FOFPred",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Use the flow-matching Euler scheduler.
pipeline.scheduler = FlowMatchEulerDiscreteScheduler()

# Predict 4 future optical flow frames from one image and a text instruction.
results = pipeline(
    prompt="Moving the water bottle from right to left.",
    input_images=[Image.open("your_image.jpg")],
    width=256,
    height=256,
    num_inference_steps=1,
    num_images_per_prompt=4,
    frame_count=4,
    generator=torch.Generator(device="cuda").manual_seed(42),  # fixed seed for reproducibility
    output_type="pt",  # return PyTorch tensors
)

flow_frames = results.images  # [B, F, C, H, W]
```
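
To inspect predicted flow visually, a common convention maps each flow vector to a color: hue encodes direction and saturation encodes speed. The sketch below illustrates that convention for a single vector using only the Python standard library. It is not part of the FOFPred codebase: `flow_to_rgb` and `max_magnitude` are hypothetical names, and the sketch assumes a frame holds raw `(dx, dy)` displacements. This card does not state how flow is encoded in the channel dimension `C`, so check that before applying anything like this to `flow_frames`.

```python
import colorsys
import math


def flow_to_rgb(dx, dy, max_magnitude=1.0):
    """Map one flow vector to an 8-bit RGB tuple (hypothetical helper).

    Hue encodes the direction of motion; saturation encodes speed,
    clipped at max_magnitude. Assumes (dx, dy) are raw displacements.
    """
    magnitude = math.hypot(dx, dy)
    # atan2 returns an angle in [-pi, pi]; wrap it into a hue in [0, 1).
    hue = (math.atan2(dy, dx) / (2.0 * math.pi)) % 1.0
    # Normalize speed into [0, 1]; guard against a non-positive maximum.
    saturation = min(magnitude / max_magnitude, 1.0) if max_magnitude > 0 else 0.0
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return round(r * 255), round(g * 255), round(b * 255)


if __name__ == "__main__":
    print(flow_to_rgb(0.0, 0.0))   # no motion
    print(flow_to_rgb(1.0, 0.0))   # motion along +x
    print(flow_to_rgb(-1.0, 0.0))  # motion along -x
```

With this mapping, static pixels come out white and opposite directions get complementary colors, so a right-to-left motion such as the water bottle example would show up as a single dominant hue. For full frames, the same mapping would normally be vectorized over the `H` and `W` dimensions rather than applied pixel by pixel.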

## Architecture

| Component | Model |
|-----------|-------|
| **V-LLM** | Qwen2.5-VL-3B-Instruct |
| **DiT** | OmniGen2Transformer3DModel |
| **VAE** | FLUX.1-dev AutoencoderKL |
| **Scheduler** | FlowMatchEulerDiscreteScheduler |

## Acknowledgements

- [OmniGen2](https://github.com/VectorSpaceLab/OmniGen2)
- [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL)
- [Flux VAE](https://huggingface.co/black-forest-labs/FLUX.1-dev)

## License

Apache 2.0. Copyright (c) 2025 Salesforce, Inc.