---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
  - optical-flow prediction
  - motion prediction
  - diffusion
---

# FOFPred: Language-Driven Future Optical Flow Prediction

**FOFPred** is a diffusion-based model that predicts future optical flow from a single image, guided by a natural-language instruction. Given an input image and a text prompt describing a desired action (e.g., *"Moving the water bottle from right to left"*), FOFPred generates 4 sequential optical flow frames showing how objects in the scene would move.

## Usage

```python
import torch
from fofpred.pipelines.fofpred.pipeline_fofpred import FOFPredPipeline
from fofpred.schedulers.scheduling_flow_match_euler_discrete import FlowMatchEulerDiscreteScheduler
from PIL import Image

pipeline = FOFPredPipeline.from_pretrained(
    "Salesforce/FOFPred",
    torch_dtype=torch.bfloat16,
).to("cuda")

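# Use the FOFPred FlowMatchEulerDiscreteScheduler with its default settings.
pipeline.scheduler = FlowMatchEulerDiscreteScheduler()

results = pipeline(
    prompt="Moving the water bottle from right to left.",  # action to predict motion for
    input_images=[Image.open("your_image.jpg")],  # replace with the path to your image
    width=256,
    height=256,
    num_inference_steps=1,
    num_images_per_prompt=4,
    frame_count=4,
    generator=torch.Generator(device="cuda").manual_seed(42),  # fixed seed for reproducibility
    output_type="pt",  # return PyTorch tensors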
)

flow_frames = results.images  # [B, F, C, H, W]
```
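The usage snippet leaves `flow_frames` as a `[B, F, C, H, W]` tensor. The following is a minimal sketch for turning it into savable 8-bit images. It assumes three channels with values in [0, 1] (the usual convention for `output_type="pt"` in diffusers pipelines, not confirmed here for FOFPred). The helpers `flow_frames_to_uint8` and `tile_frames` are hypothetical and are not part of the `fofpred` package.

```python
from __future__ import annotations

import numpy as np


def flow_frames_to_uint8(flow_frames: np.ndarray) -> list[list[np.ndarray]]:
    """Hypothetical helper: convert a [B, F, C, H, W] float array into
    per-sample lists of H x W x C uint8 frames.

    Assumes values lie in [0, 1]; anything outside that range is clipped.
    """
    if flow_frames.ndim != 5:
        raise ValueError(
            f"expected a 5-D [B, F, C, H, W] array, got shape {flow_frames.shape}"
        )
    # Clip to [0, 1] and scale to the 0-255 range used by 8-bit image formats.
    scaled = np.clip(flow_frames, 0.0, 1.0) * 255.0
    images = np.round(scaled).astype(np.uint8)
    # Move channels last: [B, F, C, H, W] -> [B, F, H, W, C].
    images = images.transpose(0, 1, 3, 4, 2)
    # One list of frames per sample in the batch.
    return [list(sample) for sample in images]


def tile_frames(frames: list[np.ndarray]) -> np.ndarray:
    """Hypothetical helper: place same-sized H x W x C frames side by side."""
    return np.concatenate(frames, axis=1)


if __name__ == "__main__":
    # Random stand-in for pipeline output: 1 sample, 4 frames, 3 channels, 256x256.
    demo = np.random.default_rng(0).random((1, 4, 3, 256, 256), dtype=np.float32)
    strip = tile_frames(flow_frames_to_uint8(demo)[0])
    print(strip.shape, strip.dtype)  # (256, 1024, 3) uint8
```

To apply this to real pipeline output, convert the tensor first, e.g. `results.images.float().cpu().numpy()`. The `.float()` cast is there because NumPy has no bfloat16 dtype. Each resulting strip can then be written to disk with `PIL.Image.fromarray(strip).save("flow_strip.png")`.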

## Architecture

| Component | Model |
|-----------|-------|
| **V-LLM** | Qwen2.5-VL-3B-Instruct |
| **DiT** | OmniGen2Transformer3DModel |
| **VAE** | FLUX.1-dev AutoencoderKL |
| **Scheduler** | FlowMatchEulerDiscreteScheduler |

## Acknowledgements

- [OmniGen2](https://github.com/VectorSpaceLab/OmniGen2)
- [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL)
- [Flux VAE](https://huggingface.co/black-forest-labs/FLUX.1-dev)

## License

Apache 2.0 — Copyright (c) 2025 Salesforce, Inc.