---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
- optical-flow prediction
- motion prediction
- diffusion
---
# FOFPred: Language-Driven Future Optical Flow Prediction
**FOFPred** is a diffusion-based model that predicts future optical flow from a single image, guided by a natural language instruction. Given an input image and a text prompt describing a desired action (e.g., *"Moving the water bottle from right to left"*), FOFPred generates four sequential optical flow frames showing how objects in the scene would move.
## Usage
```python
import einops
import numpy as np
import torch
from diffusers import DiffusionPipeline
from PIL import Image

# Load pipeline with trust_remote_code
pipeline = DiffusionPipeline.from_pretrained(
    "Salesforce/FOFPred",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

# Run inference
results = pipeline(
    prompt="Moving the water bottle from right to left.",
    input_images=[Image.open("your_image.jpg")],
    width=256,
    height=256,
    num_inference_steps=1,
    num_images_per_prompt=4,
    frame_count=4,
    generator=torch.Generator(device="cuda").manual_seed(42),
    output_type="pt",
)

flow_frames = results.images  # [B, F, C, H, W]
output_tensor = flow_frames[0]  # [F, C, H, W]
output_np = pipeline.image_processor.pt_to_numpy(output_tensor)  # [F, H, W, C]

# Tile the frames side by side and save them as a single image
reshaped = einops.rearrange(output_np, "f h w c -> h (f w) c")
img = Image.fromarray((reshaped * 255).astype(np.uint8))
img.save("output_combined.png")
```
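The final `einops.rearrange` step above places the frames side by side in one wide image. The following is a minimal NumPy-only sketch of that layout on synthetic data, assuming frames shaped `[F, H, W, C]` with values in `[0, 1]`. The helper names `tile_frames` and `split_frames` are hypothetical and not part of FOFPred or diffusers; `split_frames` simply shows how individual frames could be recovered from the combined strip.

```python
import numpy as np


def tile_frames(frames: np.ndarray) -> np.ndarray:
    """Lay out [F, H, W, C] frames side by side as one [H, F*W, C] image."""
    f, h, w, c = frames.shape
    # Move the frame axis next to the width axis, then merge the two.
    return frames.transpose(1, 0, 2, 3).reshape(h, f * w, c)


def split_frames(strip: np.ndarray, frame_count: int) -> np.ndarray:
    """Invert tile_frames: recover [F, H, W, C] from an [H, F*W, C] strip."""
    h, fw, c = strip.shape
    w = fw // frame_count
    return strip.reshape(h, frame_count, w, c).transpose(1, 0, 2, 3)


# Synthetic stand-in for model output (an assumption, not real predictions):
# 4 frames of 256x256 pixels with 3 channels, values in [0, 1].
rng = np.random.default_rng(0)
frames = rng.random((4, 256, 256, 3), dtype=np.float32)

strip = tile_frames(frames)                    # shape (256, 1024, 3)
strip_uint8 = (strip * 255).astype(np.uint8)   # same scaling as the usage example
recovered = split_frames(strip, frame_count=4) # shape (4, 256, 256, 3)
```

Frame `i` occupies columns `i * W` through `(i + 1) * W - 1` of the strip, so the same slicing can be applied to a saved `output_combined.png` once it is loaded back into an array.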
## Architecture
| Component | Model |
|-----------|-------|
| **V-LLM** | Qwen2.5-VL-3B-Instruct |
| **DiT** | OmniGen2Transformer3DModel |
| **VAE** | FLUX.1-dev AutoencoderKL |
| **Scheduler** | FlowMatchEulerDiscreteScheduler |
## Acknowledgements
- [OmniGen2](https://github.com/VectorSpaceLab/OmniGen2)
- [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL)
- [Flux VAE](https://huggingface.co/black-forest-labs/FLUX.1-dev)
## License
Our code and weights are released under the [CC BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/deed.en).