---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
- optical-flow prediction
- motion prediction
- diffusion
---

# FOFPred: Language-Driven Future Optical Flow Prediction

**FOFPred** is a diffusion-based model that predicts future optical flow from a single image, guided by a natural language instruction. Given an input image and a text prompt describing a desired action (e.g., *"Moving the water bottle from right to left"*), FOFPred generates four sequential optical flow frames showing how objects would move.

## Usage

```python
import einops
import numpy as np
import torch
from diffusers import DiffusionPipeline
from PIL import Image

# Load the pipeline; trust_remote_code=True allows diffusers to run the
# custom pipeline code shipped in the model repository.
pipeline = DiffusionPipeline.from_pretrained(
    "Salesforce/FOFPred",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

# Run inference
results = pipeline(
    prompt="Moving the water bottle from right to left.",
    input_images=[Image.open("your_image.jpg")],
    width=256,
    height=256,
    num_inference_steps=1,
    num_images_per_prompt=4,
    frame_count=4,
    generator=torch.Generator(device="cuda").manual_seed(42),
    output_type="pt",
)

flow_frames = results.images  # [B, F, C, H, W]

# Take the first sample and tile its frames left to right into one image
output_tensor = flow_frames[0]  # [F, C, H, W]
output_np = pipeline.image_processor.pt_to_numpy(output_tensor)  # [F, H, W, C]
reshaped = einops.rearrange(output_np, "f h w c -> h (f w) c")
img = Image.fromarray((reshaped * 255).astype(np.uint8))
img.save("output_combined.png")
```

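
The example above tiles all frames into a single strip. If individual frames are more convenient, a minimal sketch like the following converts the `[F, H, W, C]` array into one `uint8` image per frame. The helper name `frames_to_uint8` is hypothetical and not part of the FOFPred code. It assumes the values are floats nominally in `[0, 1]`, as the `* 255` scaling above implies, and clips anything outside that range.

```python
import numpy as np


def frames_to_uint8(output_np: np.ndarray) -> list:
    """Convert a float array of shape [F, H, W, C] into a list of uint8 frames.

    Hypothetical helper: assumes values are nominally in [0, 1].
    """
    if output_np.ndim != 4:
        raise ValueError(f"expected [F, H, W, C], got shape {output_np.shape}")
    # Clip first so values slightly outside [0, 1] do not wrap around
    # when cast to uint8.
    clipped = np.clip(output_np, 0.0, 1.0)
    scaled = np.round(clipped * 255.0).astype(np.uint8)
    # Iterating over the first axis yields one [H, W, C] array per frame.
    return [frame for frame in scaled]
```

Each returned array can then be saved on its own, for example with `Image.fromarray(frames[0]).save("flow_0.png")`.
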
## Architecture

| Component | Model |
|-----------|-------|
| **V-LLM** | Qwen2.5-VL-3B-Instruct |
| **DiT** | OmniGen2Transformer3DModel |
| **VAE** | FLUX.1-dev AutoencoderKL |
| **Scheduler** | FlowMatchEulerDiscreteScheduler |

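
To check which classes back each row of the table after loading, one option is to inspect `pipeline.components`, the dictionary of modules that `DiffusionPipeline` exposes. The sketch below uses a hypothetical helper, `summarize_components`, and stand-in objects so that it runs without downloading weights. The component key names shown here are illustrative assumptions, not taken from the FOFPred repository.

```python
def summarize_components(components: dict) -> dict:
    """Map each component name to the class name of the object stored under it."""
    return {name: type(module).__name__ for name, module in components.items()}


# Stand-in classes and key names, for illustration only.
class DummyTransformer:
    pass


class DummyScheduler:
    pass


demo = summarize_components(
    {"transformer": DummyTransformer(), "scheduler": DummyScheduler()}
)
for name, class_name in demo.items():
    print(f"{name}: {class_name}")
```

With a loaded pipeline, `summarize_components(pipeline.components)` would report the actual class names instead of the stand-ins.
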
## Acknowledgements

- [OmniGen2](https://github.com/VectorSpaceLab/OmniGen2)
- [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL)
- [Flux VAE](https://huggingface.co/black-forest-labs/FLUX.1-dev)

## License

Our code and weights are released under the [CC BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/deed.en).