---
license: apache-2.0
library_name: diffusers
pipeline_tag: image-to-image
tags:
- optical-flow prediction
- motion prediction
- diffusion
---

# FOFPred: Language-Driven Future Optical Flow Prediction

FOFPred is a diffusion-based model that predicts future optical flow from a single image, guided by a natural-language instruction. Given an input image and a text prompt describing a desired action (e.g., "Moving the water bottle from right to left"), FOFPred generates four sequential optical flow frames showing how objects in the scene would move.

## Usage

```python
import einops
import numpy as np
import torch
from diffusers import DiffusionPipeline
from PIL import Image

# Load the pipeline. trust_remote_code=True lets diffusers load the custom
# pipeline code hosted in the model repository.
pipeline = DiffusionPipeline.from_pretrained(
    "Salesforce/FOFPred",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

# Replace with the path to your own input image.
input_image = Image.open("your_image.jpg").convert("RGB")

# Predict future optical flow for the given instruction.
results = pipeline(
    prompt="Moving the water bottle from right to left.",
    input_images=[input_image],
    width=256,
    height=256,
    num_inference_steps=1,
    num_images_per_prompt=4,
    frame_count=4,
    generator=torch.Generator(device="cuda").manual_seed(42),
    output_type="pt",
)

flow_frames = results.images  # [B, F, C, H, W]

# Take the first sample and convert it to a NumPy array with values in [0, 1].
output_tensor = flow_frames[0]  # [F, C, H, W]
output_np = pipeline.image_processor.pt_to_numpy(output_tensor)  # [F, H, W, C]

# Tile the F flow frames left to right into a single image and save it.
reshaped = einops.rearrange(output_np, "f h w c -> h (f w) c")
img = Image.fromarray((reshaped * 255).astype(np.uint8))
img.save("output_combined.png")
```
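The `einops.rearrange(output_np, "f h w c -> h (f w) c")` call above places the F flow frames side by side, with frame 0 on the left. If einops is not available, the following NumPy-only sketch produces the same layout. It also includes an inverse for splitting a saved strip back into frames. The helper names `tile_frames` and `untile_frames` are hypothetical, and random arrays stand in for real pipeline output.

```python
import numpy as np


def tile_frames(frames: np.ndarray) -> np.ndarray:
    """Lay out frames [F, H, W, C] side by side as one [H, F*W, C] image.

    Intended to match einops.rearrange(frames, "f h w c -> h (f w) c").
    """
    f, h, w, c = frames.shape
    # Move the frame axis next to the width axis, then merge the two axes.
    return frames.transpose(1, 0, 2, 3).reshape(h, f * w, c)


def untile_frames(strip: np.ndarray, frame_count: int) -> np.ndarray:
    """Inverse of tile_frames: split an [H, F*W, C] strip into [F, H, W, C]."""
    h, fw, c = strip.shape
    w = fw // frame_count
    return strip.reshape(h, frame_count, w, c).transpose(1, 0, 2, 3)


# Stand-in for pipeline output: 4 random 256x256 RGB frames in [0, 1].
rng = np.random.default_rng(0)
frames = rng.random((4, 256, 256, 3), dtype=np.float32)

strip = tile_frames(frames)
print(strip.shape)  # (256, 1024, 3)
```

Merging `(f, w)` with `f` as the outer axis is what puts whole frames next to each other. Reversing the order to `(w, f)` would interleave pixel columns from different frames instead.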

## Architecture

| Component | Model |
|-----------|-------|
| Vision-language model (V-LLM) | Qwen2.5-VL-3B-Instruct |
| Diffusion transformer (DiT) | OmniGen2Transformer3DModel |
| VAE | FLUX.1-dev AutoencoderKL |
| Scheduler | FlowMatchEulerDiscreteScheduler |
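The scheduler listed above integrates a flow-matching ODE with Euler steps. To our understanding, the deterministic update in diffusers' `FlowMatchEulerDiscreteScheduler.step` is `x_next = x + (sigma_next - sigma) * v`, where `v` is the velocity predicted by the transformer. The following is a generic NumPy sketch of that loop, not FOFPred's or diffusers' code. It assumes an evenly spaced sigma schedule with no timestep shift, and it substitutes a hypothetical closed-form `toy_velocity` for the DiT.

```python
import numpy as np


def euler_flow_matching_sample(velocity_fn, x_init, num_steps):
    """Integrate dx/dsigma = v(x, sigma) from sigma=1 (noise) to sigma=0 (data).

    Each step applies x_next = x + (sigma_next - sigma) * v(x, sigma).
    """
    # Evenly spaced noise levels 1.0 -> 0.0 (no timestep shift applied).
    sigmas = np.linspace(1.0, 0.0, num_steps + 1)
    x = x_init
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma)
        x = x + (sigma_next - sigma) * v
    return x


# Hypothetical toy setup: straight paths x = (1 - sigma) * x0 + sigma * noise
# toward one fixed target x0. A trained DiT would instead predict v from the
# noisy latent, the input image and the prompt.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))     # hypothetical clean latent
noise = rng.standard_normal((4, 8, 8))  # starting point at sigma = 1


def toy_velocity(x, sigma):
    # Exact velocity of the straight path through x toward x0.
    return (x - x0) / sigma


one_step = euler_flow_matching_sample(toy_velocity, noise, num_steps=1)
print(np.allclose(one_step, x0))  # True
```

Because the toy paths are perfectly straight, a single Euler step already lands on `x0`. A learned velocity field is only approximately straight, so additional steps generally reduce integration error.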

## Acknowledgements

- [OmniGen2](https://github.com/VectorSpaceLab/OmniGen2)
- [Qwen2.5-VL](https://huggingface.co/Qwen/Qwen2.5-VL)
- [Flux VAE](https://huggingface.co/black-forest-labs/FLUX.1-dev)

## License

Our code and weights are released under the [CC BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/deed.en).