Improve model card: add paper link, project page, and update metadata
Hi! I'm Niels from the community science team at Hugging Face.
I've opened this PR to improve the model card for **FOFPred**. Specifically:
- Linked the model to the paper: [Future Optical Flow Prediction Improves Robot Control & Video Generation](https://huggingface.co/papers/2601.10781).
- Added a link to the project page and the GitHub repository.
- Updated the `pipeline_tag` to `image-to-video`, since the model generates a sequence of flow frames from a single input image (the resulting metadata header is shown below).
- Added a BibTeX citation for the work.
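For quick reference, the key front-matter fields after this change look as follows. This is only a partial view: it reproduces the fields visible in the diff below, and any other tags already on the model card are left untouched.

```yaml
library_name: diffusers
license: apache-2.0            # unchanged
pipeline_tag: image-to-video   # updated in this PR
tags:
- optical-flow prediction
- motion prediction
```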
Please review and merge this if it looks good!
README.md CHANGED

@@ -1,7 +1,7 @@
 ---
-license: apache-2.0
 library_name: diffusers
-
+license: apache-2.0
+pipeline_tag: image-to-video
 tags:
 - optical-flow prediction
 - motion prediction
@@ -10,7 +10,9 @@ tags:
 
 # FOFPred: Language-Driven Future Optical Flow Prediction
 
-**FOFPred** is a diffusion-based model that predicts future optical flow from a single image guided by natural language instructions. Given an input image and a text prompt describing a desired action (e.g., *"Moving the water bottle from right to left"*), FOFPred generates 4 sequential optical flow frames showing how objects would move.
+**FOFPred** is a diffusion-based model that predicts future optical flow from a single image guided by natural language instructions. Given an input image and a text prompt describing a desired action (e.g., *"Moving the water bottle from right to left"*), FOFPred generates 4 sequential optical flow frames showing how objects would move to accomplish that action.
+
+[Paper](https://huggingface.co/papers/2601.10781) | [Project Page](https://fofpred.github.io) | [GitHub](https://github.com/SalesforceAIResearch/FOFPred)
 
 ## Usage
 
@@ -52,12 +54,23 @@ img.save("output_combined.png")
 
 ## Architecture
 
-| Component | Model |
-|-----------|-------|
-| **V-LLM** | Qwen2.5-VL-3B-Instruct |
-| **DiT** | OmniGen2Transformer3DModel |
-| **VAE** | FLUX.1-dev AutoencoderKL |
-| **Scheduler** | FlowMatchEulerDiscreteScheduler |
+| Component | Model | Description |
+|-----------|-------|-------------|
+| **V-LLM** | Qwen2.5-VL-3B-Instruct | Multimodal understanding of images and text |
+| **DiT** | OmniGen2Transformer3DModel | Modification of OmniGen2Transformer to generate frame sequences |
+| **VAE** | FLUX.1-dev AutoencoderKL | VAE (AutoencoderKL model) |
+| **Scheduler** | FlowMatchEulerDiscreteScheduler | Efficient flow-matching sampler |
+
+## Citation
+
+```bibtex
+@article{ranasinghe2025future,
+  title={Future Optical Flow Prediction Improves Robot Control & Video Generation},
+  author={Ranasinghe, Kanchana and Zhou, Honglu and Fang, Yu and Yang, Luyu and Xue, Le and Xu, Ran and Xiong, Caiming and Savarese, Silvio and Ryoo, Michael S and Niebles, Juan Carlos},
+  journal={arXiv preprint arXiv:2601.10781},
+  year={2025}
+}
+```
 
 ## Acknowledgements
 
@@ -67,4 +80,4 @@ img.save("output_combined.png")
 
 ## License
 
-
+The code and weights in this repository are released under the [Apache License 2.0](https://github.com/SalesforceAIResearch/FOFPred/blob/main/LICENSE.txt). (Note: Some documentation may refer to CC BY-NC 4.0).