Improve model card: add paper link, project page, and update metadata
Hi! I'm Niels from the community science team at Hugging Face.
I've opened this PR to improve the model card for **FOFPred**. Specifically:
- Linked the model to the paper: [Future Optical Flow Prediction Improves Robot Control & Video Generation](https://huggingface.co/papers/2601.10781).
- Added a link to the project page and the GitHub repository.
- Updated the `pipeline_tag` to `image-to-video`, since the model generates a sequence of flow frames from a single input image (the resulting metadata header is shown below).
- Added a BibTeX citation for the work.
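For quick reference, the key front-matter fields after this change look as follows. This is only a partial view: it reproduces the fields visible in the diff below, and any other tags already on the model card are left untouched.

```yaml
library_name: diffusers
license: apache-2.0            # unchanged
pipeline_tag: image-to-video   # updated in this PR
tags:
- optical-flow prediction
- motion prediction
```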
Please review and merge this if it looks good!
README.md CHANGED

@@ -1,7 +1,7 @@
 ---
-license: apache-2.0
 library_name: diffusers
-
+license: apache-2.0
+pipeline_tag: image-to-video
 tags:
 - optical-flow prediction
 - motion prediction
@@ -10,7 +10,9 @@ tags:
 
 # FOFPred: Language-Driven Future Optical Flow Prediction
 
-**FOFPred** is a diffusion-based model that predicts future optical flow from a single image guided by natural language instructions. Given an input image and a text prompt describing a desired action (e.g., *"Moving the water bottle from right to left"*), FOFPred generates 4 sequential optical flow frames showing how objects would move.
+**FOFPred** is a diffusion-based model that predicts future optical flow from a single image guided by natural language instructions. Given an input image and a text prompt describing a desired action (e.g., *"Moving the water bottle from right to left"*), FOFPred generates 4 sequential optical flow frames showing how objects would move to accomplish that action.
+
+[Paper](https://huggingface.co/papers/2601.10781) | [Project Page](https://fofpred.github.io) | [GitHub](https://github.com/SalesforceAIResearch/FOFPred)
 
 ## Usage
 
@@ -52,12 +54,23 @@ img.save("output_combined.png")
 
 ## Architecture
 
-| Component | Model |
-|-----------|-------|
-| **V-LLM** | Qwen2.5-VL-3B-Instruct |
-| **DiT** | OmniGen2Transformer3DModel |
-| **VAE** | FLUX.1-dev AutoencoderKL |
-| **Scheduler** | FlowMatchEulerDiscreteScheduler |
+| Component | Model | Description |
+|-----------|-------|-------------|
+| **V-LLM** | Qwen2.5-VL-3B-Instruct | Multimodal understanding of images and text |
+| **DiT** | OmniGen2Transformer3DModel | Modification of OmniGen2Transformer to generate frame sequences |
+| **VAE** | FLUX.1-dev AutoencoderKL | VAE (AutoencoderKL model) |
+| **Scheduler** | FlowMatchEulerDiscreteScheduler | Efficient flow-matching sampler |
+
+## Citation
+
+```bibtex
+@article{ranasinghe2025future,
+  title={Future Optical Flow Prediction Improves Robot Control & Video Generation},
+  author={Ranasinghe, Kanchana and Zhou, Honglu and Fang, Yu and Yang, Luyu and Xue, Le and Xu, Ran and Xiong, Caiming and Savarese, Silvio and Ryoo, Michael S and Niebles, Juan Carlos},
+  journal={arXiv preprint arXiv:2601.10781},
+  year={2025}
+}
+```
 
 ## Acknowledgements
 
@@ -67,4 +80,4 @@ img.save("output_combined.png")
 
 ## License
 
-
+The code and weights in this repository are released under the [Apache License 2.0](https://github.com/SalesforceAIResearch/FOFPred/blob/main/LICENSE.txt). (Note: Some documentation may refer to CC BY-NC 4.0).