nielsr (HF Staff) committed
Commit 64cbdd6 · verified · 1 Parent(s): fadc3a2

Improve model card: add paper link, project page, and update metadata

Hi! I'm Niels from the community science team at Hugging Face.

I've opened this PR to improve the model card for **FOFPred**. Specifically:
- Linked the model to the paper: [Future Optical Flow Prediction Improves Robot Control & Video Generation](https://huggingface.co/papers/2601.10781).
- Added a link to the project page and the GitHub repository.
- Updated the `pipeline_tag` to `image-to-video` as the model generates a sequence of flow frames.
- Added a BibTeX citation for the work.

Please review and merge this if it looks good!
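
The `pipeline_tag` change described above can also be reproduced locally before opening a PR. The following is a minimal sketch, assuming the model card starts with a simple YAML front matter block delimited by `---` lines; `set_front_matter_field` is a hypothetical helper written for this illustration, not part of any Hugging Face library, and it only handles top-level scalar keys.

```python
import re


def set_front_matter_field(readme: str, key: str, value: str) -> str:
    """Set `key: value` in the YAML front matter of a model card string.

    Hypothetical helper for illustration only. It rewrites a top-level
    scalar key if present, appends it otherwise, and creates a front
    matter block if the card has none.
    """
    match = re.match(r"^---\n(.*?)\n---\n", readme, flags=re.DOTALL)
    if match is None:
        # No front matter found: prepend a new block.
        return f"---\n{key}: {value}\n---\n{readme}"

    lines = match.group(1).split("\n")
    for i, line in enumerate(lines):
        if line.startswith(f"{key}:"):
            lines[i] = f"{key}: {value}"  # overwrite the existing value
            break
    else:
        lines.append(f"{key}: {value}")  # key was absent: add it

    return "---\n" + "\n".join(lines) + "\n---\n" + readme[match.end():]


# Example input mirroring the metadata shown in this PR's diff.
old_card = (
    "---\n"
    "license: apache-2.0\n"
    "library_name: diffusers\n"
    "pipeline_tag: image-to-image\n"
    "tags:\n"
    "- optical-flow prediction\n"
    "---\n"
    "\n"
    "# FOFPred: Language-Driven Future Optical Flow Prediction\n"
)

updated = set_front_matter_field(old_card, "pipeline_tag", "image-to-video")
print(updated)
```

For changes pushed directly to the Hub, `huggingface_hub.metadata_update` is likely the more robust route, since it parses the YAML properly instead of matching lines.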

Files changed (1)
  1. README.md +23 -10
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
-license: apache-2.0
 library_name: diffusers
-pipeline_tag: image-to-image
+license: apache-2.0
+pipeline_tag: image-to-video
 tags:
 - optical-flow prediction
 - motion prediction
@@ -10,7 +10,9 @@ tags:

 # FOFPred: Language-Driven Future Optical Flow Prediction

-**FOFPred** is a diffusion-based model that predicts future optical flow from a single image guided by natural language instructions. Given an input image and a text prompt describing a desired action (e.g., *"Moving the water bottle from right to left"*), FOFPred generates 4 sequential optical flow frames showing how objects would move.
+**FOFPred** is a diffusion-based model that predicts future optical flow from a single image guided by natural language instructions. Given an input image and a text prompt describing a desired action (e.g., *"Moving the water bottle from right to left"*), FOFPred generates 4 sequential optical flow frames showing how objects would move to accomplish that action.
+
+[Paper](https://huggingface.co/papers/2601.10781) | [Project Page](https://fofpred.github.io) | [GitHub](https://github.com/SalesforceAIResearch/FOFPred)

 ## Usage

@@ -52,12 +54,23 @@ img.save("output_combined.png")

 ## Architecture

-| Component | Model |
-|-----------|-------|
-| **V-LLM** | Qwen2.5-VL-3B-Instruct |
-| **DiT** | OmniGen2Transformer3DModel |
-| **VAE** | FLUX.1-dev AutoencoderKL |
-| **Scheduler** | FlowMatchEulerDiscreteScheduler |
+| Component | Model | Description |
+|-----------|-------|-------------|
+| **V-LLM** | Qwen2.5-VL-3B-Instruct | Multimodal understanding of images and text |
+| **DiT** | OmniGen2Transformer3DModel | Modification of OmniGen2Transformer to generate frame sequences |
+| **VAE** | FLUX.1-dev AutoencoderKL | VAE (AutoencoderKL model) |
+| **Scheduler** | FlowMatchEulerDiscreteScheduler | Efficient flow-matching sampler |
+
+## Citation
+
+```bibtex
+@article{ranasinghe2025future,
+title={Future Optical Flow Prediction Improves Robot Control & Video Generation},
+author={Ranasinghe, Kanchana and Zhou, Honglu and Fang, Yu and Yang, Luyu and Xue, Le and Xu, Ran and Xiong, Caiming and Savarese, Silvio and Ryoo, Michael S and Niebles, Juan Carlos},
+journal={arXiv preprint arXiv:2601.10781},
+year={2025}
+}
+```

 ## Acknowledgements

@@ -67,4 +80,4 @@ img.save("output_combined.png")

 ## License

-Our code and weights are released under the [CC by-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/deed.en).
+The code and weights in this repository are released under the [Apache License 2.0](https://github.com/SalesforceAIResearch/FOFPred/blob/main/LICENSE.txt). (Note: Some documentation may refer to CC BY-NC 4.0).