Improve model card: add paper link, project page, and update metadata

#11
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +23 -10
README.md CHANGED
@@ -1,7 +1,7 @@
  ---
- license: apache-2.0
  library_name: diffusers
- pipeline_tag: image-to-image
+ license: apache-2.0
+ pipeline_tag: image-to-video
  tags:
  - optical-flow prediction
  - motion prediction
@@ -10,7 +10,9 @@ tags:

  # FOFPred: Language-Driven Future Optical Flow Prediction

- **FOFPred** is a diffusion-based model that predicts future optical flow from a single image guided by natural language instructions. Given an input image and a text prompt describing a desired action (e.g., *"Moving the water bottle from right to left"*), FOFPred generates 4 sequential optical flow frames showing how objects would move.
+ **FOFPred** is a diffusion-based model that predicts future optical flow from a single image guided by natural language instructions. Given an input image and a text prompt describing a desired action (e.g., *"Moving the water bottle from right to left"*), FOFPred generates 4 sequential optical flow frames showing how objects would move to accomplish that action.
+
+ [Paper](https://huggingface.co/papers/2601.10781) | [Project Page](https://fofpred.github.io) | [GitHub](https://github.com/SalesforceAIResearch/FOFPred)

  ## Usage

@@ -52,12 +54,23 @@ img.save("output_combined.png")

  ## Architecture

- | Component | Model |
- |-----------|-------|
- | **V-LLM** | Qwen2.5-VL-3B-Instruct |
- | **DiT** | OmniGen2Transformer3DModel |
- | **VAE** | FLUX.1-dev AutoencoderKL |
- | **Scheduler** | FlowMatchEulerDiscreteScheduler |
+ | Component | Model | Description |
+ |-----------|-------|-------------|
+ | **V-LLM** | Qwen2.5-VL-3B-Instruct | Multimodal understanding of images and text |
+ | **DiT** | OmniGen2Transformer3DModel | Modification of OmniGen2Transformer to generate frame sequences |
+ | **VAE** | FLUX.1-dev AutoencoderKL | VAE (AutoencoderKL model) |
+ | **Scheduler** | FlowMatchEulerDiscreteScheduler | Efficient flow-matching sampler |
+
+ ## Citation
+
+ ```bibtex
+ @article{ranasinghe2025future,
+ title={Future Optical Flow Prediction Improves Robot Control & Video Generation},
+ author={Ranasinghe, Kanchana and Zhou, Honglu and Fang, Yu and Yang, Luyu and Xue, Le and Xu, Ran and Xiong, Caiming and Savarese, Silvio and Ryoo, Michael S and Niebles, Juan Carlos},
+ journal={arXiv preprint arXiv:2601.10781},
+ year={2025}
+ }
+ ```

  ## Acknowledgements

@@ -67,4 +80,4 @@ img.save("output_combined.png")

  ## License

- Our code and weights are released under the [CC by-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/deed.en).
+ The code and weights in this repository are released under the [Apache License 2.0](https://github.com/SalesforceAIResearch/FOFPred/blob/main/LICENSE.txt). (Note: Some documentation may refer to CC BY-NC 4.0).