---
license: apache-2.0
inference: false
datasets:
- autoflow
---

# Perceiver IO optical flow model

This model is a Perceiver IO optical flow model pretrained on [AutoFlow](https://autoflow-google.github.io/).
It is weight-equivalent to the [deepmind/optical-flow-perceiver](https://huggingface.co/deepmind/optical-flow-perceiver)
model but based on implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It
can be created from the `deepmind/optical-flow-perceiver` model with a library-specific [conversion utility](#model-conversion).
Both models generate equal output for the same input.

The content of the `deepmind/optical-flow-perceiver` [model card](https://huggingface.co/deepmind/optical-flow-perceiver)
also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further model
and training details.

## Model description

The model is specified in Appendix H (Table 16) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795).

## Intended use and limitations

The model can be used to predict the optical flow between a pair of images.

## Usage examples

To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation)
the `perceiver-io` library with the `vision` extension:

```shell
pip install perceiver-io[vision]
```

The model can then be used with PyTorch.

### Image pair

The following example uses this image pair as input

<img src="https://martin-krasser.com/perceiver/flow/frame_0047.png" alt="image-1" width="500"/>
<img src="https://martin-krasser.com/perceiver/flow/frame_0048.png" alt="image-2" width="500"/>

and renders their optical flow as an HSV representation (`render=True`):

```python
import requests
from PIL import Image
from transformers import pipeline
from perceiver.model.vision import optical_flow  # register optical flow pipeline

frame_1 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0047.png", stream=True).raw)
frame_2 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0048.png", stream=True).raw)

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")
rendered_optical_flow = optical_flow_pipeline((frame_1, frame_2), render=True)

Image.fromarray(rendered_optical_flow).save("optical_flow.png")
```

The [rendered optical flow](https://martin-krasser.com/perceiver/flow/optical_flow.png) is

<img src="https://martin-krasser.com/perceiver/flow/optical_flow.png" alt="optical-flow" width="500"/>
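HSV rendering of optical flow conventionally maps flow direction to hue and flow magnitude to brightness, so that e.g. a region moving uniformly in one direction appears as a single solid color. The `render_flow_hsv` function below is a hypothetical, minimal NumPy sketch of this convention, not the library's actual rendering code (which may use a different color wheel or normalization):

```python
import numpy as np

def render_flow_hsv(flow: np.ndarray) -> np.ndarray:
    """Render an (H, W, 2) flow field as an RGB uint8 image: direction -> hue, magnitude -> value."""
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.sqrt(dx**2 + dy**2)
    hue = (np.arctan2(dy, dx) + np.pi) / (2 * np.pi)   # flow direction mapped to [0, 1]
    value = magnitude / max(magnitude.max(), 1e-8)     # flow magnitude, normalized to [0, 1]
    saturation = np.ones_like(hue)
    # vectorized HSV -> RGB conversion (standard six-sector formula)
    i = ((hue * 6).astype(int) % 6)[..., None]         # hue sector, shape (H, W, 1)
    f = hue * 6 - np.floor(hue * 6)
    v = value
    p = v * (1 - saturation)
    q = v * (1 - f * saturation)
    t = v * (1 - (1 - f) * saturation)
    choices = [np.stack([v, t, p], -1), np.stack([q, v, p], -1),
               np.stack([p, v, t], -1), np.stack([p, q, v], -1),
               np.stack([t, p, v], -1), np.stack([v, p, q], -1)]
    rgb = np.select([i == k for k in range(6)], choices)
    return (rgb * 255).astype(np.uint8)

# a constant rightward flow renders as a single uniform color
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
image = render_flow_hsv(flow)
print(image.shape)  # (4, 4, 3)
```

Pixels with zero motion come out black (zero value), and the normalization is per-image, so colors are comparable within one rendered frame but not across frames.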
### Video

To compute the optical flow of an entire video, the `optical-flow` pipeline can be used in combination with functions
from `video_utils`. The following code samples all frames from a [video snippet](https://martin-krasser.com/perceiver/flow/sintel_clip_cave_dragon_fight.mp4)
taken from the [Sintel animated short movie](https://durian.blender.org/), computes the optical flow per consecutive
frame pair and writes the rendered results back to an output video file.

```python
from transformers import pipeline
from perceiver.data.vision import video_utils
from perceiver.model.vision import optical_flow  # register optical flow pipeline

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")

# sample consecutive video frame pairs
frame_pairs = video_utils.read_video_frame_pairs("sintel_clip_cave_dragon_fight.mp4")

# create and render optical flow for all frame pairs
optical_flows = optical_flow_pipeline(frame_pairs, render=True, device="cuda:0")

# create video with rendered optical flows
video_utils.write_video("sintel_clip_cave_dragon_fight_output.mp4", optical_flows, fps=24)
```
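If you decode frames yourself instead of using `video_utils`, the pairing step amounts to zipping the frame sequence with itself shifted by one, which yields N-1 flow fields for N frames. A minimal sketch (`consecutive_pairs` is a hypothetical helper; frame decoding is elided, strings stand in for frames):

```python
def consecutive_pairs(frames):
    """Pair each frame with its successor: [f0, f1, f2] -> [(f0, f1), (f1, f2)]."""
    return list(zip(frames, frames[1:]))

pairs = consecutive_pairs(["f0", "f1", "f2", "f3"])
print(pairs)  # [('f0', 'f1'), ('f1', 'f2'), ('f2', 'f3')]
```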
A side-by-side comparison of the input and output videos:

![optical-flow-sbs](https://martin-krasser.com/perceiver/flow/sintel_clip_cave_dragon_fight_side_by_side_horizontal.gif)

## Model conversion

The `krasserm/perceiver-io-optical-flow` model has been created from the source `deepmind/optical-flow-perceiver` model
with:

```python
from perceiver.model.vision.optical_flow import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-optical-flow",
    source_repo_id="deepmind/optical-flow-perceiver",
    push_to_hub=True,
)
```

## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```