---
license: apache-2.0
inference: false
datasets:
- autoflow
---

# Perceiver IO optical flow model

This model is a Perceiver IO optical flow model pretrained on [AutoFlow](https://autoflow-google.github.io/).
It is weight-equivalent to the [deepmind/optical-flow-perceiver](https://huggingface.co/deepmind/optical-flow-perceiver)
model but based on implementation classes of the [perceiver-io](https://github.com/krasserm/perceiver-io) library. It
can be created from the `deepmind/optical-flow-perceiver` model with a library-specific [conversion utility](#model-conversion).
Both models generate equal output for the same input.

The content of the `deepmind/optical-flow-perceiver` [model card](https://huggingface.co/deepmind/optical-flow-perceiver)
also applies to this model, except for the [usage examples](#usage-examples). Refer to the linked card for further model
and training details.

## Model description

The model is specified in Appendix H (Table 16) of the [Perceiver IO paper](https://arxiv.org/abs/2107.14795).

## Intended use and limitations

The model can be used to predict the optical flow between a pair of images.

## Usage examples

To use this model you first need to [install](https://github.com/krasserm/perceiver-io/blob/main/README.md#installation)
the `perceiver-io` library with the `vision` extension.

```shell
pip install perceiver-io[vision]
```

Then the model can be used with PyTorch.

### Image pair

The following example uses this image pair as input

<img src="https://martin-krasser.com/perceiver/flow/frame_0047.png" alt="image-1" width="500"/>
<img src="https://martin-krasser.com/perceiver/flow/frame_0048.png" alt="image-2" width="500"/>

and renders their optical flow as an HSV representation (`render=True`):

```python
import requests
from PIL import Image
from transformers import pipeline
from perceiver.model.vision import optical_flow  # register optical flow pipeline

frame_1 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0047.png", stream=True).raw)
frame_2 = Image.open(requests.get("https://martin-krasser.com/perceiver/flow/frame_0048.png", stream=True).raw)

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")
rendered_optical_flow = optical_flow_pipeline((frame_1, frame_2), render=True)

Image.fromarray(rendered_optical_flow).save("optical_flow.png")
```

The [rendered optical flow](https://martin-krasser.com/perceiver/flow/optical_flow.png) is

<img src="https://martin-krasser.com/perceiver/flow/optical_flow.png" alt="rendered-optical-flow" width="500"/>

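The HSV rendering maps flow direction to hue and flow magnitude to brightness, a common optical-flow visualization scheme. The following NumPy sketch illustrates the idea only; it is not the library's actual rendering code, and the helper name `render_flow_hsv` is hypothetical:

```python
import colorsys

import numpy as np


def render_flow_hsv(flow):
    """Render a (H, W, 2) flow field of (dx, dy) displacements as an RGB uint8 image."""
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx)                    # flow direction in [-pi, pi]
    hue = (angle + np.pi) / (2 * np.pi)           # direction -> hue in [0, 1]
    value = magnitude / (magnitude.max() + 1e-8)  # magnitude -> brightness in [0, 1]

    rgb = np.zeros(flow.shape[:2] + (3,))
    for i in range(flow.shape[0]):
        for j in range(flow.shape[1]):
            # fixed saturation of 1 so hue encodes direction unambiguously
            rgb[i, j] = colorsys.hsv_to_rgb(hue[i, j], 1.0, value[i, j])
    return (rgb * 255).astype(np.uint8)
```

Zero-flow regions render black, and normalizing by the maximum magnitude keeps the strongest motion at full brightness.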
### Video

To compute the optical flow of an entire video, the `optical-flow` pipeline can be used in combination with functions
from `video_utils`. The following code samples all frames from a [video snippet](https://martin-krasser.com/perceiver/flow/sintel_clip_cave_dragon_fight.mp4)
taken from the [Sintel animated short movie](https://durian.blender.org/), computes the optical flow per consecutive
frame pair and writes the rendered results back to an output video file.

```python
from transformers import pipeline
from perceiver.data.vision import video_utils
from perceiver.model.vision import optical_flow  # register optical flow pipeline

optical_flow_pipeline = pipeline("optical-flow", model="krasserm/perceiver-io-optical-flow", device="cuda:0")

# sample consecutive video frame pairs
frame_pairs = video_utils.read_video_frame_pairs("sintel_clip_cave_dragon_fight.mp4")

# create and render optical flow for all frame pairs
# (device is already set on the pipeline, so it is not passed again here)
optical_flows = optical_flow_pipeline(frame_pairs, render=True)

# create video with rendered optical flows
video_utils.write_video("sintel_clip_cave_dragon_fight_output.mp4", optical_flows, fps=24)
```
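The "consecutive frame pairs" read above are conceptually each frame zipped with its successor. A minimal sketch of that pairing logic (an illustration only, not the actual `video_utils` implementation; `frame_pairs_from` is a hypothetical helper):

```python
def frame_pairs_from(frames):
    """Pair each frame with its successor: [f0, f1, f2] -> [(f0, f1), (f1, f2)]."""
    return list(zip(frames[:-1], frames[1:]))
```

Each pair is one pipeline input, so a clip with N frames yields N-1 rendered flow frames.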

A side-by-side comparison of the input and output videos:



## Model conversion

The `krasserm/perceiver-io-optical-flow` model has been created from the source `deepmind/optical-flow-perceiver` model
with:

```python
from perceiver.model.vision.optical_flow import convert_model

convert_model(
    save_dir="krasserm/perceiver-io-optical-flow",
    source_repo_id="deepmind/optical-flow-perceiver",
    push_to_hub=True,
)
```

## Citation

```bibtex
@article{jaegle2021perceiver,
  title={Perceiver IO: A General Architecture for Structured Inputs \& Outputs},
  author={Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and others},
  journal={arXiv preprint arXiv:2107.14795},
  year={2021}
}
```