---
language: en
license: apache-2.0
tags:
- optical-flow
- point-tracking
- computer-vision
- zero-shot
- vit
library_name: megaflow
pipeline_tag: image-to-image
---

# MegaFlow: Zero-Shot Large Displacement Optical Flow

**[Dingxi Zhang](https://kristen-z.github.io/)** · **[Fangjinhua Wang](https://fangjinhuawang.github.io/)** · **[Marc Pollefeys](https://people.inf.ethz.ch/marc.pollefeys/)** · **[Haofei Xu](https://haofeixu.github.io/)**

*ETH Zurich · Microsoft · University of Tübingen, Tübingen AI Center*

[](https://kristen-z.github.io/projects/megaflow/)
[](https://arxiv.org/abs/)
[](https://github.com/cvg/megaflow)
[](https://colab.research.google.com/github/cvg/megaflow/blob/main/demo_colab.ipynb)

---

**MegaFlow** is a simple, powerful, and unified model for **zero-shot large displacement optical flow** and **point tracking**.

MegaFlow leverages pre-trained Vision Transformer features to naturally capture extreme motion, followed by lightweight iterative refinement for sub-pixel accuracy. It achieves **state-of-the-art zero-shot performance** across major optical flow benchmarks (Sintel, KITTI, Spring) and delivers highly competitive zero-shot generalization on long-range point tracking benchmarks.

## Highlights

- 🏆 State-of-the-art zero-shot performance on Sintel, KITTI, and Spring
- 🎯 Designed for large displacement optical flow
- 📹 Flexible temporal window — processes any number of frames at once
- 🔄 Single backbone for both optical flow and long-range point tracking

## Available Models

| Model ID | Task | Description |
|---|---|---|
| `megaflow-flow` | Optical flow | Full training curriculum (default) |
| `megaflow-chairs-things` | Optical flow | Trained on FlyingChairs + FlyingThings only |
| `megaflow-track` | Point tracking | Fine-tuned on Kubric |

## Quick Start

### Installation

```bash
pip install git+https://github.com/cvg/megaflow.git
```

Requirements: Python ≥ 3.12, PyTorch ≥ 2.7, CUDA recommended.
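
Both snippets below expect a video tensor of shape `[1, T, 3, H, W]` with pixel values in `[0, 255]`. One way to assemble such a tensor from already-decoded frames (the frame reader is up to you; `frames_to_tensor` is a hypothetical helper, not part of the library):

```python
import numpy as np
import torch

def frames_to_tensor(frames):
    """Stack a list of HxWx3 uint8 frames into a [1, T, 3, H, W] float32 tensor.

    Pixel values are kept in [0, 255], as the model expects.
    """
    arr = np.stack(frames)                         # [T, H, W, 3]
    video = torch.from_numpy(arr).float()          # float32, still in [0, 255]
    return video.permute(0, 3, 1, 2).unsqueeze(0)  # [1, T, 3, H, W]

# Example with dummy frames in place of decoded video frames
frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(4)]
video = frames_to_tensor(frames)
print(video.shape)  # torch.Size([1, 4, 3, 480, 640])
```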

### Optical Flow

```python
import torch
from megaflow import MegaFlow

device = "cuda" if torch.cuda.is_available() else "cpu"

# video: float32 tensor [1, T, 3, H, W], pixel values in [0, 255]
video = ...

model = MegaFlow.from_pretrained("megaflow-flow").eval().to(device)

with torch.inference_mode():
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        # Returns flow for consecutive pairs: (0→1, 1→2, ...)
        # Shape: [1, T-1, 2, H, W]
        flow = model(video, num_reg_refine=8)["flow_preds"][-1]
```
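
The returned flow field can be post-processed however you like. For a quick sanity check or visualization, the per-pixel displacement magnitude is just the Euclidean norm over the 2-channel axis. A minimal sketch, using a dummy tensor with the documented `[1, T-1, 2, H, W]` shape in place of a real prediction:

```python
import torch

# Dummy flow stand-in with the documented shape [1, T-1, 2, H, W]
flow = torch.zeros(1, 3, 2, 64, 64)
flow[0, 0, 0] = 3.0  # dx = 3 everywhere in the first pair
flow[0, 0, 1] = 4.0  # dy = 4 everywhere in the first pair

# Per-pixel displacement magnitude: [1, T-1, H, W]
magnitude = torch.linalg.norm(flow, dim=2)
print(magnitude[0, 0].max().item())  # 5.0 (a 3-4-5 triangle)
```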

### Point Tracking

```python
import torch
from megaflow import MegaFlow
from megaflow.utils.basic import gridcloud2d

device = "cuda" if torch.cuda.is_available() else "cpu"

# video: float32 tensor [1, T, 3, H, W], pixel values in [0, 255]
video = ...
H, W = video.shape[-2:]

model = MegaFlow.from_pretrained("megaflow-track").eval().to(device)

with torch.inference_mode():
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        # Returns dense offsets from frame 0 to each frame t
        flows_e = model.forward_track(video, num_reg_refine=8)["flow_final"]

# Convert offsets to absolute coordinates
grid_xy = gridcloud2d(1, H, W, norm=False, device=device).float()
grid_xy = grid_xy.permute(0, 2, 1).reshape(1, 1, 2, H, W)
tracks = flows_e + grid_xy  # [1, T, 2, H, W]
```
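
Because `tracks` is dense, the trajectory of any individual query pixel can be read out by plain indexing: the pixel at `(x, y)` in frame 0 traces `tracks[0, :, :, y, x]`, a `[T, 2]` sequence of (x, y) positions. A sketch with a dummy tensor standing in for the model output:

```python
import torch

T, H, W = 5, 64, 64
# Dummy dense tracks [1, T, 2, H, W]; real values come from the snippet above.
# Here every pixel "stays put": channel 0 holds x, channel 1 holds y.
tracks = torch.zeros(1, T, 2, H, W)
tracks[0, :, 0] = torch.arange(W).view(1, 1, W).float()  # x coordinate
tracks[0, :, 1] = torch.arange(H).view(1, H, 1).float()  # y coordinate

x, y = 10, 20
trajectory = tracks[0, :, :, y, x]  # [T, 2] (x, y) positions across frames
print(trajectory[0].tolist())  # [10.0, 20.0]
```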

## Demo Scripts

```bash
# Clone the repo and run demos
git clone https://github.com/cvg/megaflow.git
cd megaflow

# Optical flow on a video
python demo_flow.py --input assets/longboard.mp4 --output output/longboard_flow.mp4

# Dense point tracking
python demo_track.py --input assets/apple.mp4 --grid_size 8

# Gradio web UI
python demo_gradio.py
```

Or try the [Colab notebook](https://colab.research.google.com/github/cvg/megaflow/blob/main/demo_colab.ipynb) directly in the browser.

## Citation

```bibtex
@article{zhang2026megaflow,
  title   = {MegaFlow: Zero-Shot Large Displacement Optical Flow},
  author  = {Zhang, Dingxi and Wang, Fangjinhua and Pollefeys, Marc and Xu, Haofei},
  journal = {arXiv preprint arXiv:2603.25739},
  year    = {2026}
}
```