---
language: en
license: apache-2.0
tags:
- optical-flow
- point-tracking
- computer-vision
- zero-shot
- vit
library_name: megaflow
pipeline_tag: image-to-image
---

# MegaFlow: Zero-Shot Large Displacement Optical Flow

**[Dingxi Zhang](https://kristen-z.github.io/)** · **[Fangjinhua Wang](https://fangjinhuawang.github.io/)** · **[Marc Pollefeys](https://people.inf.ethz.ch/marc.pollefeys/)** · **[Haofei Xu](https://haofeixu.github.io/)**

*ETH Zurich · Microsoft · University of Tübingen, Tübingen AI Center*

[![Project Page](https://img.shields.io/badge/Project-Page-blue?style=flat&logo=Google%20chrome&logoColor=white)](https://kristen-z.github.io/projects/megaflow/)
[![arXiv](https://img.shields.io/badge/arXiv-Paper-b31b1b.svg?style=flat&logo=arxiv&logoColor=white)](https://arxiv.org/abs/)
[![GitHub](https://img.shields.io/badge/GitHub-Code-black?style=flat&logo=github)](https://github.com/cvg/megaflow)
[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cvg/megaflow/blob/main/demo_colab.ipynb)

---

**MegaFlow** is a simple, powerful, and unified model for **zero-shot large displacement optical flow** and **point tracking**.

MegaFlow uses pre-trained Vision Transformer features to capture extreme motion, then applies lightweight iterative refinement for sub-pixel accuracy. It achieves **state-of-the-art zero-shot performance** on major optical flow benchmarks (Sintel, KITTI, Spring) and generalizes competitively, also zero-shot, to long-range point tracking benchmarks.

## Highlights

- 🏆 State-of-the-art zero-shot performance on Sintel, KITTI, and Spring
- 🎯 Designed for large displacement optical flow
- 📹 Flexible temporal window — processes any number of frames at once
- 🔄 Single backbone for both optical flow and long-range point tracking

## Available Models

| Model ID | Task | Description |
|---|---|---|
| `megaflow-flow` | Optical flow | Full training curriculum (default) |
| `megaflow-chairs-things` | Optical flow | Trained on FlyingThings + FlyingChairs only |
| `megaflow-track` | Point tracking | Fine-tuned on Kubric |

## Quick Start

### Installation

```bash
pip install git+https://github.com/cvg/megaflow.git
```
Requirements: Python ≥ 3.12, PyTorch ≥ 2.7, CUDA recommended.

### Optical Flow
```python
import torch
from megaflow import MegaFlow

device = "cuda" if torch.cuda.is_available() else "cpu"

# video: float32 tensor [1, T, 3, H, W] on `device`, pixel values in [0, 255]
video = ...

model = MegaFlow.from_pretrained("megaflow-flow").eval().to(device)

with torch.inference_mode():
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        # Returns flow for consecutive pairs: (0→1, 1→2, ...)
        # Shape: [1, T-1, 2, H, W]
        flow = model(video, num_reg_refine=8)["flow_preds"][-1]
```
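
The snippet above leaves `video` as a placeholder. The following is a minimal sketch of one way to build an array with the documented layout `[1, T, 3, H, W]` and value range `[0, 255]` from a list of decoded frames, using NumPy only. The helper name `frames_to_video` is hypothetical and not part of the `megaflow` package. It assumes every frame is an `H x W x 3` `uint8` array of the same size; the expected channel order (RGB vs. BGR) is not stated here, so check the repository before relying on it.

```python
import numpy as np


def frames_to_video(frames):
    """Stack T frames of shape (H, W, 3) into a float32 array [1, T, 3, H, W].

    Hypothetical helper, not part of megaflow. Pixel values stay in
    [0, 255], matching the range stated in the quick-start comment.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to estimate flow")
    video = np.stack(frames, axis=0)        # [T, H, W, 3]
    video = video.transpose(0, 3, 1, 2)     # [T, 3, H, W]
    # Add the batch dimension and make the memory layout contiguous.
    return np.ascontiguousarray(video[None], dtype=np.float32)


# Synthetic example: four random 48x64 frames standing in for decoded video.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8) for _ in range(4)]
video_np = frames_to_video(frames)
print(video_np.shape, video_np.dtype)
```

The result can then be wrapped with `torch.from_numpy(video_np).to(device)` before it is passed to the model.
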

### Point Tracking
```python
import torch
from megaflow import MegaFlow
from megaflow.utils.basic import gridcloud2d

device = "cuda" if torch.cuda.is_available() else "cpu"

# video: float32 tensor [1, T, 3, H, W] on `device`, pixel values in [0, 255]
video = ...

model = MegaFlow.from_pretrained("megaflow-track").eval().to(device)

with torch.inference_mode():
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        # Returns dense offsets from frame 0 to each frame t
        flows_e = model.forward_track(video, num_reg_refine=8)["flow_final"]

# Convert offsets to absolute coordinates
H, W = flows_e.shape[-2:]  # spatial size of the predicted offsets
grid_xy = gridcloud2d(1, H, W, norm=False, device=device).float()
grid_xy = grid_xy.permute(0, 2, 1).reshape(1, 1, 2, H, W)
tracks = flows_e + grid_xy  # [1, T, 2, H, W]
```
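
The conversion above adds each pixel's own `(x, y)` position to its predicted offset. To illustrate that arithmetic only, here is a NumPy sketch that builds the pixel grid with `np.meshgrid` instead of `gridcloud2d`. It assumes channel 0 holds the x (column) offset and channel 1 the y (row) offset, which is the common optical flow convention but is not confirmed here; `offsets_to_tracks` is a hypothetical name.

```python
import numpy as np


def offsets_to_tracks(offsets):
    """Convert dense offsets [B, T, 2, H, W] to absolute pixel coordinates.

    Hypothetical helper for illustration. Assumes channel 0 is the x
    (column) offset and channel 1 is the y (row) offset.
    """
    H, W = offsets.shape[-2:]
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    grid_xy = np.stack([xs, ys], axis=0).astype(offsets.dtype)  # [2, H, W]
    return offsets + grid_xy[None, None]  # broadcast over B and T


# Toy input: every pixel moves 1 px right and 2 px down per frame.
T, H, W = 3, 4, 5
offsets = np.zeros((1, T, 2, H, W), dtype=np.float32)
for t in range(T):
    offsets[0, t, 0] = 1.0 * t
    offsets[0, t, 1] = 2.0 * t

tracks = offsets_to_tracks(offsets)
# Trajectory of the pixel that starts at (x=3, y=1) in frame 0;
# rows are (x, y) per frame: (3, 1), (4, 3), (5, 5).
print(tracks[0, :, :, 1, 3])
```

Indexing `tracks[0, :, :, y, x]` gives the trajectory of the pixel that started at `(x, y)` in frame 0, one `(x, y)` pair per frame.
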
## Demo Scripts
```bash
# Clone the repo and run demos
git clone https://github.com/cvg/megaflow.git
cd megaflow

# Optical flow on a video
python demo_flow.py --input assets/longboard.mp4 --output output/longboard_flow.mp4

# Dense point tracking
python demo_track.py --input assets/apple.mp4 --grid_size 8

# Gradio web UI
python demo_gradio.py
```
Or try the [Colab notebook](https://colab.research.google.com/github/cvg/megaflow/blob/main/demo_colab.ipynb) directly in the browser.

## Citation
```bibtex
@article{zhang2026megaflow,
  title   = {MegaFlow: Zero-Shot Large Displacement Optical Flow},
  author  = {Zhang, Dingxi and Wang, Fangjinhua and Pollefeys, Marc and Xu, Haofei},
  journal = {arXiv preprint arXiv:2603.25739},
  year    = {2026}
}
```