---
language: en
license: apache-2.0
tags:
- optical-flow
- point-tracking
- computer-vision
- zero-shot
- vit
library_name: megaflow
pipeline_tag: image-to-image
---

# MegaFlow: Zero-Shot Large Displacement Optical Flow

**[Dingxi Zhang](https://kristen-z.github.io/)** · **[Fangjinhua Wang](https://fangjinhuawang.github.io/)** · **[Marc Pollefeys](https://people.inf.ethz.ch/marc.pollefeys/)** · **[Haofei Xu](https://haofeixu.github.io/)**

*ETH Zurich · Microsoft · University of Tübingen, Tübingen AI Center*

[Project Page](https://kristen-z.github.io/projects/megaflow/) · [arXiv](https://arxiv.org/abs/) · [Code](https://github.com/cvg/megaflow) · [Colab](https://colab.research.google.com/github/cvg/megaflow/blob/main/demo_colab.ipynb)

---
**MegaFlow** is a simple, powerful, and unified model for **zero-shot large displacement optical flow** and **point tracking**.

MegaFlow leverages pre-trained Vision Transformer features to naturally capture extreme motion, followed by lightweight iterative refinement for sub-pixel accuracy. It achieves **state-of-the-art zero-shot performance** across major optical flow benchmarks (Sintel, KITTI, Spring) and delivers highly competitive zero-shot generalization on long-range point tracking benchmarks.

## Highlights

- 🏆 State-of-the-art zero-shot performance on Sintel, KITTI, and Spring
- 🎯 Designed for large displacement optical flow
- 📹 Flexible temporal window — processes any number of frames at once
- 🔄 Single backbone for both optical flow and long-range point tracking

## Available Models

| Model ID | Task | Description |
|---|---|---|
| `megaflow-flow` | Optical flow | Full training curriculum (default) |
| `megaflow-chairs-things` | Optical flow | Trained on FlyingChairs + FlyingThings only |
| `megaflow-track` | Point tracking | Fine-tuned on Kubric |

## Quick Start

### Installation

```bash
pip install git+https://github.com/cvg/megaflow.git
```

Requirements: Python ≥ 3.12, PyTorch ≥ 2.7, CUDA recommended.

### Optical Flow

```python
import torch
from megaflow import MegaFlow

device = "cuda" if torch.cuda.is_available() else "cpu"

# video: float32 tensor [1, T, 3, H, W], pixel values in [0, 255]
video = ...

model = MegaFlow.from_pretrained("megaflow-flow").eval().to(device)

with torch.inference_mode():
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        # Returns flow for consecutive pairs: (0→1, 1→2, ...)
        # Shape: [1, T-1, 2, H, W]
        flow = model(video, num_reg_refine=8)["flow_preds"][-1]
```

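The snippet above leaves `video = ...` as a placeholder. A minimal sketch of building the expected tensor, assuming frames are already decoded as `H×W×3` uint8 arrays (e.g. with OpenCV or imageio); `frames_to_video_tensor` is an illustrative helper, not part of the `megaflow` API:

```python
import numpy as np
import torch

def frames_to_video_tensor(frames):
    """Stack H x W x 3 uint8 frames into a [1, T, 3, H, W] float32 tensor in [0, 255]."""
    arr = np.stack(frames)                 # [T, H, W, 3], uint8
    video = torch.from_numpy(arr).float()  # keep the 0-255 range; no /255 normalization
    video = video.permute(0, 3, 1, 2)      # [T, 3, H, W]
    return video.unsqueeze(0)              # [1, T, 3, H, W]

frames = [np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8) for _ in range(4)]
video = frames_to_video_tensor(frames)
print(video.shape)  # torch.Size([1, 4, 3, 480, 640])
```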
### Point Tracking

```python
import torch
from megaflow import MegaFlow
from megaflow.utils.basic import gridcloud2d

device = "cuda" if torch.cuda.is_available() else "cpu"

# video: float32 tensor [1, T, 3, H, W], pixel values in [0, 255]
video = ...
_, T, _, H, W = video.shape

model = MegaFlow.from_pretrained("megaflow-track").eval().to(device)

with torch.inference_mode():
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        # Returns dense offsets from frame 0 to each frame t
        flows_e = model.forward_track(video, num_reg_refine=8)["flow_final"]

# Convert offsets to absolute coordinates
grid_xy = gridcloud2d(1, H, W, norm=False, device=device).float()
grid_xy = grid_xy.permute(0, 2, 1).reshape(1, 1, 2, H, W)
tracks = flows_e + grid_xy  # [1, T, 2, H, W]
```
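The dense `tracks` tensor holds one trajectory per pixel. To read out the trajectories of specific query pixels from frame 0, plain advanced indexing is enough. A sketch, using a dummy `tracks` tensor so it runs standalone (in practice, use the `tracks` computed above):

```python
import torch

# Dummy dense tracks for illustration; shape [1, T, 2, H, W] as in the snippet above
T, H, W = 5, 64, 96
tracks = torch.rand(1, T, 2, H, W) * 100

# Query pixels in frame 0, given as integer (x, y) coordinates
queries = torch.tensor([[10, 20], [50, 30]])  # [N, 2]

# Advanced indexing over the last two dims gives [1, T, 2, N]
traj = tracks[:, :, :, queries[:, 1], queries[:, 0]]
traj = traj.permute(0, 1, 3, 2)  # [1, T, N, 2]: (x, y) of each query in every frame
print(traj.shape)  # torch.Size([1, 5, 2, 2])
```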

## Demo Scripts

```bash
# Clone the repo and run demos
git clone https://github.com/cvg/megaflow.git
cd megaflow

# Optical flow on a video
python demo_flow.py --input assets/longboard.mp4 --output output/longboard_flow.mp4

# Dense point tracking
python demo_track.py --input assets/apple.mp4 --grid_size 8

# Gradio web UI
python demo_gradio.py
```

Or try the [Colab notebook](https://colab.research.google.com/github/cvg/megaflow/blob/main/demo_colab.ipynb) directly in the browser.

## Citation

```bibtex
@article{zhang2026megaflow,
  title   = {MegaFlow: Zero-Shot Large Displacement Optical Flow},
  author  = {Zhang, Dingxi and Wang, Fangjinhua and Pollefeys, Marc and Xu, Haofei},
  journal = {arXiv preprint arXiv:2603.25739},
  year    = {2026}
}
```