	---
	tags:
	- model_hub_mixin
	- pytorch_model_hub_mixin
	- point-tracking
	- optical-flow
	- video-understanding
	license: cc-by-nc-4.0
	language:
	- en
	pipeline_tag: video-to-video
	---

	<div align="center">
	<h1>🐮 CoWTracker: Tracking by Warping instead of Correlation</h1>

	<a href="https://github.com/facebookresearch/cowtracker"><img src="https://img.shields.io/badge/GitHub-Repo-blue" alt="GitHub"></a>

	Zihang Lai, Eldar Insafutdinov, Edgar Sucar, Andrea Vedaldi

	[Meta AI Research](https://ai.facebook.com/research/); [University of Oxford, VGG](https://www.robots.ox.ac.uk/~vgg/)

	</div>

	## Overview

	CoWTracker is a dense point tracker that replaces traditional cost volumes with an iterative warping mechanism. By warping target features to the query frame and refining tracks with a joint spatio-temporal transformer, CoWTracker achieves state-of-the-art performance on TAP-Vid (DAVIS, Kinetics) and RoboTAP, and demonstrates strong zero-shot transfer to optical flow benchmarks such as Sintel and KITTI.
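
The warping step named above (sampling target-frame features at the current track estimates so they line up with the query frame) can be illustrated with bilinear sampling. This is a minimal sketch built on `torch.nn.functional.grid_sample`, not CoWTracker's actual implementation; the function name `warp_to_query`, the tensor layouts, and the (x, y) coordinate order are assumptions made for the example.

```python
import torch
import torch.nn.functional as F


def warp_to_query(target_feat: torch.Tensor, tracks_xy: torch.Tensor) -> torch.Tensor:
    """Sample target-frame features at the estimated track positions.

    target_feat: [N, C, H, W] features of the target frame.
    tracks_xy:   [N, H, W, 2] pixel coordinates (x, y) in the target frame,
                 one per query-frame pixel (assumed layout).
    Returns:     [N, C, H, W] target features aligned to the query frame.
    """
    _, _, h, w = target_feat.shape
    # grid_sample expects coordinates normalised to [-1, 1].
    x = 2.0 * tracks_xy[..., 0] / (w - 1) - 1.0
    y = 2.0 * tracks_xy[..., 1] / (h - 1) - 1.0
    grid = torch.stack((x, y), dim=-1)
    return F.grid_sample(
        target_feat, grid, mode="bilinear", padding_mode="zeros", align_corners=True
    )


# Sanity check: tracks that point at their own pixel leave the features unchanged.
feat = torch.randn(1, 4, 6, 8)
ys, xs = torch.meshgrid(
    torch.arange(6, dtype=torch.float32),
    torch.arange(8, dtype=torch.float32),
    indexing="ij",
)
identity_tracks = torch.stack((xs, ys), dim=-1)[None]  # [1, 6, 8, 2]
warped = warp_to_query(feat, identity_tracks)
```

With the identity tracks, `warped` matches `feat` up to floating-point error. In an iterative scheme, the tracks would be updated and the warp repeated.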

	### Key Features

	* No Cost Volumes: Replaces memory-heavy correlation volumes with a lightweight warping operation, scaling linearly with spatial resolution.
	* High-Resolution Tracking: Processes features at high resolution (stride 2) to capture fine details and thin structures, unlike the stride 8 used by most prior methods.
	* Unified Architecture: A single model that excels at both long-range point tracking and optical flow estimation.
	* Robustness: Handles occlusions and rapid motion effectively using a video transformer with interleaved spatial and temporal attention.
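
To make the first two points concrete, the rough arithmetic below compares the number of values in an all-pairs correlation volume with the number of values in a warped feature map for a single frame pair. The 512×512 input size, the 128 feature channels, and the function names are illustrative assumptions, not values taken from CoWTracker. Prior trackers often restrict correlation to local windows, so their real footprint differs.

```python
def all_pairs_cost_volume_entries(height: int, width: int, stride: int) -> int:
    """Entries in an all-pairs correlation volume between two feature maps."""
    h, w = height // stride, width // stride
    return (h * w) ** 2  # every location compared against every location


def warped_feature_entries(height: int, width: int, stride: int, channels: int) -> int:
    """Entries in one warped feature map at the same feature stride."""
    h, w = height // stride, width // stride
    return h * w * channels


H, W, C = 512, 512, 128  # assumed input size and feature dimension
for stride in (8, 4, 2):
    cost_volume = all_pairs_cost_volume_entries(H, W, stride)
    warped = warped_feature_entries(H, W, stride, C)
    print(f"stride {stride}: cost volume {cost_volume:,} | warped features {warped:,}")
```

Halving the stride multiplies the warped-feature count by 4 but the all-pairs volume by 16, which is why a dense cost volume becomes hard to afford at stride 2.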

	## Quick Start

	Please refer to our [GitHub Repo](https://github.com/facebookresearch/cowtracker) for installation and usage instructions.

	### Python API

	```python
	import torch
	from cowtracker import CowTracker

	# Load model
	model = CowTracker.from_checkpoint(
	    "./cow_tracker_model.pth",
	    device="cuda",
	    dtype=torch.float16,
	)

	# Prepare video tensor [T, 3, H, W] in float32, range [0, 255]
	video = ...

	# Run inference
	with torch.no_grad():
	    predictions = model(video)

	# Get outputs
	tracks = predictions["track"]  # [B, T, H, W, 2] - tracked point coordinates
	vis = predictions["vis"]       # [B, T, H, W] - visibility scores
	conf = predictions["conf"]     # [B, T, H, W] - confidence scores
	```
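
The snippet above leaves `video = ...` open. One possible way to build a tensor in the documented layout (`[T, 3, H, W]`, float32, range [0, 255]) from already-decoded RGB frames is sketched below. The helper name `frames_to_video_tensor` and the `[T, H, W, 3]` uint8 input layout are assumptions for illustration. Frame decoding and any further preprocessing the model may expect are not covered here, so check the GitHub repo for the reference pipeline.

```python
import torch


def frames_to_video_tensor(frames: torch.Tensor) -> torch.Tensor:
    """Convert uint8 RGB frames [T, H, W, 3] to float32 [T, 3, H, W] in [0, 255]."""
    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError(f"expected [T, H, W, 3], got {tuple(frames.shape)}")
    # Move channels in front of the spatial dims, then cast without rescaling.
    return frames.permute(0, 3, 1, 2).contiguous().to(torch.float32)


# Dummy clip standing in for decoded frames: 8 frames of 64x96 RGB.
dummy_frames = torch.randint(0, 256, (8, 64, 96, 3), dtype=torch.uint8)
video = frames_to_video_tensor(dummy_frames)
```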

	For long videos, use `CowTrackerWindowed`:

	```python
	import torch
	from cowtracker import CowTrackerWindowed

	model = CowTrackerWindowed.from_checkpoint(
	    "./cow_tracker_model.pth",
	    device="cuda",
	    dtype=torch.float16,
	)
	```
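
This README does not describe how `CowTrackerWindowed` splits a video internally. Purely as an illustration of the general sliding-window idea behind processing long videos in chunks, the sketch below computes overlapping frame ranges. The function name, window length, and overlap are hypothetical and do not reflect the library's actual parameters or behavior.

```python
def sliding_windows(num_frames: int, window: int, overlap: int) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, covering [0, num_frames)."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    step = window - overlap
    ranges = []
    start = 0
    while True:
        end = min(start + window, num_frames)
        ranges.append((start, end))
        if end == num_frames:
            break  # the last window reaches the end of the video
        start += step
    return ranges


# Hypothetical settings: 100 frames, 40-frame windows sharing 10 frames.
windows = sliding_windows(num_frames=100, window=40, overlap=10)
```

The shared frames between consecutive windows give a chunked tracker a place to hand tracks over from one window to the next.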

	## Citation

	If you find our repository useful, please consider giving it a star ⭐ and citing our paper in your work:

	```bibtex
	```

	## License

	This model is released under the [CC-BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.

	This release is intended to support the open-source research community and fundamental research. Users are expected to leverage the artifacts for research purposes and make research findings arising from the artifacts publicly available for the benefit of the research community.