zlai committed
Commit 1aaae71 · 1 Parent(s): e3f6d71

initial commit

Files changed (1): README.md +86 -0
README.md CHANGED
---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- point-tracking
- optical-flow
- video-understanding
license: cc-by-nc-4.0
language:
- en
pipeline_tag: video-to-video
---

<div align="center">
<h1>🐮 CoWTracker: Tracking by Warping instead of Correlation</h1>

<a href="https://github.com/facebookresearch/cowtracker"><img src="https://img.shields.io/badge/GitHub-Repo-blue" alt="GitHub"></a>

Zihang Lai, Eldar Insafutdinov, Edgar Sucar, Andrea Vedaldi

**[Meta AI Research](https://ai.facebook.com/research/)**; **[University of Oxford, VGG](https://www.robots.ox.ac.uk/~vgg/)**

</div>

## Overview

**CoWTracker** is a dense point tracker that eschews traditional cost volumes in favor of an iterative warping mechanism. By warping target features to the query frame and refining tracks with a joint spatio-temporal transformer, it achieves state-of-the-art performance on **TAP-Vid** (DAVIS, Kinetics) and **RoboTAP**, and demonstrates strong zero-shot transfer to optical flow benchmarks such as **Sintel** and **KITTI**.

### Key Features

* **No Cost Volumes:** Replaces memory-heavy correlation volumes with a lightweight warping operation that scales linearly with spatial resolution (see the sketch below).
* **High-Resolution Tracking:** Processes features at high resolution (stride 2) to capture fine details and thin structures, unlike the stride-8 features used by most prior methods.
* **Unified Architecture:** A single model that excels at both long-range point tracking and optical flow estimation.
* **Robustness:** Handles occlusions and rapid motion effectively using a video transformer with interleaved spatial and temporal attention.

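To make the warping idea concrete, here is a minimal sketch of flow-guided feature warping with `torch.nn.functional.grid_sample`. It illustrates the general technique, not CoWTracker's actual code; the function name and tensor layout are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def warp_features(target_feats: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp target-frame features toward the query frame along a flow field.

    target_feats: [B, C, H, W] feature map of the target frame
    flow:         [B, 2, H, W] per-pixel (x, y) displacements in pixels
    """
    B, C, H, W = target_feats.shape
    # Base pixel grid (x to the right, y downward)
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=flow.dtype, device=flow.device),
        torch.arange(W, dtype=flow.dtype, device=flow.device),
        indexing="ij",
    )
    coords = torch.stack((xs, ys)).unsqueeze(0) + flow  # [B, 2, H, W] sampling locations
    # Normalize coordinates to [-1, 1], as grid_sample expects
    norm_x = 2.0 * coords[:, 0] / (W - 1) - 1.0
    norm_y = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack((norm_x, norm_y), dim=-1)  # [B, H, W, 2]
    # Only the warped features are materialized: O(H*W) memory,
    # versus O((H*W)^2) for an all-pairs correlation volume
    return F.grid_sample(target_feats, grid, align_corners=True)
```

Each refinement iteration can re-warp with the updated flow estimate, which is how warping-based trackers avoid ever building the quadratic cost volume.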

## Quick Start

Please refer to our [GitHub Repo](https://github.com/facebookresearch/cowtracker) for installation and usage instructions.

### Python API

```python
import torch
from cowtracker import CowTracker

# Load model
model = CowTracker.from_checkpoint(
    "./cow_tracker_model.pth",
    device="cuda",
    dtype=torch.float16,
)

# Prepare video tensor [T, 3, H, W] in float32, range [0, 255]
video = ...

# Run inference
with torch.no_grad():
    predictions = model(video)

# Get outputs
tracks = predictions["track"]  # [B, T, H, W, 2] - tracked point coordinates
vis = predictions["vis"]       # [B, T, H, W]    - visibility scores
conf = predictions["conf"]     # [B, T, H, W]    - confidence scores
```

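Because the predictions are dense, a single query pixel's trajectory can be read off the output tensors directly. A small illustrative snippet, assuming the shapes documented above (the pixel coordinates and the 0.5 threshold are arbitrary choices):

```python
# Track the query pixel at (x0, y0) of the first frame across the whole clip
x0, y0 = 128, 96
trajectory = tracks[0, :, y0, x0]  # [T, 2] (x, y) position per frame
visible = vis[0, :, y0, x0] > 0.5  # per-frame visibility mask
```
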
For long videos, use `CowTrackerWindowed`:

```python
from cowtracker import CowTrackerWindowed

model = CowTrackerWindowed.from_checkpoint(
    "./cow_tracker_model.pth",
    device="cuda",
    dtype=torch.float16,
)
```

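A usage sketch, under the assumption that the windowed variant is a drop-in replacement with the same call interface and handles temporal chunking internally (the video shape here is illustrative):

```python
# Hypothetical long clip: [T, 3, H, W] in float32, range [0, 255]
long_video = torch.rand(240, 3, 256, 256) * 255

with torch.no_grad():
    predictions = model(long_video)
```
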
## Citation

If you find our repository useful, please consider giving it a star ⭐ and citing our paper in your work:

```bibtex
```

## License

This model is released under the [CC-BY-NC-4.0](https://creativecommons.org/licenses/by-nc/4.0/) license.

This release is intended to support the open-source research community and fundamental research. Users are expected to use the artifacts for research purposes and to make any research findings arising from them publicly available for the benefit of the research community.