pipeline_tag: image-to-3d
---

<div align="center">
<h1>Streaming 4D Visual Geometry Transformer</h1>
</div>

### [Paper](https://arxiv.org/abs/2507.11539) | [Project Page](https://wzzheng.net/StreamVGGT)

>Streaming 4D Visual Geometry Transformer

>Dong Zhuo<sup>\*</sup>, [Wenzhao Zheng](https://wzzheng.net/)<sup>*</sup>$\dagger$, Jiahe Guo, Yuqi Wu, [Jie Zhou](https://scholar.google.com/citations?user=6a79aPwAAAAJ&hl=en&authuser=1), [Jiwen Lu](http://ivg.au.tsinghua.edu.cn/Jiwen_Lu/)

<sup>*</sup> Equal contribution. $\dagger$ Project leader.

**StreamVGGT** is a causal transformer architecture for **real-time streaming 4D visual geometry perception**. It is compatible with attention mechanisms optimized for LLMs (e.g., [FlashAttention](https://github.com/Dao-AILab/flash-attention)) and delivers both fast inference and high-quality 4D reconstruction.

## Overview

Offline models must reprocess the entire image sequence and reconstruct the whole scene every time a new image arrives. In contrast, StreamVGGT employs temporal causal attention and leverages cached memory tokens to support efficient, incremental, on-the-fly reconstruction, enabling interactive and real-time online applications.
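
The incremental scheme described above can be illustrated with a minimal sketch. It assumes a single attention layer with one head, random projection weights standing in for learned ones, and a fixed number of tokens per frame; the names `StreamingCausalAttention`, `step`, and `offline_block_causal` are hypothetical and do not come from the StreamVGGT codebase. Each new frame's tokens attend only to cached keys and values from earlier frames plus their own frame, and the final check compares the streamed output with a one-shot pass over the whole sequence under a frame-level causal mask.

```python
import numpy as np


def attention(q, k, v):
    """Scaled dot-product attention; q is (n, d), k and v are (m, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


class StreamingCausalAttention:
    """Hypothetical single-layer, single-head frame-causal attention with a KV cache."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Random projections stand in for learned weights (assumption).
        self.wq, self.wk, self.wv = (
            rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3)
        )
        self.k_cache = np.zeros((0, dim))
        self.v_cache = np.zeros((0, dim))

    def step(self, frame_tokens):
        """Process only the newest frame's tokens, shape (tokens_per_frame, dim)."""
        q = frame_tokens @ self.wq
        # Append this frame's keys/values; earlier frames are never recomputed.
        self.k_cache = np.concatenate([self.k_cache, frame_tokens @ self.wk])
        self.v_cache = np.concatenate([self.v_cache, frame_tokens @ self.wv])
        # No mask needed: the cache holds only past frames and the current one.
        return attention(q, self.k_cache, self.v_cache)


def offline_block_causal(frames, layer):
    """Reference: one pass over all frames with a frame-level causal mask."""
    x = np.concatenate(frames)
    q, k, v = x @ layer.wq, x @ layer.wk, x @ layer.wv
    frame_id = np.repeat(np.arange(len(frames)), [len(f) for f in frames])
    # Token i may attend to token j only if j's frame is not later than i's.
    mask = frame_id[:, None] >= frame_id[None, :]
    scores = np.where(mask, q @ k.T / np.sqrt(q.shape[-1]), -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Usage: stream three frames of 4 tokens each and compare with the offline pass.
rng = np.random.default_rng(1)
frames = [rng.standard_normal((4, 8)) for _ in range(3)]
layer = StreamingCausalAttention(dim=8)
streamed = np.concatenate([layer.step(f) for f in frames])
reference = offline_block_causal(frames, layer)
print(np.allclose(streamed, reference))  # True
```

In this sketch each `step` call projects only the newest frame, and the attention it runs needs no mask because every cached token belongs to a past or the current frame. Unmasked attention over a growing key/value cache is the kind of call that FlashAttention-style kernels accelerate, although the sketch itself uses plain NumPy.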

## Quick start

Please refer to our [GitHub repository](https://github.com/wzzheng/StreamVGGT).

## Citation

If you find this project helpful, please consider citing the following paper:
```bibtex
@article{streamVGGT,
  title={Streaming 4D Visual Geometry Transformer},
  author={Dong Zhuo and Wenzhao Zheng and Jiahe Guo and Yuqi Wu and Jie Zhou and Jiwen Lu},
  journal={arXiv preprint arXiv:2507.11539},
  year={2025}
}
```