---
license: mit
pipeline_tag: depth-estimation
tags:
- video-depth
- monocular-geometry
- streaming
---

# DyFN: Stabilizing Streaming Video Geometry via Dynamic Feature Normalization

This repository contains the pretrained checkpoint for **DyFN**, a model designed for consistent 3D geometry estimation from streaming RGB input.

[**Paper**](https://huggingface.co/papers/2605.25308) | [**Project Page**](https://shawlyu.github.io/DyFN) | [**Code**](https://github.com/shawLyu/Streaming_DyFN)

## Description
Dynamic Feature Normalization (DyFN) is a lightweight, causal recurrent module that dynamically modulates feature statistics to keep geometry stable over time. Only the DyFN module is finetuned on top of a pretrained monocular geometry model, which adds about 2% more parameters. The result effectively eliminates temporal artifacts such as disjointed layering and positional jitter without compromising single-image accuracy.

- **File:** `DyFN.pt`
- **Parameters:** ~320M
- **Base:** MoGe-ViT-L with ConvGRU temporal stabilizer
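
To make the idea of causally modulating feature statistics concrete, here is a toy sketch in plain Python. It is not the DyFN implementation: the ConvGRU stabilizer is swapped for a simple exponential moving average, and the class name `RunningFeatureNorm`, the `momentum` value, and the list-based feature layout are all assumptions made for illustration. It only shows how normalization statistics can be carried from frame to frame so that each frame is normalized using the past and present, never the future.

```python
import math


class RunningFeatureNorm:
    """Toy causal normalizer (hypothetical, NOT the DyFN module).

    Per-channel mean and variance are kept as recurrent state and
    updated with an exponential moving average (EMA) on every frame.
    """

    def __init__(self, momentum=0.9, eps=1e-5):
        self.momentum = momentum  # weight given to the past statistics
        self.eps = eps            # avoids division by zero
        self.mean = None          # recurrent state: per-channel mean
        self.var = None           # recurrent state: per-channel variance

    def step(self, frame):
        """Normalize one frame given as a list of channels (lists of floats)."""
        cur_mean = [sum(c) / len(c) for c in frame]
        cur_var = [
            sum((x - m) ** 2 for x in c) / len(c)
            for c, m in zip(frame, cur_mean)
        ]
        if self.mean is None:
            # First frame: no history yet, use its own statistics.
            self.mean, self.var = cur_mean, cur_var
        else:
            a = self.momentum
            self.mean = [a * p + (1 - a) * c for p, c in zip(self.mean, cur_mean)]
            self.var = [a * p + (1 - a) * c for p, c in zip(self.var, cur_var)]
        # Normalize with the temporally smoothed statistics.
        return [
            [(x - m) / math.sqrt(v + self.eps) for x in c]
            for c, m, v in zip(frame, self.mean, self.var)
        ]


norm = RunningFeatureNorm(momentum=0.9)
first = norm.step([[1.0, 3.0]])     # one channel, two activations
second = norm.step([[11.0, 13.0]])  # a sudden shift in feature scale
```

Normalizing each frame with only its own statistics would map both frames to the same values and hide the shift between them. With the smoothed statistics, the running mean moves only slightly toward the new frame, so the normalization itself stays stable across time.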

## Usage

Install the package from GitHub:
```bash
pip install git+https://github.com/shawLyu/Streaming_DyFN.git
```

Then load the model:

```python
from moge.model.v1 import MoGeModel

# Load from Hugging Face Hub
model = MoGeModel.from_pretrained("shawlyu/DyFN")
```

Or pass a local path:

```python
model = MoGeModel.from_pretrained("./pretrained/DyFN.pt")
```
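
The two loading modes can be combined in a small helper that prefers a local checkpoint and falls back to the Hub ID. This is a minimal sketch: `resolve_checkpoint` is a hypothetical name that is not part of the package, and it assumes `from_pretrained` accepts either string exactly as in the snippets above.

```python
from pathlib import Path

HUB_REPO_ID = "shawlyu/DyFN"  # Hub ID used in the snippet above


def resolve_checkpoint(local_path="./pretrained/DyFN.pt"):
    """Return the local checkpoint path if the file exists, else the Hub ID.

    Hypothetical helper for illustration; not part of the DyFN package.
    """
    path = Path(local_path)
    return str(path) if path.is_file() else HUB_REPO_ID


# Intended use (requires the package installed as described above):
#   from moge.model.v1 import MoGeModel
#   model = MoGeModel.from_pretrained(resolve_checkpoint())
```

Checking for the file first avoids a network request when the checkpoint has already been downloaded.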

## Citation

If you find this project useful in your research, please cite:

```bibtex
@inproceedings{lyu2026streamingdepth,
  title={Stabilizing Streaming Video Geometry via Dynamic Feature Normalization},
  author={Lyu, Xiaoyang and Liu, Muxin and Wu, Xiaoshan and Wang, Ruicheng and Huang, Yi-Hua and Sun, Yang-Tian and Shi, Shaoshuai and Qi, Xiaojuan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}

@inproceedings{wang2025moge,
  title={{MoGe}: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision},
  author={Wang, Ruicheng and Xu, Sicheng and Dai, Cassie and Xiang, Jianfeng and Deng, Yu and Tong, Xin and Yang, Jiaolong},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={5261--5271},
  year={2025}
}
```