| --- |
| license: mit |
| pipeline_tag: depth-estimation |
| tags: |
| - video-depth |
| - monocular-geometry |
| - streaming |
| --- |
| |
| # DyFN: Stabilizing Streaming Video Geometry via Dynamic Feature Normalization |
|
|
| This repository contains the pretrained checkpoint for **DyFN**, a model designed for consistent 3D geometry estimation from streaming RGB input. |
|
|
| [**Paper**](https://huggingface.co/papers/2605.25308) | [**Project Page**](https://shawlyu.github.io/DyFN) | [**Code**](https://github.com/shawLyu/Streaming_DyFN) |
|
|
| ## Description |
| Dynamic Feature Normalization (DyFN) is a lightweight, causal recurrent module that modulates feature statistics to keep the predicted geometry stable over time. Finetuning only the DyFN module (about 2% additional parameters) on top of a pretrained monocular geometry model suppresses temporal artifacts such as disjointed layering and positional jitter without compromising single-image accuracy. |
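
To make the idea of causally modulating feature statistics concrete, here is a minimal toy sketch. It is not the DyFN module itself: the class `RunningFeatureNorm`, the `momentum` parameter, and the single-channel list representation are all hypothetical, and a fixed exponential moving average stands in for the learned recurrent update. It only illustrates why carrying statistics across frames can reduce frame-to-frame drift compared with normalizing each frame independently.

```python
from math import sqrt
from statistics import fmean, pvariance


class RunningFeatureNorm:
    """Toy causal normalizer for one feature channel (hypothetical, not DyFN)."""

    def __init__(self, momentum=0.9, eps=1e-5):
        self.momentum = momentum  # weight given to the state from past frames
        self.eps = eps
        self.mean = None  # recurrent state: smoothed mean
        self.var = None   # recurrent state: smoothed variance

    def step(self, features):
        """Normalize one frame using statistics from current and past frames only."""
        frame_mean = fmean(features)
        frame_var = pvariance(features, frame_mean)
        if self.mean is None:
            # First frame: no history yet, so use the frame's own statistics.
            self.mean, self.var = frame_mean, frame_var
        else:
            m = self.momentum
            self.mean = m * self.mean + (1 - m) * frame_mean
            self.var = m * self.var + (1 - m) * frame_var
        scale = sqrt(self.var + self.eps)
        return [(x - self.mean) / scale for x in features]


def per_frame_norm(features, eps=1e-5):
    """Baseline: normalize a frame with its own statistics, ignoring history."""
    mean = fmean(features)
    scale = sqrt(pvariance(features, mean) + eps)
    return [(x - mean) / scale for x in features]


# Two consecutive frames: the first three values are static, the last one jumps
# (for example, a new object entering the view).
frame_a = [1.0, 2.0, 3.0, 4.0]
frame_b = [1.0, 2.0, 3.0, 10.0]

norm = RunningFeatureNorm(momentum=0.9)
running_a, running_b = norm.step(frame_a), norm.step(frame_b)
independent_a, independent_b = per_frame_norm(frame_a), per_frame_norm(frame_b)

# Drift of the first (static) value between the two frames under each scheme.
running_drift = abs(running_b[0] - running_a[0])
independent_drift = abs(independent_b[0] - independent_a[0])
print(f"running: {running_drift:.3f}, per-frame: {independent_drift:.3f}")
```

In this toy setting the static value shifts less under the running statistics than under per-frame normalization, because a sudden change in one frame only partially updates the carried state.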
|
|
| - **File:** `DyFN.pt` |
| - **Parameters:** ~320M |
| - **Base:** MoGe-ViT-L with ConvGRU temporal stabilizer |
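
As a rough back-of-the-envelope check, the figures above can be combined to estimate how many parameters belong to the DyFN module. This assumes that the ~320M figure covers the full checkpoint (base model plus DyFN) and that the 2% is measured relative to the base model alone; neither assumption is stated explicitly here, so treat the result as an approximation.

```python
# Assumption: ~320M is the total checkpoint size (base model + DyFN), and
# "2% additional parameters" is relative to the base model alone.
total_params = 320e6
added_fraction = 0.02

base_params = total_params / (1 + added_fraction)
dyfn_params = total_params - base_params

print(f"base: ~{base_params / 1e6:.1f}M, DyFN: ~{dyfn_params / 1e6:.1f}M")
```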
|
|
| ## Usage |
|
|
| Install the package directly from the GitHub repository: |
| ```bash |
| pip install git+https://github.com/shawLyu/Streaming_DyFN.git |
| ``` |
|
|
| Then, load the model with the following snippet: |
|
|
| ```python |
| from moge.model.v1 import MoGeModel |
| |
| # Load from Hugging Face Hub |
| model = MoGeModel.from_pretrained("shawlyu/DyFN") |
| ``` |
|
|
| Alternatively, load the checkpoint from a local path: |
|
|
| ```python |
| from moge.model.v1 import MoGeModel |
| |
| # Load from a local checkpoint file |
| model = MoGeModel.from_pretrained("./pretrained/DyFN.pt") |
| ``` |
|
|
| ## Citation |
|
|
| If you find this project useful in your research, please cite: |
|
|
| ```bibtex |
| @inproceedings{lyu2026streamingdepth, |
| title={Stabilizing Streaming Video Geometry via Dynamic Feature Normalization}, |
| author={Lyu, Xiaoyang and Liu, Muxin and Wu, Xiaoshan and Wang, Ruicheng and Huang, Yi-Hua and Sun, Yang-Tian and Shi, Shaoshuai and Qi, Xiaojuan}, |
| booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, |
| year={2026} |
| } |
| |
| @inproceedings{wang2025moge, |
| title={{MoGe}: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision}, |
| author={Wang, Ruicheng and Xu, Sicheng and Dai, Cassie and Xiang, Jianfeng and Deng, Yu and Tong, Xin and Yang, Jiaolong}, |
| booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference}, |
| pages={5261--5271}, |
| year={2025} |
| } |
| ``` |