Improve model card: Add pipeline tag, paper, links, abstract, and sample usage
#1
by
nielsr
HF Staff
- opened
README.md
CHANGED
|
@@ -1,4 +1,80 @@
|
|
| 1 |
-
|
| 2 |
---
|
| 3 |
license: apache-2.0
|
|
|
|
| 4 |
---
|
| 1 |
---
|
| 2 |
license: apache-2.0
|
| 3 |
+
pipeline_tag: depth-estimation
|
| 4 |
---
|
| 5 |
+
|
| 6 |
+
# BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
|
| 7 |
+
|
| 8 |
+
This model was presented in the paper [BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation](https://huggingface.co/papers/2509.25077).
|
| 9 |
+
|
| 10 |
+
- 📄 [Paper](https://huggingface.co/papers/2509.25077)
|
| 11 |
+
- 💻 [Code](https://github.com/lnbxldn/BRIDGE)
|
| 12 |
+
- 🌐 [Project Page](https://dingning-liu.github.io/bridge.github.io/)
|
| 13 |
+
- 🚀 [Demo](https://huggingface.co/spaces/Dingning/Bridge)
|
| 14 |
+
|
| 15 |
+

|
| 16 |
+
|
| 17 |
+
## 📰 News
|
| 18 |
+
|
| 19 |
+
- **[2025-09-30] 🚀🚀🚀 We published BRIDGE on [arXiv](https://arxiv.org/abs/2509.25077) and released a demo on Hugging Face! Try our [DEMO](https://huggingface.co/spaces/Dingning/Bridge).**
|
| 20 |
+
- [2025-09-30] **🎉🎉🎉** We released the model [checkpoint](https://huggingface.co/Dingning/BRIDGE) on Hugging Face.
|
| 21 |
+
|
| 22 |
+
## Abstract
|
| 23 |
+
|
| 24 |
+
Monocular Depth Estimation (MDE) is a foundational task for computer vision. Traditional methods are limited by data scarcity and quality, hindering their robustness. To overcome this, we propose BRIDGE, an RL-optimized depth-to-image (D2I) generation framework that synthesizes over 20M realistic and geometrically accurate RGB images, each intrinsically paired with its ground truth depth, from diverse source depth maps. Then we train our depth estimation model on this dataset, employing a hybrid supervision strategy that integrates teacher pseudo-labels with ground truth depth for comprehensive and robust training. This innovative data generation and training paradigm enables BRIDGE to achieve breakthroughs in scale and domain diversity, consistently outperforming existing state-of-the-art approaches quantitatively and in complex scene detail capture, thereby fostering general and robust depth features. Code and models are available at https://github.com/lnbxldn/BRIDGE.
|
| 25 |
+
|
| 26 |
+
## 🛫 Overview
|
| 27 |
+
|
| 28 |
+
We present BRIDGE, an RL-optimized, large-scale Depth-to-Image (D2I) data engine. It generates massive, high-quality RGB-D data to address critical Monocular Depth Estimation (MDE) training challenges and foster robust real-world performance.
|
| 29 |
+
|
| 30 |
+
Our main contributions are summarized as follows:
|
| 31 |
+
|
| 32 |
+
- **An efficient RL-driven D2I data engine:** BRIDGE efficiently generates over 20 million diverse, high-quality RGB-D samples from synthetic depth maps, alleviating data scarcity and quality issues.
|
| 33 |
+
- **A novel hybrid depth supervision strategy:** We introduce a hybrid training strategy that supervises the model on generated RGB images with both high-precision ground-truth depth and teacher pseudo-labels, enhancing geometric knowledge learning.
|
| 34 |
+
- **Superior performance and high training efficiency:** BRIDGE achieves SOTA MDE performance across benchmarks with significantly less data (20M vs. 62M), demonstrating excellent detail capture and robustness.
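
The hybrid supervision idea in the second contribution can be illustrated with a small sketch. This is not the loss used in the paper: it assumes a plain L1 penalty for both terms, a fixed blending weight, and a validity mask for the ground truth, and the names `hybrid_depth_loss` and `gt_weight` are hypothetical.

```python
import numpy as np


def hybrid_depth_loss(pred, gt, teacher, valid_mask, gt_weight=0.5):
    """Blend a ground-truth L1 term with a teacher pseudo-label L1 term.

    Illustrative assumption only: the actual BRIDGE objective may use
    different loss functions, weights, or region selection.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    teacher = np.asarray(teacher, dtype=np.float64)
    valid = np.asarray(valid_mask, dtype=bool)

    # Teacher pseudo-labels are dense, so every pixel contributes.
    teacher_term = np.abs(pred - teacher).mean()

    # Ground truth is only trusted where the mask marks it as valid.
    gt_term = np.abs(pred[valid] - gt[valid]).mean() if valid.any() else 0.0

    return gt_weight * gt_term + (1.0 - gt_weight) * teacher_term
```

With `gt_weight=1.0` the sketch reduces to purely ground-truth supervision, and with `gt_weight=0.0` to pure distillation from the teacher.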
|
| 35 |
+
|
| 36 |
+
## 📀 Pre-trained Models
|
| 37 |
+
Download the checkpoint from [Hugging Face](https://huggingface.co/Dingning/BRIDGE/resolve/main/bridge.pth) and put it under the `checkpoints` directory.
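
If you prefer to script the download, the following is a minimal sketch using only the Python standard library. The URL is the one linked above; the helper names `checkpoint_path` and `download_checkpoint` are hypothetical and not part of the BRIDGE repository.

```python
import os
import urllib.request

# Checkpoint URL taken from the link in this README.
CHECKPOINT_URL = "https://huggingface.co/Dingning/BRIDGE/resolve/main/bridge.pth"


def checkpoint_path(dest_dir="checkpoints", url=CHECKPOINT_URL):
    """Return the local path where the checkpoint is expected to live."""
    return os.path.join(dest_dir, os.path.basename(url))


def download_checkpoint(dest_dir="checkpoints", url=CHECKPOINT_URL):
    """Download the checkpoint into dest_dir unless it is already there."""
    os.makedirs(dest_dir, exist_ok=True)
    target = checkpoint_path(dest_dir, url)
    if not os.path.exists(target):
        urllib.request.urlretrieve(url, target)
    return target


if __name__ == "__main__":
    print(download_checkpoint())
```

Run it from the repository root so the file lands in `checkpoints/bridge.pth`, the path the inference snippet loads.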
|
| 38 |
+
|
| 39 |
+
## 🏋️ Preparation
|
| 40 |
+
|
| 41 |
+
```bash
|
| 42 |
+
git clone https://github.com/lnbxldn/Bridge.git
|
| 43 |
+
cd Bridge
|
| 44 |
+
pip install -r requirements.txt
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
## 💻 Inference
|
| 48 |
+
|
| 49 |
+
```python
|
| 50 |
+
import cv2
|
| 51 |
+
import torch
|
| 52 |
+
import numpy as np
|
| 53 |
+
from bridge.dpt import Bridge
|
| 54 |
+
|
| 55 |
+
# Define DEVICE based on availability (e.g., 'cuda' if GPU is available, else 'cpu')
|
| 56 |
+
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
|
| 57 |
+
|
| 58 |
+
model = Bridge()
|
| 59 |
+
model.load_state_dict(torch.load('checkpoints/bridge.pth', map_location='cpu'))
|
| 60 |
+
model = model.to(DEVICE).eval()
|
| 61 |
+
|
| 62 |
+
raw_img = cv2.imread('your/image/path')  # BGR image as a NumPy array (None if the path cannot be read)
|
| 63 |
+
depth = model.infer_image(raw_img)
|
| 64 |
+
```
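
To inspect the result, the prediction usually needs to be rescaled into an 8-bit image first. The sketch below assumes `depth` is a 2D floating-point NumPy array, which the snippet above does not confirm; `normalize_depth` is a hypothetical helper, not part of the BRIDGE code.

```python
import numpy as np


def normalize_depth(depth):
    """Min-max scale a float depth map to uint8 values in [0, 255]."""
    depth = np.asarray(depth, dtype=np.float32)
    d_min, d_max = float(depth.min()), float(depth.max())
    # A constant map has no range to stretch; return an all-black image.
    if d_max - d_min < 1e-8:
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min)
    return (scaled * 255.0).round().astype(np.uint8)
```

The returned array can then be written to disk with `cv2.imwrite('depth.png', normalize_depth(depth))`.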
|
| 65 |
+
|
| 66 |
+
## 🔍 Citation
|
| 67 |
+
|
| 68 |
+
If you find this project useful, please cite:
|
| 69 |
+
|
| 70 |
+
```bibtex
|
| 71 |
+
@misc{Liu2025BRIDGE,
|
| 72 |
+
title={BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation},
|
| 73 |
+
author={Liu, Dingning and Guo, Haoyu and Zhou, Jingyi and He, Tong},
|
| 74 |
+
year={2025},
|
| 75 |
+
eprint={2509.25077},
|
| 76 |
+
archivePrefix={arXiv},
|
| 77 |
+
primaryClass={cs.CV},
|
| 78 |
+
url={https://arxiv.org/abs/2509.25077},
|
| 79 |
+
}
|
| 80 |
+
```
|