Improve model card: Add pipeline tag, paper, links, abstract, and sample usage

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +77 -1
README.md CHANGED
@@ -1,4 +1,80 @@
1
-
2
  ---
3
  license: apache-2.0
 
4
  ---
1
  ---
2
  license: apache-2.0
3
+ pipeline_tag: depth-estimation
4
  ---
5
+
6
+ # BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation
7
+
8
+ This model was presented in the paper [BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation](https://huggingface.co/papers/2509.25077).
9
+
10
+ - 📄 [Paper](https://huggingface.co/papers/2509.25077)
11
+ - 💻 [Code](https://github.com/lnbxldn/BRIDGE)
12
+ - 🌐 [Project Page](https://dingning-liu.github.io/bridge.github.io/)
13
+ - 🚀 [Demo](https://huggingface.co/spaces/Dingning/Bridge)
14
+
15
+ ![teaser](https://github.com/lnbxldn/BRIDGE/raw/main/assets/teaser.png)
16
+
17
+ ## 📰 News
18
+
19
+ - **[2025-09-30] 🚀🚀🚀 We published BRIDGE on [arXiv](https://arxiv.org/abs/2509.25077) and released a demo on Hugging Face! Try our [demo](https://huggingface.co/spaces/Dingning/Bridge).**
20
+ - [2025-09-30] **🎉🎉🎉** We released the model [checkpoint](https://huggingface.co/Dingning/BRIDGE) on Hugging Face.
21
+
22
+ ## Abstract
23
+
24
+ Monocular Depth Estimation (MDE) is a foundational task for computer vision. Traditional methods are limited by data scarcity and quality, hindering their robustness. To overcome this, we propose BRIDGE, an RL-optimized depth-to-image (D2I) generation framework that synthesizes over 20M realistic and geometrically accurate RGB images, each intrinsically paired with its ground truth depth, from diverse source depth maps. Then we train our depth estimation model on this dataset, employing a hybrid supervision strategy that integrates teacher pseudo-labels with ground truth depth for comprehensive and robust training. This innovative data generation and training paradigm enables BRIDGE to achieve breakthroughs in scale and domain diversity, consistently outperforming existing state-of-the-art approaches quantitatively and in complex scene detail capture, thereby fostering general and robust depth features. Code and models are available at https://github.com/lnbxldn/BRIDGE.
25
+
26
+ ## 🛫Overview
27
+
28
+ We present BRIDGE, an RL-optimized, large-scale Depth-to-Image (D2I) data engine. It generates massive, high-quality RGB-D data to address critical Monocular Depth Estimation (MDE) training challenges and foster robust real-world performance.
29
+
30
+ Our main contributions are summarized as follows:
31
+
32
+ - **An efficient RL-driven D2I data engine:** BRIDGE efficiently generates over 20 million diverse, high-quality RGB-D samples from synthetic depth, alleviating data scarcity and quality issues.
33
+ - **A novel hybrid depth supervision strategy:** We introduce a hybrid training strategy combining generated RGB with high-precision ground truth and teacher pseudo-labels, enhancing geometric knowledge learning.
34
+ - **Superior performance and high training efficiency:** BRIDGE achieves SOTA MDE performance across benchmarks with significantly less data (20M vs. 62M), demonstrating excellent detail capture and robustness.
35
+
36
+ ## 📀Pre-trained Models
37
+ Download the checkpoint from [Hugging Face](https://huggingface.co/Dingning/BRIDGE/resolve/main/bridge.pth) and put it under the `checkpoints` directory.
38
+
39
+ ## 🏋️Preparation
40
+
41
+ ```bash
42
+ git clone https://github.com/lnbxldn/Bridge.git
43
+ cd Bridge
44
+ pip install -r requirements.txt
45
+ ```
46
+
47
+ ## 💻Inference
48
+
49
+ ```python
50
+ import cv2
51
+ import torch
52
+ import numpy as np
53
+ from bridge.dpt import Bridge
54
+
55
+ # Define DEVICE based on availability (e.g., 'cuda' if GPU is available, else 'cpu')
56
+ DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
57
+
58
+ model = Bridge()
59
+ model.load_state_dict(torch.load('checkpoints/bridge.pth', map_location='cpu'))
60
+ model = model.to(DEVICE).eval()
61
+
62
+ raw_img = cv2.imread('your/image/path')
63
+ depth = model.infer_image(raw_img)
64
+ ```
65
+
66
+ ## 🔍Citation
67
+
68
+ If you find this project useful, please cite it:
69
+
70
+ ```bibtex
71
+ @misc{Liu2025BRIDGE,
72
+ title={BRIDGE - Building Reinforcement-Learning Depth-to-Image Data Generation Engine for Monocular Depth Estimation},
73
+ author={Liu, Dingning and Guo, Haoyu and Zhou, Jingyi and He, Tong},
74
+ year={2025},
75
+ eprint={2509.25077},
76
+ archivePrefix={arXiv},
77
+ primaryClass={cs.CV},
78
+ url={https://arxiv.org/abs/2509.25077},
79
+ }
80
+ ```
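
The Inference snippet in the README stops once `depth` is returned. As a minimal sketch of a possible next step, the helper below rescales a depth map to 8-bit so it can be viewed or saved. It is not part of the pull request or the BRIDGE code: `depth_to_uint8` is a hypothetical helper, and it assumes `model.infer_image` returns a 2D NumPy float array, which the README does not state. A small hand-made array stands in for the model output.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a 2D depth map to the 0..255 range for viewing.

    Hypothetical helper; assumes `depth` is a 2D float array.
    """
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-8:
        # Constant map: avoid dividing by zero and return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)
    return (scaled * 255.0).round().astype(np.uint8)


# Stand-in for the array assumed to come from model.infer_image(raw_img).
fake_depth = np.array([[0.5, 1.0], [1.5, 2.5]], dtype=np.float32)
vis = depth_to_uint8(fake_depth)
```

With OpenCV installed, `cv2.imwrite('depth.png', vis)` would then write the scaled map to disk; the file name is only an example.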