phiph committed (verified) · commit f978cec · parent: 10d3fd4

Upload README.md with huggingface_hub

Files changed (1): README.md (+78 -85)
---
license: apache-2.0
library_name: onnx
tags:
- depth-estimation
- panoramic
- 360-degree
- webgpu
- onnx
pipeline_tag: depth-estimation
---

# DA-2: Depth Anything in Any Direction (ONNX WebGPU Version)

This repository contains the **ONNX** weights for [DA-2: Depth Anything in Any Direction](https://github.com/EnVision-Research/DA-2), optimized for **WebGPU** inference in the browser.

## Model Details

- **Original Model:** [haodongli/DA-2](https://huggingface.co/haodongli/DA-2)
- **Framework:** ONNX (Opset 17)
- **Precision:** FP32 (Full Precision)
- **Input Resolution:** 1092x546
- **Size:** ~1.4 GB
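As a concrete example of preparing that input, here is a minimal sketch (the helper is mine, not part of this repository) that converts interleaved RGBA pixels, e.g. from a canvas `getImageData` call on an image already resized to 1092x546, into the planar NCHW `Float32Array` layout the model expects:

```javascript
// Convert interleaved RGBA bytes into a planar NCHW Float32Array in [0, 1].
// Assumption: the image has already been resized to the model's input resolution.
function rgbaToNCHW(rgba, width, height) {
  const n = width * height;
  const out = new Float32Array(3 * n);
  for (let i = 0; i < n; i++) {
    out[i]         = rgba[4 * i]     / 255; // R plane
    out[n + i]     = rgba[4 * i + 1] / 255; // G plane
    out[2 * n + i] = rgba[4 * i + 2] / 255; // B plane (alpha is dropped)
  }
  return out;
}
```

The resulting array can be passed directly to `new ort.Tensor('float32', data, [1, 3, 546, 1092])`.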
## Conversion Details

This model was converted from the original PyTorch weights to ONNX to enable client-side inference using `onnxruntime-web`.

- **Optimization:** Constant folding applied.
- **Compatibility:** Verified with the WebGPU backend.
- **Modifications:**
  - Replaced `clamp` operators with `Max`/`Min` combinations to ensure WebGPU kernel compatibility.
  - Removed internal normalization layers to allow raw 0-1 input from the browser.
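The `Max`/`Min` rewrite works because of a simple identity: `clamp(x, lo, hi)` equals `min(max(x, lo), hi)`, which the exported graph expresses with the ONNX `Max` and `Min` operators. A JavaScript sketch of the equivalence (the helper name is mine, for illustration only):

```javascript
// Identity behind the export fix: Clip(x, lo, hi) == Min(Max(x, lo), hi).
// The exported graph uses Max followed by Min instead of a single Clip node.
const clampViaMaxMin = (x, lo, hi) => Math.min(Math.max(x, lo), hi);
```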
## Usage (Transformers.js)

You can run this model with [Transformers.js](https://huggingface.co/docs/transformers.js):

```javascript
import { pipeline } from '@xenova/transformers';

// Initialize the pipeline
const depth_estimator = await pipeline('depth-estimation', 'phiph/DA-2-WebGPU', {
  device: 'webgpu',
  dtype: 'fp32', // use FP32, matching the export
});

// Run inference
const url = 'path/to/your/panorama.jpg';
const output = await depth_estimator(url);
// output.predicted_depth is the raw depth tensor
// output.depth is the visualized depth map (a RawImage)
```
## Usage (ONNX Runtime Web)

You can also run this model in the browser using `onnxruntime-web` directly:

```javascript
import * as ort from 'onnxruntime-web/webgpu';

// 1. Initialize the session
// Note: the model file lives in the 'onnx' subdirectory of this repository
const session = await ort.InferenceSession.create(
  'https://huggingface.co/phiph/DA-2-WebGPU/resolve/main/onnx/model.onnx',
  {
    executionProviders: ['webgpu'],
    preferredOutputLocation: { depth: 'gpu-buffer' }, // keep the output on the GPU
  }
);

// 2. Prepare the input (Float32, 0-1 range, NCHW)
// Note: do NOT apply ImageNet mean/std normalization. The model expects raw 0-1 floats.
const tensor = new ort.Tensor('float32', float32Data, [1, 3, 546, 1092]);

// 3. Run inference
const results = await session.run({ images: tensor });
const depthMap = results.depth; // GPU-resident tensor; use `await depthMap.getData()` to read it back
```
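To display the result, the raw depth values need to be mapped to pixel intensities. A minimal sketch, assuming `depth` is the output's `Float32Array` (e.g. obtained via `await depthMap.getData()`); the helper name is mine:

```javascript
// Min-max normalize raw depth values to [0, 255] grayscale for display.
function depthToGray(depth) {
  let min = Infinity, max = -Infinity;
  for (const v of depth) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const range = max - min || 1; // avoid division by zero on a flat depth map
  const gray = new Uint8ClampedArray(depth.length);
  for (let i = 0; i < depth.length; i++) {
    gray[i] = Math.round(((depth[i] - min) / range) * 255);
  }
  return gray;
}
```

The grayscale values can then be written into an `ImageData` buffer and drawn to a canvas.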
 
## License

This model is a derivative work of [DA-2](https://github.com/EnVision-Research/DA-2) and is distributed under the **Apache License 2.0**.

Please cite the original authors if you use this model:

```bibtex
@article{li2025depth,
  title={DA$^{2}$: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}
```