phiph committed (verified) · commit f978cec · parent: 10d3fd4

Upload README.md with huggingface_hub

Files changed (1): README.md (+78 -85)
---
license: apache-2.0
library_name: onnx
tags:
- depth-estimation
- panoramic
- 360-degree
- webgpu
- onnx
pipeline_tag: depth-estimation
---

# DA-2: Depth Anything in Any Direction (ONNX WebGPU Version)

This repository contains the **ONNX** weights for [DA-2: Depth Anything in Any Direction](https://github.com/EnVision-Research/DA-2), optimized for **WebGPU** inference in the browser.

## Model Details

- **Original Model:** [haodongli/DA-2](https://huggingface.co/haodongli/DA-2)
- **Framework:** ONNX (Opset 17)
- **Precision:** FP32 (Full Precision)
- **Input Resolution:** 1092x546
- **Size:** ~1.4 GB
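As a concrete example of preparing that input, here is a minimal sketch (the helper is mine, not part of this repository) that converts interleaved RGBA pixels, e.g. from a canvas `getImageData` call on an image already resized to 1092x546, into the planar NCHW `Float32Array` layout the model expects:

```javascript
// Convert interleaved RGBA bytes into a planar NCHW Float32Array in [0, 1].
// Assumption: the image has already been resized to the model's input resolution.
function rgbaToNCHW(rgba, width, height) {
  const n = width * height;
  const out = new Float32Array(3 * n);
  for (let i = 0; i < n; i++) {
    out[i]         = rgba[4 * i]     / 255; // R plane
    out[n + i]     = rgba[4 * i + 1] / 255; // G plane
    out[2 * n + i] = rgba[4 * i + 2] / 255; // B plane (alpha is dropped)
  }
  return out;
}
```

The resulting array can be passed directly to `new ort.Tensor('float32', data, [1, 3, 546, 1092])`.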
## Conversion Details

This model was converted from the original PyTorch weights to ONNX to enable client-side inference using `onnxruntime-web`.

- **Optimization:** Constant folding applied.
- **Compatibility:** Verified with the WebGPU backend.
- **Modifications:**
  - Replaced `clamp` operators with `Max`/`Min` combinations to ensure WebGPU kernel compatibility.
  - Removed internal normalization layers to allow raw 0-1 input from the browser.
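The `Max`/`Min` rewrite works because of a simple identity: `clamp(x, lo, hi)` equals `min(max(x, lo), hi)`, which the exported graph expresses with the ONNX `Max` and `Min` operators. A JavaScript sketch of the equivalence (the helper name is mine, for illustration only):

```javascript
// Identity behind the export fix: Clip(x, lo, hi) == Min(Max(x, lo), hi).
// The exported graph uses Max followed by Min instead of a single Clip node.
const clampViaMaxMin = (x, lo, hi) => Math.min(Math.max(x, lo), hi);
```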
## Usage (Transformers.js)

You can run this model with [Transformers.js](https://huggingface.co/docs/transformers.js):

```javascript
import { pipeline } from '@xenova/transformers';

// Initialize the pipeline
const depth_estimator = await pipeline('depth-estimation', 'phiph/DA-2-WebGPU', {
  device: 'webgpu',
  dtype: 'fp32', // use FP32, matching the export
});

// Run inference
const url = 'path/to/your/panorama.jpg';
const output = await depth_estimator(url);
// output.predicted_depth is the raw depth tensor
// output.depth is the visualized depth map (a RawImage)
```
## Usage (ONNX Runtime Web)

You can also run this model in the browser using `onnxruntime-web` directly:

```javascript
import * as ort from 'onnxruntime-web/webgpu';

// 1. Initialize the session
// Note: the model file lives in the 'onnx' subdirectory of this repository
const session = await ort.InferenceSession.create(
  'https://huggingface.co/phiph/DA-2-WebGPU/resolve/main/onnx/model.onnx',
  {
    executionProviders: ['webgpu'],
    preferredOutputLocation: { depth: 'gpu-buffer' }, // keep the output on the GPU
  }
);

// 2. Prepare the input (Float32, 0-1 range, NCHW)
// Note: do NOT apply ImageNet mean/std normalization. The model expects raw 0-1 floats.
const tensor = new ort.Tensor('float32', float32Data, [1, 3, 546, 1092]);

// 3. Run inference
const results = await session.run({ images: tensor });
const depthMap = results.depth; // GPU-resident tensor; use `await depthMap.getData()` to read it back
```
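To display the result, the raw depth values need to be mapped to pixel intensities. A minimal sketch, assuming `depth` is the output's `Float32Array` (e.g. obtained via `await depthMap.getData()`); the helper name is mine:

```javascript
// Min-max normalize raw depth values to [0, 255] grayscale for display.
function depthToGray(depth) {
  let min = Infinity, max = -Infinity;
  for (const v of depth) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const range = max - min || 1; // avoid division by zero on a flat depth map
  const gray = new Uint8ClampedArray(depth.length);
  for (let i = 0; i < depth.length; i++) {
    gray[i] = Math.round(((depth[i] - min) / range) * 255);
  }
  return gray;
}
```

The grayscale values can then be written into an `ImageData` buffer and drawn to a canvas.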
 
## License

This model is a derivative work of [DA-2](https://github.com/EnVision-Research/DA-2) and is distributed under the **Apache License 2.0**.

Please cite the original authors if you use this model:

```bibtex
@article{li2025depth,
  title={DA$^{2}$: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}
```