---
license: apache-2.0
library_name: onnx
tags:
- depth-estimation
- panoramic
- 360-degree
- webgpu
- onnx
pipeline_tag: depth-estimation
---

# DA-2: Depth Anything in Any Direction (ONNX WebGPU Version)

This repository contains the **ONNX** weights for [DA-2: Depth Anything in Any Direction](https://github.com/EnVision-Research/DA-2), optimized for **WebGPU** inference in the browser.

## Model Details

- **Original Model:** [haodongli/DA-2](https://huggingface.co/haodongli/DA-2)
- **Framework:** ONNX (opset 17)
- **Precision:** FP32 (full precision)
- **Input Resolution:** 1092×546 (width × height)
- **Size:** ~1.4 GB

## Conversion Details

This model was converted from the original PyTorch weights to ONNX to enable client-side inference with `onnxruntime-web`.

- **Optimization:** Constant folding applied during export.
- **Compatibility:** Verified against the WebGPU execution provider.
- **Modifications:**
  - Replaced `Clip` (clamp) operators with `Max`/`Min` combinations to ensure WebGPU kernel compatibility.
  - Removed the internal normalization layers, so the model accepts raw 0–1 input from the browser.

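The clamp rewrite is numerically equivalent. As a minimal sketch in plain JavaScript (illustrative only, not the actual ONNX graph transform; the function name is made up for this example):

```javascript
// Clip(x, lo, hi) can be expressed as the Max/Min pair
// used in the converted graph: Min(Max(x, lo), hi)
function clampViaMaxMin(x, lo, hi) {
  return Math.min(Math.max(x, lo), hi);
}

console.log(clampViaMaxMin(1.5, 0, 1));  // 1    (clamped to upper bound)
console.log(clampViaMaxMin(-0.2, 0, 1)); // 0    (clamped to lower bound)
console.log(clampViaMaxMin(0.42, 0, 1)); // 0.42 (passed through unchanged)
```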
## Usage (Transformers.js)

You can also run this model with [Transformers.js](https://huggingface.co/docs/transformers.js):

```javascript
import { pipeline } from '@xenova/transformers';

// Initialize the depth-estimation pipeline
const depth_estimator = await pipeline('depth-estimation', 'phiph/DA-2-WebGPU', {
  device: 'webgpu',
  dtype: 'fp32', // use FP32, matching the exported weights
});

// Run inference
const url = 'path/to/your/panorama.jpg';
const output = await depth_estimator(url);
// output.predicted_depth is the raw depth tensor
// output.depth is the visualized depth map
```

## Usage (ONNX Runtime Web)

You can run this model in the browser directly with `onnxruntime-web`:

```javascript
import * as ort from 'onnxruntime-web/webgpu';

// 1. Initialize the session (the model file lives in the 'onnx' subdirectory)
const session = await ort.InferenceSession.create(
  'https://huggingface.co/phiph/DA-2-WebGPU/resolve/main/onnx/model.onnx',
  { executionProviders: ['webgpu'] }
);

// 2. Prepare the input (float32, 0-1 range, NCHW)
// Note: do NOT apply ImageNet mean/std normalization.
// The model expects raw 0-1 floats.
const tensor = new ort.Tensor('float32', float32Data, [1, 3, 546, 1092]);

// 3. Run inference
const results = await session.run({ images: tensor });
const depthMap = results.depth; // the predicted depth map
```

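The `float32Data` buffer above must be laid out in planar NCHW order with values scaled to 0–1. A minimal sketch of that conversion, assuming interleaved RGBA input such as `CanvasRenderingContext2D.getImageData` returns (the function name is illustrative):

```javascript
// Convert interleaved RGBA uint8 pixels into a planar NCHW
// Float32Array, scaling each channel from 0-255 to 0-1.
function rgbaToNchwFloat32(rgba, width, height) {
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    out[i] = rgba[i * 4] / 255;                 // R plane
    out[plane + i] = rgba[i * 4 + 1] / 255;     // G plane
    out[2 * plane + i] = rgba[i * 4 + 2] / 255; // B plane
  }
  return out;
}

// Tiny 2x1 example: one red pixel, one white pixel
const data = rgbaToNchwFloat32(
  new Uint8Array([255, 0, 0, 255, 255, 255, 255, 255]), 2, 1
);
console.log(Array.from(data)); // [1, 1, 0, 1, 0, 1]
```

For this model, `width = 1092` and `height = 546`, so the resulting buffer holds `3 * 546 * 1092` floats, matching the `[1, 3, 546, 1092]` tensor shape above.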
## License

This model is a derivative work of [DA-2](https://github.com/EnVision-Research/DA-2) and is distributed under the **Apache License 2.0**.

Please cite the original authors if you use this model:

```bibtex
@article{li2025depth,
  title={DA$^{2}$: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}
```