---
license: apache-2.0
library_name: onnx
tags:
- depth-estimation
- panoramic
- 360-degree
- webgpu
- onnx
pipeline_tag: depth-estimation
---
# DA-2: Depth Anything in Any Direction (ONNX WebGPU Version)
This repository contains the **ONNX** weights for [DA-2: Depth Anything in Any Direction](https://github.com/EnVision-Research/DA-2), optimized for **WebGPU** inference in the browser.
## Model Details
- **Original Model:** [haodongli/DA-2](https://huggingface.co/haodongli/DA-2)
- **Framework:** ONNX (Opset 17)
- **Precision:** FP32 (Full Precision)
- **Input Resolution:** 1092x546
- **Size:** ~1.4 GB
## Conversion Details
This model was converted from the original PyTorch weights to ONNX to enable client-side inference using `onnxruntime-web`.
- **Optimization:** Constant folding applied.
- **Compatibility:** Verified with WebGPU backend.
- **Modifications:**
- Replaced `clamp` operators with `Max`/`Min` combinations to ensure WebGPU kernel compatibility.
- Removed internal normalization layers to allow raw 0-1 input from the browser.
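Because the normalization layers were stripped from the graph, browser-side preprocessing reduces to scaling pixels into the 0-1 range and packing them in NCHW order. A minimal sketch (the helper name `rgbaToNCHW` is illustrative, not part of this repository):

```javascript
// Sketch: convert RGBA pixel data (e.g. from canvas getImageData) into the
// flat NCHW float32 buffer the model expects. Pixels are only scaled to the
// 0-1 range; no ImageNet mean/std normalization is applied, since those
// layers were removed from the exported graph.
function rgbaToNCHW(rgba, width, height) {
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    out[i] = rgba[i * 4] / 255;                 // R plane
    out[plane + i] = rgba[i * 4 + 1] / 255;     // G plane
    out[2 * plane + i] = rgba[i * 4 + 2] / 255; // B plane (alpha is dropped)
  }
  return out;
}
```

The resulting buffer can be wrapped as `new ort.Tensor('float32', data, [1, 3, 546, 1092])` after resizing the panorama to 1092x546.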
## Usage (Transformers.js)
You can run this model with [Transformers.js](https://huggingface.co/docs/transformers.js).
```javascript
import { pipeline } from '@xenova/transformers';
// Initialize the pipeline
const depth_estimator = await pipeline('depth-estimation', 'phiph/DA-2-WebGPU', {
  device: 'webgpu',
  dtype: 'fp32', // use FP32, matching the export
});
// Run inference
const url = 'path/to/your/panorama.jpg';
const output = await depth_estimator(url);
// output.predicted_depth is the raw depth tensor
// output.depth is a RawImage with the visualized depth map
```
## Usage (ONNX Runtime Web)
You can run this model in the browser using `onnxruntime-web`.
```javascript
import * as ort from 'onnxruntime-web/webgpu';
// 1. Initialize the session (the model file lives in the 'onnx' subdirectory)
const session = await ort.InferenceSession.create(
  'https://huggingface.co/phiph/DA-2-WebGPU/resolve/main/onnx/model.onnx',
  {
    executionProviders: ['webgpu'],
    preferredOutputLocation: { depth: 'gpu-buffer' },
  },
);
// 2. Prepare Input (Float32, 0-1 range, NCHW)
// Note: Do NOT apply ImageNet mean/std normalization. The model expects raw 0-1 floats.
const tensor = new ort.Tensor('float32', float32Data, [1, 3, 546, 1092]);
// 3. Run Inference
const results = await session.run({ images: tensor });
const depthMap = results.depth; // Access output
```
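To display the result, the raw depth values can be min/max-normalized into an 8-bit grayscale image. This is a display convention, not part of the model; the helper name `depthToRGBA` is illustrative:

```javascript
// Sketch: turn a raw depth buffer into 8-bit grayscale RGBA pixels suitable
// for a canvas. Min/max normalization rescales whatever range the model
// produces into 0-255 for visualization.
function depthToRGBA(depth) {
  let min = Infinity, max = -Infinity;
  for (const v of depth) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const range = max - min || 1; // avoid division by zero on flat maps
  const rgba = new Uint8ClampedArray(depth.length * 4);
  for (let i = 0; i < depth.length; i++) {
    const g = Math.round(((depth[i] - min) / range) * 255);
    rgba[i * 4] = g;     // R
    rgba[i * 4 + 1] = g; // G
    rgba[i * 4 + 2] = g; // B
    rgba[i * 4 + 3] = 255; // fully opaque
  }
  return rgba;
}
```

If the output tensor was kept on a GPU buffer via `preferredOutputLocation`, read it back first with `await depthMap.getData()` before passing the Float32Array to this helper, then draw it with `ctx.putImageData(new ImageData(rgba, 1092, 546), 0, 0)`.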
## License
This model is a derivative work of [DA-2](https://github.com/EnVision-Research/DA-2) and is distributed under the **Apache License 2.0**.
Please cite the original authors if you use this model:
```bibtex
@article{li2025depth,
  title={DA$^{2}$: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}
```