---
license: apache-2.0
library_name: onnx
tags:
- depth-estimation
- panoramic
- 360-degree
- webgpu
- onnx
pipeline_tag: depth-estimation
---

# DA-2: Depth Anything in Any Direction (ONNX WebGPU Version)

This repository contains the **ONNX** weights for [DA-2: Depth Anything in Any Direction](https://github.com/EnVision-Research/DA-2), optimized for **WebGPU** inference in the browser.

## Model Details

- **Original Model:** [haodongli/DA-2](https://huggingface.co/haodongli/DA-2)
- **Framework:** ONNX (Opset 17)
- **Precision:** FP32 (Full Precision)
- **Input Resolution:** 1092x546 (width x height)
- **Size:** ~1.4 GB

## Conversion Details

This model was converted from the original PyTorch weights to ONNX to enable client-side inference using `onnxruntime-web`.

- **Optimization:** Constant folding applied.
- **Compatibility:** Verified with WebGPU backend.
- **Modifications:** 
  - Replaced `clamp` operators with `Max`/`Min` combinations to ensure WebGPU kernel compatibility (a scalar sketch of the equivalence follows this list).
  - Removed internal normalization layers to allow raw 0-1 input from the browser.
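
The rewrite preserves behavior: clamping a value to [lo, hi] is just a lower-bound `Max` followed by an upper-bound `Min`, as this scalar JavaScript sketch illustrates:

```javascript
// clamp(x, lo, hi) rewritten as Max followed by Min,
// mirroring the operator substitution in the exported graph.
const clamp = (x, lo, hi) => Math.min(Math.max(x, lo), hi);

console.log(clamp(1.7, 0, 1));  // 1   (upper bound applies)
console.log(clamp(-0.3, 0, 1)); // 0   (lower bound applies)
console.log(clamp(0.5, 0, 1));  // 0.5 (within bounds, unchanged)
```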

## Usage (Transformers.js)

You can run this model with [Transformers.js](https://huggingface.co/docs/transformers.js) v3 or later, which adds WebGPU support.

```javascript
import { pipeline } from '@huggingface/transformers';

// Initialize the pipeline
const depth_estimator = await pipeline('depth-estimation', 'phiph/DA-2-WebGPU', {
    device: 'webgpu',
    dtype: 'fp32', // Use FP32 as exported
});

// Run inference
const url = 'path/to/your/panorama.jpg';
const output = await depth_estimator(url);
// output.predicted_depth is the raw depth tensor
// output.depth is the visualized depth map (a RawImage)
```
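
A minimal follow-up sketch for displaying the result in the page, assuming `output.depth` is a single-channel `RawImage` (check `output.depth.channels` if unsure; the `data`, `width`, and `height` fields below are `RawImage` properties):

```javascript
// Paint the depth RawImage onto a canvas (browser only).
// The depth image is assumed to be single-channel (grayscale),
// so expand it to RGBA before calling putImageData.
const { data, width, height } = output.depth;
const rgba = new Uint8ClampedArray(width * height * 4);
for (let i = 0; i < width * height; i++) {
    rgba[i * 4] = rgba[i * 4 + 1] = rgba[i * 4 + 2] = data[i]; // grayscale
    rgba[i * 4 + 3] = 255; // fully opaque
}
const canvas = document.createElement('canvas');
canvas.width = width;
canvas.height = height;
canvas.getContext('2d').putImageData(new ImageData(rgba, width, height), 0, 0);
document.body.appendChild(canvas);
```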

## Usage (ONNX Runtime Web)

You can run this model in the browser using `onnxruntime-web`.

```javascript
import * as ort from 'onnxruntime-web/webgpu';

// 1. Initialize the session
// Note: the model file lives in the 'onnx' subdirectory of the repo
const session = await ort.InferenceSession.create('https://huggingface.co/phiph/DA-2-WebGPU/resolve/main/onnx/model.onnx', {
    executionProviders: ['webgpu'],
    // Keep the output on the GPU; the key must match the model's output name ('depth')
    preferredOutputLocation: { depth: 'gpu-buffer' }
});

// 2. Prepare input (Float32, 0-1 range, NCHW); see the helper sketched below
// Note: Do NOT apply ImageNet mean/std normalization. The model expects raw 0-1 floats.
const tensor = new ort.Tensor('float32', float32Data, [1, 3, 546, 1092]);

// 3. Run Inference
const results = await session.run({ images: tensor });
const depthMap = results.depth; // ort.Tensor whose data lives in a GPU buffer
const depthData = await depthMap.getData(); // download to a Float32Array when needed
```
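
The example above assumes a ready `float32Data` buffer. One way to build it, sketched below using only standard browser APIs (the helper name `imageToNCHW` is illustrative, not part of onnxruntime-web): draw the panorama onto an offscreen canvas at the model's 1092x546 input resolution, then repack the interleaved RGBA pixels into planar NCHW floats in the 0-1 range, with no mean/std normalization, per the conversion notes above.

```javascript
// Illustrative helper: load an image and repack it as
// [1, 3, 546, 1092] float32 data in the 0-1 range.
async function imageToNCHW(url, width = 1092, height = 546) {
    const blob = await (await fetch(url)).blob();
    const bitmap = await createImageBitmap(blob);
    const canvas = new OffscreenCanvas(width, height);
    const ctx = canvas.getContext('2d');
    ctx.drawImage(bitmap, 0, 0, width, height); // resize to the input resolution
    const { data } = ctx.getImageData(0, 0, width, height); // interleaved RGBA, 0-255

    const plane = width * height;
    const out = new Float32Array(3 * plane);
    for (let i = 0; i < plane; i++) {
        out[i]             = data[i * 4]     / 255; // R plane
        out[i + plane]     = data[i * 4 + 1] / 255; // G plane
        out[i + 2 * plane] = data[i * 4 + 2] / 255; // B plane
    }
    return out; // raw 0-1 floats; no ImageNet mean/std, per the conversion notes
}

const float32Data = await imageToNCHW('path/to/your/panorama.jpg');
```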

## License

This model is a derivative work of [DA-2](https://github.com/EnVision-Research/DA-2) and is distributed under the **Apache License 2.0**.

Please cite the original authors if you use this model:

```bibtex
@article{li2025depth,
  title={DA$^{2}$: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}
```