---
license: apache-2.0
library_name: onnx
tags:
- depth-estimation
- panoramic
- 360-degree
- webgpu
- onnx
pipeline_tag: depth-estimation
---
# DA-2: Depth Anything in Any Direction (ONNX WebGPU Version)
This repository contains the **ONNX** weights for [DA-2: Depth Anything in Any Direction](https://github.com/EnVision-Research/DA-2), optimized for **WebGPU** inference in the browser.
## Model Details
- **Original Model:** [haodongli/DA-2](https://huggingface.co/haodongli/DA-2)
- **Framework:** ONNX (Opset 17)
- **Precision:** FP32 (Full Precision)
- **Input Resolution:** 1092x546
- **Size:** ~1.4 GB
## Conversion Details
This model was converted from the original PyTorch weights to ONNX to enable client-side inference using `onnxruntime-web`.
- **Optimization:** Constant folding applied.
- **Compatibility:** Verified with WebGPU backend.
- **Modifications:**
- Replaced `clamp` operators with `Max`/`Min` combinations to ensure WebGPU kernel compatibility.
- Removed internal normalization layers to allow raw 0-1 input from the browser.
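Because the normalization layers were stripped from the graph, browser-side preprocessing reduces to scaling pixels into the 0-1 range and packing them in NCHW order. A minimal sketch (the helper name `rgbaToNCHW` is illustrative, not part of this repository):

```javascript
// Sketch: convert RGBA pixel data (e.g. from canvas getImageData) into the
// flat NCHW float32 buffer the model expects. Pixels are only scaled to the
// 0-1 range; no ImageNet mean/std normalization is applied, since those
// layers were removed from the exported graph.
function rgbaToNCHW(rgba, width, height) {
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    out[i] = rgba[i * 4] / 255;                 // R plane
    out[plane + i] = rgba[i * 4 + 1] / 255;     // G plane
    out[2 * plane + i] = rgba[i * 4 + 2] / 255; // B plane (alpha is dropped)
  }
  return out;
}
```

The resulting buffer can be wrapped as `new ort.Tensor('float32', data, [1, 3, 546, 1092])` after resizing the panorama to 1092x546.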
## Usage (Transformers.js)
You can run this model with [Transformers.js](https://huggingface.co/docs/transformers.js).
```javascript
import { pipeline } from '@xenova/transformers';
// Initialize the pipeline
const depth_estimator = await pipeline('depth-estimation', 'phiph/DA-2-WebGPU', {
  device: 'webgpu',
  dtype: 'fp32', // use FP32, matching the export
});
// Run inference
const url = 'path/to/your/panorama.jpg';
const output = await depth_estimator(url);
// output.predicted_depth is the raw depth tensor
// output.depth is a RawImage with the visualized depth map
```
## Usage (ONNX Runtime Web)
You can run this model in the browser using `onnxruntime-web`.
```javascript
import * as ort from 'onnxruntime-web/webgpu';
// 1. Initialize the session (the model file lives in the 'onnx' subdirectory)
const session = await ort.InferenceSession.create(
  'https://huggingface.co/phiph/DA-2-WebGPU/resolve/main/onnx/model.onnx',
  {
    executionProviders: ['webgpu'],
    preferredOutputLocation: { depth: 'gpu-buffer' },
  },
);
// 2. Prepare Input (Float32, 0-1 range, NCHW)
// Note: Do NOT apply ImageNet mean/std normalization. The model expects raw 0-1 floats.
const tensor = new ort.Tensor('float32', float32Data, [1, 3, 546, 1092]);
// 3. Run Inference
const results = await session.run({ images: tensor });
const depthMap = results.depth; // Access output
```
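To display the result, the raw depth values can be min/max-normalized into an 8-bit grayscale image. This is a display convention, not part of the model; the helper name `depthToRGBA` is illustrative:

```javascript
// Sketch: turn a raw depth buffer into 8-bit grayscale RGBA pixels suitable
// for a canvas. Min/max normalization rescales whatever range the model
// produces into 0-255 for visualization.
function depthToRGBA(depth) {
  let min = Infinity, max = -Infinity;
  for (const v of depth) {
    if (v < min) min = v;
    if (v > max) max = v;
  }
  const range = max - min || 1; // avoid division by zero on flat maps
  const rgba = new Uint8ClampedArray(depth.length * 4);
  for (let i = 0; i < depth.length; i++) {
    const g = Math.round(((depth[i] - min) / range) * 255);
    rgba[i * 4] = g;     // R
    rgba[i * 4 + 1] = g; // G
    rgba[i * 4 + 2] = g; // B
    rgba[i * 4 + 3] = 255; // fully opaque
  }
  return rgba;
}
```

If the output tensor was kept on a GPU buffer via `preferredOutputLocation`, read it back first with `await depthMap.getData()` before passing the Float32Array to this helper, then draw it with `ctx.putImageData(new ImageData(rgba, 1092, 546), 0, 0)`.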
## License
This model is a derivative work of [DA-2](https://github.com/EnVision-Research/DA-2) and is distributed under the **Apache License 2.0**.
Please cite the original authors if you use this model:
```bibtex
@article{li2025depth,
  title={DA$^{2}$: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}
```