# DA-2 WebGPU Port
This repository contains a port of the **DA-2 (Depth Anything in Any Direction)** model to run entirely in the browser using **WebGPU** and **ONNX Runtime**.
The original work was developed by EnVision-Research. This port enables real-time, client-side depth estimation from panoramic images without requiring a backend server for inference.
## Original Work
**DA<sup>2</sup>: Depth Anything in Any Direction**
* **Repository:** [EnVision-Research/DA-2](https://github.com/EnVision-Research/DA-2)
* **Paper:** [arXiv:2509.26618](http://arxiv.org/abs/2509.26618)
* **Project Page:** [depth-any-in-any-dir.github.io](https://depth-any-in-any-dir.github.io/)
Please cite the original paper if you use this work:
```bibtex
@article{li2025da2,
  title={DA2: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}
```
## WebGPU Demo
This project includes a web-based demo that runs the model directly in your browser.
### Prerequisites
* **Python 3.10+** (for model export)
* **Web Browser** with WebGPU support (Chrome 113+, Edge 113+, or Firefox Nightly).
### Installation
1. **Clone the repository:**
```bash
git clone <your-repo-url>
cd DA-2-Web
```
2. **Set up Python environment:**
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
### Model Preparation
To run the demo, you first need to convert the PyTorch model to ONNX format.
1. **Download the model weights:**
Download `model.safetensors` from the [HuggingFace repository](https://huggingface.co/haodongli/DA-2) and place it in the root directory of this project.
2. **Export to ONNX:**
Run the export script. This script handles the conversion to FP16 and applies necessary fixes for WebGPU compatibility (e.g., replacing `clamp` with `max`/`min`).
```bash
python export_onnx.py
```
This will generate `da2_model.onnx`.
3. **Merge ONNX files:**
The export process might generate external data files. Use the merge script to create a single `.onnx` file for easier web loading.
```bash
python merge_onnx.py
```
This will generate `da2_model_single.onnx`.
### Running the Demo
1. **Start a local web server:**
The files must be served over HTTP(S): browsers will not fetch the model from `file://` URLs, and WebGPU requires a secure context (localhost qualifies).
```bash
python3 -m http.server 8000
```
2. **Open in Browser:**
Navigate to `http://localhost:8000/web/` in your WebGPU-compatible browser.
3. **Usage:**
* Click "Choose File" to upload a panoramic image.
* Click "Run Inference" to generate the depth map.
* The process runs entirely locally on your GPU.
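Any static file server works for step 1. The sketch below (an assumption, not part of this repo) extends Python's built-in server with cross-origin isolation headers. These are only needed if `onnxruntime-web` falls back to its multithreaded WASM backend, which uses `SharedArrayBuffer`; plain WebGPU execution is fine with the stock `http.server` command above.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

class IsolatedHandler(SimpleHTTPRequestHandler):
    """Static file handler that adds cross-origin isolation headers.

    COOP/COEP enable SharedArrayBuffer, which onnxruntime-web's
    multithreaded WASM fallback requires; WebGPU alone does not.
    """

    def end_headers(self):
        self.send_header("Cross-Origin-Opener-Policy", "same-origin")
        self.send_header("Cross-Origin-Embedder-Policy", "require-corp")
        super().end_headers()

# Serve the repo root on port 8000 (blocking), mirroring `http.server 8000`:
# ThreadingHTTPServer(("", 8000), IsolatedHandler).serve_forever()
```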
## Technical Details of the Port
* **Precision:** The model was converted to **FP16 (half precision)** to cut the file size roughly in half (~1.4 GB → ~700 MB) and improve performance on consumer GPUs.
* **Opset:** Exported using **ONNX Opset 17**.
* **Modifications:**
* The `SphereViT` and `ViT_w_Esphere` modules were modified to ensure strict FP16 compatibility.
* `torch.clamp` operations were replaced with `torch.max` and `torch.min` combinations to avoid `Clip` operator issues in `onnxruntime-web` when handling mixed scalar/tensor inputs.
* Sphere embeddings are pre-calculated and cast to FP16 within the model graph.
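The clamp rewrite can be sketched numerically. NumPy is used here purely for illustration; in the actual export the same two-op pattern is expressed with `torch.max`/`torch.min` so the traced graph emits ONNX `Max` and `Min` nodes instead of `Clip`:

```python
import numpy as np

def clamp_via_max_min(x: np.ndarray, lo: float, hi: float) -> np.ndarray:
    # Identical result to np.clip(x, lo, hi) / torch.clamp, but built from
    # a Max followed by a Min, so an exported graph avoids the Clip
    # operator that onnxruntime-web mishandles for mixed scalar/tensor inputs.
    return np.minimum(np.maximum(x, lo), hi)
```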
## License
This project follows the license of the original [DA-2 repository](https://github.com/EnVision-Research/DA-2). Please refer to the original repository for license details.