# DA-2 WebGPU Port

This repository contains a port of the **DA-2 (Depth Anything in Any Direction)** model to run entirely in the browser using **WebGPU** and **ONNX Runtime Web**.

The original work was developed by EnVision-Research. This port enables real-time, client-side depth estimation from panoramic images without requiring a backend server for inference.

## 🔗 Original Work

**DA<sup>2</sup>: Depth Anything in Any Direction**

*   **Repository:** [EnVision-Research/DA-2](https://github.com/EnVision-Research/DA-2)
*   **Paper:** [arXiv:2509.26618](http://arxiv.org/abs/2509.26618)
*   **Project Page:** [depth-any-in-any-dir.github.io](https://depth-any-in-any-dir.github.io/)

Please cite the original paper if you use this work:

```bibtex
@article{li2025da2,
  title={DA2: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}
```

## 🚀 WebGPU Demo

This project includes a web-based demo that runs the model directly in your browser.

### Prerequisites

*   **Python 3.10+** (for model export)
*   **Web Browser** with WebGPU support (Chrome 113+, Edge 113+, or Firefox Nightly).

### Installation

1.  **Clone the repository:**
    ```bash
    git clone <your-repo-url>
    cd DA-2-Web
    ```

2.  **Set up Python environment:**
    ```bash
    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
    ```

### Model Preparation

To run the demo, you first need to convert the PyTorch model to ONNX format.

1.  **Download the model weights:**
    Download `model.safetensors` from the [Hugging Face repository](https://huggingface.co/haodongli/DA-2) and place it in the root directory of this project.

2.  **Export to ONNX:**
    Run the export script. This script handles the conversion to FP16 and applies necessary fixes for WebGPU compatibility (e.g., replacing `clamp` with `max`/`min`).
    ```bash
    python export_onnx.py
    ```
    This will generate `da2_model.onnx`.

3.  **Merge ONNX files:**
    The export may write weights into external data files alongside the `.onnx` graph, since ONNX's protobuf format caps a single file at 2 GB. Use the merge script to fold everything back into a single `.onnx` file for easier web loading (a sketch of the full export-and-merge pipeline follows these steps).
    ```bash
    python merge_onnx.py
    ```
    This will generate `da2_model_single.onnx`.
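
For orientation, the two scripts roughly amount to the pipeline below. This is a minimal sketch, not the actual script contents: the DA-2 import path, constructor, and input resolution are assumptions, and the real `export_onnx.py` also applies the `clamp` rewrite described under Technical Details.

```python
# Sketch of the export + merge pipeline (assumed names marked below).
import onnx
import torch
from safetensors.torch import load_file

from da2 import DA2Model  # hypothetical import path; use the real model class

# Load the FP32 weights and switch the model to FP16 inference mode.
model = DA2Model()  # hypothetical constructor
model.load_state_dict(load_file("model.safetensors"))
model.half().eval()

# Export with Opset 17; large graphs may spill weights into external
# data files written next to the .onnx file.
dummy = torch.randn(1, 3, 1024, 2048, dtype=torch.float16)  # assumed input size
torch.onnx.export(
    model,
    dummy,
    "da2_model.onnx",
    opset_version=17,
    input_names=["image"],
    output_names=["depth"],
)

# Merge: onnx.load pulls any external data back into the graph, and a
# plain save writes one self-contained file (the FP16 model fits well
# under protobuf's 2 GB single-file limit).
merged = onnx.load("da2_model.onnx")
onnx.save(merged, "da2_model_single.onnx")
```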

### Running the Demo

1.  **Start a local web server:**
    The files must be served over HTTP(S): WebGPU requires a secure context (`localhost` qualifies), and browsers will not fetch the model from `file://` URLs.
    ```bash
    python3 -m http.server 8000
    ```

2.  **Open in Browser:**
    Navigate to `http://localhost:8000/web/` in your WebGPU-compatible browser.

3.  **Usage:**
    *   Click "Choose File" to upload a panoramic image.
    *   Click "Run Inference" to generate the depth map.
    *   The process runs entirely locally on your GPU.

## 🛠️ Technical Details of the Port

*   **Precision:** The model was converted to **FP16 (half precision)** to reduce file size (from ~1.4 GB to ~700 MB) and improve performance on consumer GPUs.
*   **Opset:** Exported using **ONNX Opset 17**.
*   **Modifications:**
    *   The `SphereViT` and `ViT_w_Esphere` modules were modified to ensure strict FP16 compatibility.
    *   `torch.clamp` operations were replaced with `torch.max`/`torch.min` combinations to avoid `Clip` operator issues in `onnxruntime-web` when handling mixed scalar/tensor inputs (see the sketch after this list).
    *   Sphere embeddings are pre-calculated and cast to FP16 within the model graph.
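
As a concrete illustration of the `clamp` rewrite, the pattern looks roughly like this. The helper name and the usage example are made up for illustration; the actual modules apply the same idea inline.

```python
import torch

def clamp_via_max_min(x: torch.Tensor, lo: float, hi: float) -> torch.Tensor:
    """Behaves like x.clamp(lo, hi) but traces to ONNX Max/Min nodes
    instead of a Clip node, avoiding onnxruntime-web issues with Clip
    receiving mixed scalar/tensor inputs."""
    # Materialize the bounds as tensors matching x's dtype and device,
    # which keeps the graph strictly FP16 when x is FP16.
    lo_t = torch.full_like(x, lo)
    hi_t = torch.full_like(x, hi)
    return torch.min(torch.max(x, lo_t), hi_t)

# Hypothetical usage: cosines = clamp_via_max_min(cosines, -1.0, 1.0)
```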

## 📄 License

This project follows the license of the original [DA-2 repository](https://github.com/EnVision-Research/DA-2). Please refer to the original repository for license details.