Upload folder using huggingface_hub

Files changed (6)

.gitignore +1 -0
README.md +85 -78
config.json +3 -0
export_onnx.py +18 -10
preprocessor_config.json +1 -1
web/script.js +2 -2

.gitignore CHANGED

@@ -3,4 +3,5 @@ __pycache__/
 *.pyc
 .DS_Store
 *.safetensors
+*.onnx
 .vscode/
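The new `*.onnx` rule contains no slash, so Git applies it to file names at any directory depth. That means the exported `onnx/model.onnx` and `onnx/model_quantized.onnx` are ignored by Git as well. The sketch below approximates only that slash-free case with Python's `fnmatch`. It is a simplification (negation, directory-only rules, anchored patterns and `**` are not handled), not Git's own matcher, and `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# File-name patterns taken from the .gitignore hunk above. Directory-only
# rules such as `.vscode/` and `__pycache__/` are not modeled here.
PATTERNS = ["*.pyc", ".DS_Store", "*.safetensors", "*.onnx"]


def is_ignored(path: str, patterns=PATTERNS) -> bool:
    """Approximate Git's rule for slash-free patterns: match the basename
    of the path, whatever directory it sits in."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pat) for pat in patterns)


for p in ["onnx/model.onnx", "onnx/model_quantized.onnx", "model.safetensors", "web/script.js"]:
    print(p, is_ignored(p))
```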

README.md CHANGED

@@ -1,91 +1,98 @@
----
-license: apache-2.0
-library_name: onnx
-tags:
-- depth-estimation
-- panoramic
-- 360-degree
-- webgpu
-- onnx
-pipeline_tag: depth-estimation
----
-# DA-2: Depth Anything in Any Direction (ONNX WebGPU Version)
-This repository contains the **ONNX** weights for [DA-2: Depth Anything in Any Direction](https://github.com/EnVision-Research/DA-2), optimized for **WebGPU** inference in the browser.
-## Model Details
-- **Original Model:** [haodongli/DA-2](https://huggingface.co/haodongli/DA-2)
-- **Framework:** ONNX (Opset 17)
-- **Precision:** FP32 (Full Precision)
-- **Input Resolution:** 1092x546
-- **Size:** ~1.4 GB
-## Conversion Details
-This model was converted from the original PyTorch weights to ONNX to enable client-side inference using `onnxruntime-web`.
-- **Optimization:** Constant folding applied.
-- **Compatibility:** Verified with WebGPU backend.
-- **Modifications:**
-  - Replaced `clamp` operators with `Max`/`Min` combinations to ensure WebGPU kernel compatibility.
-  - Removed internal normalization layers to allow raw 0-1 input from the browser.
-## Usage (Transformers.js)
-You can also run this model using [Transformers.js](https://huggingface.co/docs/transformers.js).
-```javascript
-import { pipeline } from '@xenova/transformers';
-// Initialize the pipeline
-const depth_estimator = await pipeline('depth-estimation', 'phiph/DA-2-WebGPU', {
-    device: 'webgpu',
-    dtype: 'fp32', // Use FP32 as exported
-});
-// Run inference
-const url = 'path/to/your/panorama.jpg';
-const output = await depth_estimator(url);
-// output.depth is the raw tensor
-// output.mask is the visualized depth map
-```
-## Usage (ONNX Runtime Web)
-You can run this model in the browser using `onnxruntime-web`.
-```javascript
-import * as ort from 'onnxruntime-web/webgpu';
-// 1. Initialize Session
-// Note: Model is now in the 'onnx' subdirectory
-const session = await ort.InferenceSession.create('https://huggingface.co/phiph/DA-2-WebGPU/resolve/main/onnx/model.onnx', {
-    executionProviders: ['webgpu'],
-    preferredOutputLocation: { last_hidden_state: 'gpu-buffer' }
-});
-// 2. Prepare Input (Float32, 0-1 range, NCHW)
-// Note: Do NOT apply ImageNet mean/std normalization. The model expects raw 0-1 floats.
-const tensor = new ort.Tensor('float32', float32Data, [1, 3, 546, 1092]);
-// 3. Run Inference
-const results = await session.run({ images: tensor });
-const depthMap = results.depth; // Access output
-```
-## License
-This model is a derivative work of [DA-2](https://github.com/EnVision-Research/DA-2) and is distributed under the **Apache License 2.0**.
-Please cite the original authors if you use this model:
+# DA-2 WebGPU Port
+This repository contains a port of the **DA-2 (Depth Anything in Any Direction)** model to run entirely in the browser using **WebGPU** and **ONNX Runtime**.
+The original work was developed by EnVision-Research. This port enables real-time, client-side depth estimation from panoramic images without requiring a backend server for inference.
+## 🔗 Original Work
+**DA<sup>2</sup>: Depth Anything in Any Direction**
+*   **Repository:** [EnVision-Research/DA-2](https://github.com/EnVision-Research/DA-2)
+*   **Paper:** [arXiv:2509.26618](http://arxiv.org/abs/2509.26618)
+*   **Project Page:** [depth-any-in-any-dir.github.io](https://depth-any-in-any-dir.github.io/)
+Please cite the original paper if you use this work:
 ```bibtex
-@article{li2025depth,
-  title={DA$^{2}$: Depth Anything in Any Direction},
+@article{li2025da2,
+  title={DA2: Depth Anything in Any Direction},
   author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
   journal={arXiv preprint arXiv:2509.26618},
   year={2025}
 }
 ```
+## 🚀 WebGPU Demo
+This project includes a web-based demo that runs the model directly in your browser.
+### Prerequisites
+*   **Python 3.10+** (for model export)
+*   **Web Browser** with WebGPU support (Chrome 113+, Edge 113+, or Firefox Nightly).
+### Installation
+1.  **Clone the repository:**
+    ```bash
+    git clone <your-repo-url>
+    cd DA-2-Web
+    ```
+2.  **Set up Python environment:**
+    ```bash
+    python3 -m venv venv
+    source venv/bin/activate  # On Windows: venv\Scripts\activate
+    pip install -r requirements.txt
+    ```
+### Model Preparation
+To run the demo, you first need to convert the PyTorch model to ONNX format.
+1.  **Download the model weights:**
+    Download `model.safetensors` from the [HuggingFace repository](https://huggingface.co/haodongli/DA-2) and place it in the root directory of this project.
+2.  **Export to ONNX:**
+    Run the export script. This script handles the conversion to FP16 and applies necessary fixes for WebGPU compatibility (e.g., replacing `clamp` with `max`/`min`).
+    ```bash
+    python export_onnx.py
+    ```
+    This will generate `da2_model.onnx`.
+3.  **Merge ONNX files:**
+    The export process might generate external data files. Use the merge script to create a single `.onnx` file for easier web loading.
+    ```bash
+    python merge_onnx.py
+    ```
+    This will generate `da2_model_single.onnx`.
+### Running the Demo
+1.  **Start a local web server:**
+    You need to serve the files over HTTP(S) for the browser to load the model and WebGPU context.
+    ```bash
+    python3 -m http.server 8000
+    ```
+2.  **Open in Browser:**
+    Navigate to `http://localhost:8000/web/` in your WebGPU-compatible browser.
+3.  **Usage:**
+    *   Click "Choose File" to upload a panoramic image.
+    *   Click "Run Inference" to generate the depth map.
+    *   The process runs entirely locally on your GPU.
+## 🛠️ Technical Details of the Port
+*   **Precision:** The model was converted to **FP16 (Half Precision)** to reduce file size (~1.4GB -> ~700MB) and improve performance on consumer GPUs.
+*   **Opset:** Exported using **ONNX Opset 17**.
+*   **Modifications:**
+    *   The `SphereViT` and `ViT_w_Esphere` modules were modified to ensure strict FP16 compatibility.
+    *   `torch.clamp` operations were replaced with `torch.max` and `torch.min` combinations to avoid `Clip` operator issues in `onnxruntime-web` when handling mixed scalar/tensor inputs.
+    *   Sphere embeddings are pre-calculated and cast to FP16 within the model graph.
+## 📄 License
+This project follows the license of the original [DA-2 repository](https://github.com/EnVision-Research/DA-2). Please refer to the original repository for license details.
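The Technical Details bullet attributes the ~1.4GB -> ~700MB reduction to FP16. That ratio follows from bytes per weight when weights dominate the file: FP32 stores 4 bytes per value, FP16 stores 2, and int8 stores 1 (int8 is the `QInt8` weight type requested by the `quantize_dynamic` call in `export_onnx.py`). The sketch below only does that arithmetic. It assumes a weights-dominated file and ignores graph metadata, quantization scales, and any tensors left in float, so its outputs are rough estimates, not measured file sizes.

```python
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1}


def estimate_size_gb(fp32_size_gb: float, dtype: str) -> float:
    """Scale an FP32 model size to another weight dtype, assuming the file
    is dominated by weight storage."""
    return fp32_size_gb * BYTES_PER_WEIGHT[dtype] / BYTES_PER_WEIGHT["fp32"]


fp32_gb = 1.4  # FP32 size quoted in the README
for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: ~{estimate_size_gb(fp32_gb, dtype):.2f} GB")
```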

config.json CHANGED

@@ -1,4 +1,7 @@
 {
+  "architectures": [
+    "DepthAnythingForDepthEstimation"
+  ],
   "model_type": "depth_anything",
   "transformers_version": "4.39.0",
   "image_size": [

export_onnx.py CHANGED

@@ -51,14 +51,14 @@ model.eval()
 dummy_input = torch.randn(1, 3, H, W)
 # Export
-output_file = "model.onnx"
+output_file = "onnx/model.onnx"
 print(f"Exporting to {output_file}...")
 try:
     torch.onnx.export(
         model,
         dummy_input,
         output_file,
-        opset_version=18,
+        opset_version=17,
         input_names=["pixel_values"],
         output_names=["predicted_depth"],
         dynamic_axes={
@@ -69,15 +69,23 @@ try:
         do_constant_folding=True,
         verbose=False
     )
-    print(f"Successfully exported to {output_file}")
-    # Force single file (merge external data if any)
-    import onnx
-    print("Ensuring single ONNX file...")
-    onnx_model = onnx.load(output_file)
-    onnx.save_model(onnx_model, output_file, save_as_external_data=False)
-    print("Saved as single file.")
+    print(f"Successfully exported to {output_file}")
+    # Quantize the exported ONNX model
+    try:
+        from onnxruntime.quantization import quantize_dynamic, QuantType
+        quantized_output_file = "onnx/model_quantized.onnx"
+        print(f"Quantizing model to {quantized_output_file}...")
+        quantize_dynamic(
+            output_file,
+            quantized_output_file,
+            weight_type=QuantType.QInt8
+        )
+        print(f"Successfully quantized to {quantized_output_file}")
+    except Exception as qe:
+        print(f"Error during quantization: {qe}")
+        import traceback
+        traceback.print_exc()
 except Exception as e:
     print(f"Error exporting to ONNX: {e}")
     import traceback
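The new `quantize_dynamic(..., weight_type=QuantType.QInt8)` step stores weights as 8-bit integers plus scale factors, while activation ranges are computed at run time. To show where the size reduction and the rounding error come from, the pure-Python sketch below implements a generic per-tensor symmetric int8 scheme on a list of weights. It is an illustration of the idea, not ONNX Runtime's implementation, which has its own options (per-channel quantization, symmetric or asymmetric ranges, and so on). The helper names are hypothetical.

```python
from typing import List, Tuple


def quantize_symmetric_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Per-tensor symmetric quantization: map [-max|w|, max|w|] onto [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # An all-zero tensor gets scale 1.0 so we never divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes and the shared scale."""
    return [v * scale for v in q]


w = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_symmetric_int8(w)
w_hat = dequantize(q, scale)
print(q, scale)
# Rounding to the nearest code keeps each error within scale / 2.
print(max(abs(a - b) for a, b in zip(w, w_hat)))
```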

preprocessor_config.json CHANGED

@@ -8,5 +8,5 @@
   },
   "do_rescale": true,
   "rescale_factor": 0.00392156862745098,
-  "image_processor_type": "DPTImageProcessor"
+  "image_processor_type": "DepthAnythingImageProcessor"
 }
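The `rescale_factor` shown here is 1/255, so with `do_rescale` enabled 8-bit channel values are mapped into [0, 1]. In the browser demo the same work has to happen before inference, together with a layout change from interleaved pixels to the planar `[1, 3, H, W]` (NCHW) tensor. The sketch below shows both steps in Python, assuming an RGBA byte buffer like the one browser `ImageData` provides. `rgba_to_nchw` is a hypothetical helper, not the demo's actual `preprocess` function, and any normalization beyond the rescale is outside what this hunk shows.

```python
from typing import List, Sequence

RESCALE_FACTOR = 0.00392156862745098  # value from preprocessor_config.json (1/255)


def rgba_to_nchw(rgba: Sequence[int], width: int, height: int) -> List[float]:
    """Convert a row-major RGBA byte buffer into a flat NCHW float list
    (batch 1, channels R, G, B), rescaled to [0, 1]. Alpha is dropped."""
    if len(rgba) != width * height * 4:
        raise ValueError("expected width * height * 4 RGBA bytes")
    plane = width * height
    out = [0.0] * (3 * plane)
    for i in range(plane):
        for c in range(3):  # R, G, B; the alpha byte at offset 3 is skipped
            out[c * plane + i] = rgba[4 * i + c] * RESCALE_FACTOR
    return out


# 2x1 image: one red pixel, one white pixel (both fully opaque)
pixels = [255, 0, 0, 255, 255, 255, 255, 255]
print(rgba_to_nchw(pixels, width=2, height=1))
```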

web/script.js CHANGED

@@ -71,9 +71,9 @@ runBtn.addEventListener('click', async () => {
         const tensor = preprocess(imageData);
         // Run inference
-        const feeds = { input_image: tensor };
+        const feeds = { pixel_values: tensor };
         const results = await session.run(feeds);
-        const output = results.depth_map;
+        const output = results.predicted_depth;
         // Postprocess and visualize
         visualize(output.data, INPUT_WIDTH, INPUT_HEIGHT);
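The renamed keys now match the `input_names=["pixel_values"]` and `output_names=["predicted_depth"]` passed to `torch.onnx.export` in `export_onnx.py`, so the session can resolve both tensors by name. The body of `visualize` is not part of this diff. The sketch below shows one common way to turn a raw depth array into 8-bit grayscale for display (per-image min-max scaling), written in Python like the other examples here. `depth_to_gray` is hypothetical, and the demo may use a different mapping or a colormap.

```python
from typing import List, Sequence


def depth_to_gray(depth: Sequence[float]) -> List[int]:
    """Hypothetical min-max mapping of raw depth values to 0-255 grayscale.
    A constant depth map is mapped to all zeros to avoid dividing by zero."""
    lo, hi = min(depth), max(depth)
    span = hi - lo
    if span == 0:
        return [0] * len(depth)
    return [round(255 * (d - lo) / span) for d in depth]


print(depth_to_gray([2.0, 4.0, 6.0]))  # [0, 128, 255]
```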