{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ๐ŸŒŠ Depth Anything 3 โ€” From Images to 3D in Seconds\n", "\n", "
\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Aedelon/awesome-depth-anything-3/blob/main/notebooks/da3_tutorial.ipynb)\n", "[![GitHub Stars](https://img.shields.io/github/stars/Aedelon/awesome-depth-anything-3?style=social)](https://github.com/Aedelon/awesome-depth-anything-3)\n", "[![PyPI](https://img.shields.io/pypi/v/awesome-depth-anything-3)](https://pypi.org/project/awesome-depth-anything-3/)\n", "[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n", "\n", "**State-of-the-art monocular depth estimation + 3D reconstruction**\n", "\n", "
\n", "\n", "---\n", "\n", "### What you'll get:\n", "\n", "| Input | Output |\n", "|-------|--------|\n", "| ๐Ÿ“ธ Single image | ๐ŸŒŠ Metric depth map |\n", "| ๐ŸŽฌ Video / Multi-view | โ˜๏ธ 3D Point Cloud + Camera poses |\n", "| ๐Ÿ–ผ๏ธ Any scene | ๐Ÿ“ฆ Downloadable GLB file |\n", "\n", "---\n", "\n", "### โšก Quick Start\n", "\n", "1. **Runtime โ†’ Change runtime type โ†’ T4 GPU** (free tier works!)\n", "2. **Run all cells** (Ctrl+F9) or click โ–ถ๏ธ on each cell\n", "3. **Upload your images** in Section 4\n", "4. **Download your 3D model** (.glb file)\n", "\n", "โฑ๏ธ **Total time: ~5 minutes** (including model download)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title ๐Ÿš€ **1. Install** (run this first!) { display-mode: \"form\" }\n", "#@markdown > โฑ๏ธ Takes ~2 minutes on first run\n", "\n", "%%capture\n", "!pip install awesome-depth-anything-3\n", "\n", "# Verify installation\n", "import torch\n", "from IPython.display import HTML, display\n", "\n", "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", "gpu_name = torch.cuda.get_device_name(0) if device == \"cuda\" else \"None\"\n", "vram = torch.cuda.get_device_properties(0).total_memory / 1e9 if device == \"cuda\" else 0\n", "\n", "if device == \"cuda\":\n", " status = f'''\n", "
\n", "

✅ Ready to go!

\n", "

GPU: {gpu_name}

\n", "

VRAM: {vram:.1f} GB

\n", "

PyTorch: {torch.__version__}

\n", "
\n", " '''\n", "else:\n", " status = '''\n", "
\n", "

โš ๏ธ No GPU detected!

\n", "

Go to Runtime → Change runtime type → GPU

\n", "

Then restart the notebook.

\n", "
\n", " '''\n", "\n", "display(HTML(status))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title ๐Ÿง  **2. Load Model** { display-mode: \"form\" }\n", "#@markdown Choose model size:\n", "model_size = \"DA3-LARGE\" #@param [\"DA3-SMALL\", \"DA3-BASE\", \"DA3-LARGE\", \"DA3-GIANT\", \"DA3NESTED-GIANT-LARGE\"]\n", "#@markdown ---\n", "#@markdown | Model | Speed | Quality | VRAM |\n", "#@markdown |-------|-------|---------|------|\n", "#@markdown | SMALL | โšกโšกโšก | โ˜…โ˜…โ˜† | 4GB |\n", "#@markdown | BASE | โšกโšก | โ˜…โ˜…โ˜… | 6GB |\n", "#@markdown | LARGE | โšก | โ˜…โ˜…โ˜…โ˜… | 8GB |\n", "#@markdown | GIANT | ๐Ÿข | โ˜…โ˜…โ˜…โ˜…โ˜… | 12GB |\n", "#@markdown | NESTED | ๐Ÿข | โ˜…โ˜…โ˜…โ˜…โ˜…+ | 16GB |\n", "\n", "from depth_anything_3.api import DepthAnything3\n", "import time\n", "\n", "print(f\"๐Ÿ“ฅ Loading {model_size}...\")\n", "start = time.time()\n", "\n", "model = DepthAnything3.from_pretrained(f\"depth-anything/{model_size}\")\n", "model = model.to(device).eval()\n", "\n", "print(f\"โœ… Model loaded in {time.time()-start:.1f}s\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title ๐Ÿ–ผ๏ธ **3. 
Try with Sample Image** { display-mode: \"form\" }\n", "#@markdown Run depth estimation on a sample image\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from PIL import Image\n", "import urllib.request\n", "import os\n", "\n", "# Download sample\n", "os.makedirs(\"samples\", exist_ok=True)\n", "url = \"https://images.unsplash.com/photo-1506905925346-21bda4d32df4?w=1280\"\n", "urllib.request.urlretrieve(url, \"samples/mountain.jpg\")\n", "\n", "# Run inference\n", "result = model.inference([\"samples/mountain.jpg\"])\n", "\n", "# Visualize\n", "fig, axes = plt.subplots(1, 2, figsize=(14, 5))\n", "\n", "axes[0].imshow(result.processed_images[0])\n", "axes[0].set_title(\"📸 Input\", fontsize=14, fontweight='bold')\n", "axes[0].axis(\"off\")\n", "\n", "depth = result.depth[0]\n", "im = axes[1].imshow(depth, cmap='Spectral_r')\n", "axes[1].set_title(f\"🌊 Depth (range: {depth.min():.1f}m - {depth.max():.1f}m)\", fontsize=14, fontweight='bold')\n", "axes[1].axis(\"off\")\n", "plt.colorbar(im, ax=axes[1], fraction=0.046, pad=0.04, label='Depth (m)')\n", "\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "print(\"\\n📊 Output shapes:\")\n", "print(f\"  Depth: {result.depth.shape}\")\n", "print(f\"  Confidence: {result.conf.shape}\")\n", "print(f\"  Camera intrinsics: {result.intrinsics.shape}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 📤 4. Use Your Own Images\n", "\n", "Upload your images and get a 3D point cloud!" 
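, "\n", "The form cells below handle upload, reconstruction, and export step by step. For reference, the same pipeline in plain Python is only a few calls (a sketch: `my_images/` is a placeholder folder, `model` is the DepthAnything3 instance loaded in Section 2, and `export_to_glb`'s optional arguments are left at their defaults):\n", "\n", "```python\n", "import glob\n", "from depth_anything_3.utils.export.glb import export_to_glb\n", "\n", "# Run multi-view inference on a folder of images of one scene\n", "images = sorted(glob.glob(\"my_images/*.jpg\"))\n", "result = model.inference(images)               # depth, confidence, camera poses\n", "export_to_glb(result, export_dir=\"output_3d\")  # writes a .glb point cloud\n", "```\n"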
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title 📁 **Upload Images** { display-mode: \"form\" }\n", "#@markdown Upload **2-50 images** of the same scene from different angles.\n", "#@markdown \n", "#@markdown 💡 **Tips for best results:**\n", "#@markdown - Move the camera, not the objects\n", "#@markdown - 30-50% overlap between consecutive images\n", "#@markdown - Avoid motion blur\n", "#@markdown - Good lighting helps!\n", "\n", "from google.colab import files\n", "import shutil\n", "\n", "# Clean up previous uploads\n", "upload_dir = \"my_images\"\n", "if os.path.exists(upload_dir):\n", "    shutil.rmtree(upload_dir)\n", "os.makedirs(upload_dir, exist_ok=True)\n", "\n", "print(\"📤 Select your images...\")\n", "uploaded = files.upload()\n", "\n", "# Save uploaded files\n", "for filename, data in uploaded.items():\n", "    with open(f\"{upload_dir}/{filename}\", 'wb') as f:\n", "        f.write(data)\n", "\n", "image_files = sorted([f\"{upload_dir}/{f}\" for f in os.listdir(upload_dir)\n", "                      if f.lower().endswith(('.jpg', '.jpeg', '.png', '.webp'))])\n", "\n", "print(f\"\\n✅ Uploaded {len(image_files)} images\")\n", "\n", "# Preview\n", "n_preview = min(6, len(image_files))\n", "fig, axes = plt.subplots(1, n_preview, figsize=(3*n_preview, 3))\n", "if n_preview == 1:\n", "    axes = [axes]\n", "for i, img_path in enumerate(image_files[:n_preview]):\n", "    img = Image.open(img_path)\n", "    axes[i].imshow(img)\n", "    axes[i].set_title(f\"#{i+1}\", fontsize=10)\n", "    axes[i].axis(\"off\")\n", "if len(image_files) > n_preview:\n", "    print(f\"  (showing first {n_preview} of {len(image_files)})\")\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title ⚡ **Run 3D Reconstruction** { display-mode: \"form\" }\n", "#@markdown This will:\n", "#@markdown 1. Estimate depth for each image\n", "#@markdown 2. 
Compute camera poses\n", "#@markdown 3. Generate a 3D point cloud\n", "#@markdown 4. Export to GLB format\n", "\n", "from depth_anything_3.utils.export.glb import export_to_glb\n", "import time\n", "\n", "print(f\"🔄 Processing {len(image_files)} images...\")\n", "start = time.time()\n", "\n", "# Run inference\n", "result = model.inference(\n", "    image_files,\n", "    process_res_method=\"upper_bound_resize\",\n", ")\n", "\n", "inference_time = time.time() - start\n", "print(f\"✅ Inference done in {inference_time:.1f}s ({len(image_files)/inference_time:.1f} img/s)\")\n", "\n", "# Export to GLB\n", "output_dir = \"output_3d\"\n", "os.makedirs(output_dir, exist_ok=True)\n", "\n", "print(\"📦 Generating 3D point cloud...\")\n", "export_to_glb(\n", "    result,\n", "    export_dir=output_dir,\n", "    show_cameras=True,\n", "    conf_thresh_percentile=20,  # Filter low-confidence points\n", "    num_max_points=500_000,\n", ")\n", "\n", "print(f\"\\n✅ 3D model saved to {output_dir}/\")\n", "!ls -lh {output_dir}/" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title 📥 **Download Your 3D Model** { display-mode: \"form\" }\n", "#@markdown Downloads a `.glb` file you can view in:\n", "#@markdown - [glTF Viewer](https://gltf-viewer.donmccurdy.com/)\n", "#@markdown - Blender\n", "#@markdown - Windows 3D Viewer\n", "#@markdown - Any 3D software\n", "\n", "from google.colab import files\n", "\n", "glb_file = f\"{output_dir}/point_cloud.glb\"\n", "if os.path.exists(glb_file):\n", "    files.download(glb_file)\n", "    print(\"\\n🎉 Download started!\")\n", "    print(\"\\n👉 View your model: https://gltf-viewer.donmccurdy.com/\")\n", "else:\n", "    print(\"❌ GLB file not found. Run the previous cell first.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 📊 5. 
Visualize Results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title 🌊 **View All Depth Maps** { display-mode: \"form\" }\n", "\n", "n_images = len(result.depth)\n", "cols = min(4, n_images)\n", "rows = (n_images + cols - 1) // cols\n", "\n", "fig, axes = plt.subplots(rows, cols, figsize=(4*cols, 4*rows))\n", "axes = np.array(axes).flatten() if n_images > 1 else [axes]\n", "\n", "for i in range(n_images):\n", "    depth = result.depth[i]\n", "    axes[i].imshow(depth, cmap='Spectral_r')\n", "    axes[i].set_title(f\"Frame {i+1}\", fontsize=10)\n", "    axes[i].axis(\"off\")\n", "\n", "# Hide unused subplots\n", "for i in range(n_images, len(axes)):\n", "    axes[i].axis(\"off\")\n", "\n", "plt.suptitle(\"🌊 Depth Maps\", fontsize=16, fontweight='bold')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title 📷 **View Camera Poses** { display-mode: \"form\" }\n", "#@markdown Visualize estimated camera positions in 3D\n", "\n", "from mpl_toolkits.mplot3d import Axes3D\n", "\n", "# Extract camera positions from extrinsics\n", "positions = []\n", "for ext in result.extrinsics:\n", "    # Extrinsic is world-to-camera, invert to get camera-to-world\n", "    R = ext[:3, :3]\n", "    t = ext[:3, 3]\n", "    cam_pos = -R.T @ t  # Camera position in world coordinates\n", "    positions.append(cam_pos)\n", "\n", "positions = np.array(positions)\n", "\n", "fig = plt.figure(figsize=(10, 8))\n", "ax = fig.add_subplot(111, projection='3d')\n", "\n", "# Plot camera positions\n", "ax.scatter(positions[:, 0], positions[:, 1], positions[:, 2],\n", "           c=range(len(positions)), cmap='viridis', s=100, marker='o')\n", "\n", "# Connect cameras with lines\n", "ax.plot(positions[:, 0], positions[:, 1], positions[:, 2],\n", "        'b-', alpha=0.5, linewidth=1)\n", "\n", "# Mark first and last\n", "ax.scatter(*positions[0], c='green', s=200, marker='^', label='First')\n", 
"ax.scatter(*positions[-1], c='red', s=200, marker='v', label='Last')\n", "\n", "ax.set_xlabel('X')\n", "ax.set_ylabel('Y')\n", "ax.set_zlabel('Z')\n", "ax.set_title('📷 Camera Trajectory', fontsize=14, fontweight='bold')\n", "ax.legend()\n", "\n", "plt.tight_layout()\n", "plt.show()\n", "\n", "print(f\"📍 {len(positions)} camera poses estimated\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 🎬 6. Process Video" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title 🎬 **Upload Video** { display-mode: \"form\" }\n", "#@markdown Upload a short video (< 30 seconds recommended)\n", "\n", "fps_extract = 2 #@param {type:\"slider\", min:1, max:10, step:1}\n", "#@markdown ↑ Frames per second to extract (lower = faster, higher = more detail)\n", "\n", "from google.colab import files\n", "import subprocess\n", "\n", "print(\"📤 Select a video file...\")\n", "uploaded = files.upload()\n", "\n", "video_file = list(uploaded.keys())[0]\n", "frames_dir = \"video_frames\"\n", "\n", "# Extract frames\n", "if os.path.exists(frames_dir):\n", "    shutil.rmtree(frames_dir)\n", "os.makedirs(frames_dir, exist_ok=True)\n", "\n", "print(f\"🎞️ Extracting frames at {fps_extract} FPS...\")\n", "subprocess.run([\n", "    \"ffmpeg\", \"-i\", video_file,\n", "    \"-vf\", f\"fps={fps_extract}\",\n", "    f\"{frames_dir}/frame_%04d.jpg\",\n", "    \"-hide_banner\", \"-loglevel\", \"error\"\n", "])\n", "\n", "video_images = sorted([f\"{frames_dir}/{f}\" for f in os.listdir(frames_dir)])\n", "print(f\"✅ Extracted {len(video_images)} frames\")\n", "\n", "# Preview\n", "n_preview = min(8, len(video_images))\n", "fig, axes = plt.subplots(1, n_preview, figsize=(2*n_preview, 2))\n", "if n_preview == 1:\n", "    axes = [axes]\n", "step = max(1, len(video_images) // n_preview)\n", "for i, ax in enumerate(axes):\n", "    idx = i * step\n", "    if idx < len(video_images):\n", "        ax.imshow(Image.open(video_images[idx]))\n", "    ax.axis(\"off\")\n", "plt.suptitle(f\"🎬 
Video Frames ({len(video_images)} total)\", fontsize=12)\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title ⚡ **Process Video Frames** { display-mode: \"form\" }\n", "\n", "print(f\"🔄 Processing {len(video_images)} frames...\")\n", "start = time.time()\n", "\n", "result_video = model.inference(\n", "    video_images,\n", "    process_res_method=\"upper_bound_resize\",\n", ")\n", "\n", "elapsed = time.time() - start\n", "print(f\"✅ Done in {elapsed:.1f}s ({len(video_images)/elapsed:.1f} FPS)\")\n", "\n", "# Export\n", "video_output = \"video_3d\"\n", "os.makedirs(video_output, exist_ok=True)\n", "\n", "export_to_glb(\n", "    result_video,\n", "    export_dir=video_output,\n", "    show_cameras=True,\n", "    conf_thresh_percentile=15,\n", "    num_max_points=1_000_000,\n", ")\n", "\n", "print(\"\\n📦 3D model saved!\")\n", "!ls -lh {video_output}/" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title 📥 **Download Video 3D Model** { display-mode: \"form\" }\n", "\n", "glb_file = f\"{video_output}/point_cloud.glb\"\n", "if os.path.exists(glb_file):\n", "    files.download(glb_file)\n", "    print(\"🎉 Download started!\")\n", "else:\n", "    print(\"❌ Run the previous cell first.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 🔧 7. 
Advanced: Python API" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title 💻 **API Reference** { display-mode: \"form\" }\n", "#@markdown Quick code snippets for common tasks\n", "\n", "from IPython.display import Markdown\n", "\n", "api_docs = '''\n", "### Basic Usage\n", "\n", "```python\n", "from depth_anything_3.api import DepthAnything3\n", "\n", "# Load model\n", "model = DepthAnything3.from_pretrained(\"depth-anything/DA3-LARGE\")\n", "model = model.to(\"cuda\").eval()\n", "\n", "# Single image\n", "result = model.inference([\"image.jpg\"])\n", "depth = result.depth[0]  # Shape: (H, W)\n", "\n", "# Multiple images\n", "result = model.inference([\"img1.jpg\", \"img2.jpg\", \"img3.jpg\"])\n", "depths = result.depth  # Shape: (N, H, W)\n", "```\n", "\n", "### Output Attributes\n", "\n", "| Attribute | Shape | Description |\n", "|-----------|-------|-------------|\n", "| `depth` | `(N, H, W)` | Metric depth in meters |\n", "| `conf` | `(N, H, W)` | Confidence [0-1] |\n", "| `extrinsics` | `(N, 3, 4)` | Camera poses (world-to-cam) |\n", "| `intrinsics` | `(N, 3, 3)` | Camera K matrix |\n", "| `processed_images` | `(N, H, W, 3)` | Resized inputs (uint8) |\n", "\n", "### Export to 3D\n", "\n", "```python\n", "from depth_anything_3.utils.export.glb import export_to_glb\n", "\n", "export_to_glb(\n", "    result,\n", "    export_dir=\"output\",\n", "    show_cameras=True,          # Show camera frustums\n", "    conf_thresh_percentile=20,  # Filter low confidence\n", "    num_max_points=500_000,     # Max points in cloud\n", ")\n", "```\n", "\n", "### CLI Usage\n", "\n", "```bash\n", "# Single image\n", "da3 infer image.jpg -o output/\n", "\n", "# Directory of images\n", "da3 infer images/ -o output/ --model DA3-LARGE\n", "\n", "# Video\n", "da3 infer video.mp4 -o output/ --fps 2\n", "```\n", "'''\n", "\n", "display(Markdown(api_docs))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 💾 8. 
Save to Google Drive" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title ๐Ÿ’พ **Mount Google Drive** { display-mode: \"form\" }\n", "\n", "from google.colab import drive\n", "drive.mount('/content/drive')\n", "\n", "drive_output = \"/content/drive/MyDrive/DepthAnything3_Results\"\n", "os.makedirs(drive_output, exist_ok=True)\n", "print(f\"โœ… Drive mounted at: {drive_output}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#@title ๐Ÿ’พ **Save Results to Drive** { display-mode: \"form\" }\n", "\n", "import shutil\n", "from datetime import datetime\n", "\n", "# Create timestamped folder\n", "timestamp = datetime.now().strftime(\"%Y%m%d_%H%M%S\")\n", "save_dir = f\"{drive_output}/{timestamp}\"\n", "os.makedirs(save_dir, exist_ok=True)\n", "\n", "# Copy all outputs\n", "for folder in [\"output_3d\", \"video_3d\"]:\n", " if os.path.exists(folder):\n", " for f in os.listdir(folder):\n", " shutil.copy(f\"{folder}/{f}\", save_dir)\n", " print(f\" โœ“ {f}\")\n", "\n", "print(f\"\\nโœ… Saved to: {save_dir}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## ๐Ÿ™ Credits & Links\n", "\n", "
\n", "\n", "**Depth Anything 3** by ByteDance Research\n", "\n", "[๐Ÿ“„ Paper](https://arxiv.org/abs/2511.10647) โ€ข [๐ŸŒ Project](https://depth-anything-3.github.io) โ€ข [๐Ÿค— Models](https://huggingface.co/collections/depth-anything/depth-anything-3)\n", "\n", "---\n", "\n", "**awesome-depth-anything-3** โ€” Optimized fork with batching, caching & CLI\n", "\n", "[โญ GitHub](https://github.com/Aedelon/awesome-depth-anything-3) โ€ข [๐Ÿ“ฆ PyPI](https://pypi.org/project/awesome-depth-anything-3/)\n", "\n", "---\n", "\n", "Made with โค๏ธ by [Delanoe Pirard](https://github.com/Aedelon)\n", "\n", "
" ] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 0 }