|
|
--- |
|
|
language: en |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- depth-estimation |
|
|
- computer-vision |
|
|
- pytorch |
|
|
- absolute-depth
|
|
pipeline_tag: depth-estimation |
|
|
library_name: transformers |
|
|
--- |
|
|
|
|
|
# Depth-CHM Model |
|
|
|
|
|
A fine-tuned Depth Anything V2 model for depth estimation, trained on forest canopy height data. |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model is based on [Depth-Anything-V2-Metric-Indoor-Base](https://huggingface.co/depth-anything/Depth-Anything-V2-Metric-Indoor-Base-hf) and fine-tuned for estimating depth/canopy height from aerial imagery. |
|
|
|
|
|
### Training Details |
|
|
|
|
|
- **Base Model**: depth-anything/Depth-Anything-V2-Metric-Indoor-Base-hf |
|
|
- **Max Depth**: 40.0 meters |
|
|
- **Loss Function**: SiLog + 0.1 * L1 Loss (see the sketch after this list)
|
|
- **Hyperparameter Tuning**: Optuna (50 trials) |
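
The combined loss is the standard scale-invariant log (SiLog) term plus an L1 term weighted by 0.1. The snippet below is a minimal sketch of that combination, assuming the common SiLog formulation with a variance-focus factor of 0.85; the exact constants used during training are not documented here.

```python
import torch

def silog_plus_l1(pred, target, eps=1e-6, lam=0.85, l1_weight=0.1):
    """Sketch of SiLog + 0.1 * L1 on valid (target > 0) pixels.

    `lam` and `eps` are assumed defaults, not values taken from this repo's
    training code.
    """
    valid = target > eps
    pred, target = pred[valid].clamp(min=eps), target[valid]
    d = torch.log(pred) - torch.log(target)            # per-pixel log difference
    silog = torch.sqrt((d ** 2).mean() - lam * d.mean() ** 2)
    l1 = torch.abs(pred - target).mean()
    return silog + l1_weight * l1
```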
|
|
|
|
|
## Installation |
|
|
|
|
|
```bash |
|
|
pip install transformers torch pillow numpy |
|
|
``` |
|
|
|
|
|
## Usage |
|
|
|
|
|
### Method 1: Using the Pipeline (Recommended)
|
|
|
|
|
The simplest way to use the model: |
|
|
|
|
|
```python |
|
|
from transformers import pipeline |
|
|
from PIL import Image |
|
|
import numpy as np |
|
|
|
|
|
# Load pipeline |
|
|
pipe = pipeline(task="depth-estimation", model="Boxiang/depth_chm") |
|
|
|
|
|
# Load image |
|
|
image = Image.open("your_image.png").convert("RGB") |
|
|
|
|
|
# Run inference |
|
|
result = pipe(image) |
|
|
depth_image = result["depth"] # PIL Image (normalized 0-255) |
|
|
|
|
|
# Convert to numpy array and scale to actual depth (0-40m) |
|
|
max_depth = 40.0 |
|
|
depth = np.array(depth_image).astype(np.float32) / 255.0 * max_depth |
|
|
|
|
|
print(f"Depth shape: {depth.shape}") |
|
|
print(f"Depth range: [{depth.min():.2f}, {depth.max():.2f}] meters") |
|
|
``` |
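
For aerial datasets you will usually have many tiles rather than a single image. The sketch below runs the same pipeline over a folder of PNG tiles and stores each scaled depth map as a NumPy array; the folder names and the `*.png` pattern are placeholders, not part of this repo.

```python
from pathlib import Path

import numpy as np
from PIL import Image
from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="Boxiang/depth_chm")
max_depth = 40.0

tile_dir = Path("tiles")        # placeholder: folder of input tiles
out_dir = Path("depth_maps")    # placeholder: output folder
out_dir.mkdir(exist_ok=True)

for tile_path in sorted(tile_dir.glob("*.png")):
    image = Image.open(tile_path).convert("RGB")
    depth_image = pipe(image)["depth"]  # normalized 0-255 PIL image
    depth = np.array(depth_image).astype(np.float32) / 255.0 * max_depth
    np.save(out_dir / f"{tile_path.stem}_depth.npy", depth)
```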
|
|
|
|
|
### Method 2: Using AutoImageProcessor + Model |
|
|
|
|
|
For more control over the inference process: |
|
|
|
|
|
```python |
|
|
import torch |
|
|
import torch.nn.functional as F |
|
|
from transformers import AutoImageProcessor, DepthAnythingForDepthEstimation |
|
|
from PIL import Image |
|
|
import numpy as np |
|
|
|
|
|
# Configuration |
|
|
model_id = "Boxiang/depth_chm" |
|
|
max_depth = 40.0 |
|
|
|
|
|
# Load model and processor |
|
|
processor = AutoImageProcessor.from_pretrained(model_id) |
|
|
model = DepthAnythingForDepthEstimation.from_pretrained(model_id) |
|
|
|
|
|
# Use GPU if available |
|
|
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") |
|
|
model = model.to(device) |
|
|
model.eval() |
|
|
|
|
|
# Load and process image |
|
|
image = Image.open("your_image.png").convert("RGB") |
|
|
original_size = image.size # (width, height) |
|
|
|
|
|
# Prepare input |
|
|
inputs = processor(images=image, return_tensors="pt") |
|
|
pixel_values = inputs["pixel_values"].to(device) |
|
|
|
|
|
# Run inference |
|
|
with torch.no_grad(): |
|
|
outputs = model(pixel_values) |
|
|
predicted_depth = outputs.predicted_depth |
|
|
|
|
|
# Scale by max_depth |
|
|
pred_scaled = predicted_depth * max_depth |
|
|
|
|
|
# Resize to original image size |
|
|
depth = F.interpolate( |
|
|
pred_scaled.unsqueeze(0), |
|
|
size=(original_size[1], original_size[0]), # (height, width) |
|
|
mode="bilinear", |
|
|
align_corners=True |
|
|
).squeeze().cpu().numpy() |
|
|
|
|
|
print(f"Depth shape: {depth.shape}") |
|
|
print(f"Depth range: [{depth.min():.2f}, {depth.max():.2f}] meters") |
|
|
``` |
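
Recent `transformers` releases also expose `post_process_depth_estimation` on the image processor, which handles the resizing back to the input resolution for you. If your installed version provides it, the manual interpolation above can be replaced by the sketch below; the output still needs the `max_depth` scaling described under Output Format.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, DepthAnythingForDepthEstimation

model_id = "Boxiang/depth_chm"
max_depth = 40.0

processor = AutoImageProcessor.from_pretrained(model_id)
model = DepthAnythingForDepthEstimation.from_pretrained(model_id).eval()

image = Image.open("your_image.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Let the processor resize the prediction to the original image size
post = processor.post_process_depth_estimation(
    outputs, target_sizes=[(image.height, image.width)]
)
depth = post[0]["predicted_depth"].numpy() * max_depth  # meters
```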
|
|
|
|
|
### Method 3: Local Model Path |
|
|
|
|
|
If you have the model saved locally: |
|
|
|
|
|
```python |
|
|
from transformers import AutoImageProcessor, DepthAnythingForDepthEstimation |
|
|
|
|
|
# Load from local path |
|
|
model_path = "./depth_chm_trained" |
|
|
processor = AutoImageProcessor.from_pretrained(model_path, local_files_only=True) |
|
|
model = DepthAnythingForDepthEstimation.from_pretrained(model_path, local_files_only=True) |
|
|
``` |
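
If you do not have a local copy yet, one way to create the directory above is `snapshot_download` from `huggingface_hub` (the target path is just an example):

```python
from huggingface_hub import snapshot_download

# Download all repo files into a local folder (path is an example)
snapshot_download(repo_id="Boxiang/depth_chm", local_dir="./depth_chm_trained")
```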
|
|
|
|
|
## Output Format |
|
|
|
|
|
- **Pipeline output**: Returns a PIL Image with normalized depth values (0-255). Multiply by `max_depth / 255.0` to get actual depth in meters. |
|
|
- **Model output**: Returns `predicted_depth` tensor with values in range [0, 1]. Multiply by `max_depth` (40.0) to get actual depth in meters. |
|
|
|
|
|
## Depth vs Height Conversion |
|
|
|
|
|
The model outputs **depth** (distance from the camera). To convert it to **height above ground**, as in a Canopy Height Model (CHM):
|
|
|
|
|
```python |
|
|
height = max_depth - depth |
|
|
``` |
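
For example, continuing from the batch sketch above (the file path is a placeholder), clipping guards against small negative values from prediction noise:

```python
import numpy as np

max_depth = 40.0
depth = np.load("depth_maps/your_image_depth.npy")    # placeholder path from the batch sketch above
height = np.clip(max_depth - depth, 0.0, max_depth)   # canopy height in meters
print(f"Mean canopy height: {height.mean():.2f} m, max: {height.max():.2f} m")
```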
|
|
|
|
|
## Model Files |
|
|
|
|
|
- `model.safetensors` - Model weights |
|
|
- `config.json` - Model configuration |
|
|
- `preprocessor_config.json` - Image processor configuration |
|
|
- `training_info.json` - Training hyperparameters |
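
`training_info.json` can be read directly to inspect the recorded hyperparameters; its exact keys are not documented here, so the sketch below simply prints whatever it contains:

```python
import json

from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="Boxiang/depth_chm", filename="training_info.json")
with open(path) as f:
    training_info = json.load(f)
print(json.dumps(training_info, indent=2))
```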
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{depth_chm_2024, |
|
|
title={Depth-CHM: Fine-tuned Depth Anything V2 for Canopy Height Estimation}, |
|
|
author={Boxiang}, |
|
|
year={2024}, |
|
|
url={https://huggingface.co/Boxiang/depth_chm} |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
This model inherits the license from the base Depth Anything V2 model. |
|
|
|