---
license: mit
---

# DPT 3.1 (BEiT backbone)

DPT (Dense Prediction Transformer) model trained on 1.4 million images for monocular depth estimation. It was introduced in the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by Ranftl et al. (2021) and first released in [this repository](https://github.com/isl-org/DPT).

Disclaimer: The team releasing DPT did not write a model card for this model, so this model card was written by the Hugging Face team.

## Model description

This DPT model uses the [BEiT](https://huggingface.co/docs/transformers/model_doc/beit) model as backbone and adds a neck + head on top for monocular depth estimation.

## How to use

Here is how to use this model for zero-shot depth estimation on an image:

```python
from transformers import DPTImageProcessor, DPTForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = DPTImageProcessor.from_pretrained("Intel/dpt-beit-base-384")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-beit-base-384")

# prepare image for the model
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
# (PIL reports size as (width, height); interpolate expects (height, width))
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)

# visualize the prediction
output = prediction.squeeze().cpu().numpy()
formatted = (output * 255 / np.max(output)).astype("uint8")
depth = Image.fromarray(formatted)
```

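
The visualization step in the example above scales by the maximum value only, so the darkest pixel stays above zero whenever the smallest prediction is positive, and an all-zero map would divide by zero. A minimal sketch of a min-max variant follows. `depth_to_uint8` is a hypothetical helper name (not part of `transformers`), and stretching to the full 0-255 range is an assumption about what is wanted for display.

```python
import numpy as np


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Min-max scale a depth map to the 0-255 range for display.

    Hypothetical helper for illustration; not part of transformers.
    """
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < 1e-8:
        # Constant map: nothing to stretch, so return a black image
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)  # values now lie in [0, 1]
    return np.round(scaled * 255.0).astype(np.uint8)


# Tiny synthetic map standing in for `prediction.squeeze().cpu().numpy()`
example = np.array([[2.0, 4.0], [6.0, 10.0]], dtype=np.float32)
gray = depth_to_uint8(example)
print(gray)
```

Passing the result to `Image.fromarray` yields a grayscale PIL image, in the same way as the `formatted` array in the example above.
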
Alternatively, use the pipeline API for depth estimation:

```python
from transformers import pipeline

pipe = pipeline(task="depth-estimation", model="Intel/dpt-beit-base-384")
result = pipe("http://images.cocodataset.org/val2017/000000039769.jpg")
result["depth"]
```
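
The pipeline returns a dictionary in which `depth` holds the depth map as a PIL image and `predicted_depth` holds the raw tensor. As a rough sketch of post-processing that output, the hypothetical helper `summarize_depth` below (not part of `transformers`) converts the `depth` entry to a NumPy array and reports its size and value range. A small synthetic array stands in for the real pipeline output so the example runs without downloading a model.

```python
import numpy as np


def summarize_depth(result: dict) -> dict:
    """Report size and value range of the `depth` entry of a pipeline result.

    Hypothetical helper for illustration; not part of transformers.
    """
    # np.asarray accepts a PIL image and yields a (height, width) array
    depth = np.asarray(result["depth"])
    return {
        "height": int(depth.shape[0]),
        "width": int(depth.shape[1]),
        "min": int(depth.min()),
        "max": int(depth.max()),
    }


# Stand-in for the dictionary returned by `pipe(...)`
fake_result = {"depth": np.array([[0, 128], [64, 255]], dtype=np.uint8)}
summary = summarize_depth(fake_result)
print(summary)
```

With a real pipeline result, `summarize_depth(result)` would report the dimensions of the depth image, which can be compared against the input image size.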