Update model card with paper link and sample usage snippets
#1
by nielsr (HF Staff) - opened
README.md
CHANGED
---
license: apache-2.0
pipeline_tag: image-feature-extraction
---

# DeFM: Learning Foundation Representations from Depth for Robotics

<div align="center">

[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0)
[Code](https://github.com/leggedrobotics/defm)
[Paper](https://arxiv.org/abs/2601.18923)
[Project Page](https://de-fm.github.io/)

</div>
TL;DR - A DINO-style encoder, but for depth image inputs.

- **Compact, efficient models**: We distill our DeFM-ViT-L into a family of smaller, efficient CNNs, as small as 3M parameters, for robot policy learning.
- **Robotics-proven**: Our encoder is effective across diverse robotic tasks such as navigation, manipulation, and locomotion, without task-specific fine-tuning.

## 🚀 Usage
### 1. Loading the Model

Load via **TorchHub** for easy integration:

```python
import torch

# Load the 307M-parameter foundation model
model = torch.hub.load('leggedrobotics/defm:main', 'defm_vit_l14', pretrained=True)
model.eval().to("cuda")
```
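The snippet above assumes a CUDA device. A small fallback (illustrative, not part of the original card) keeps the example runnable on CPU-only machines:

```python
import torch

# Choose CUDA when available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
```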

### 2. Preprocessing

DeFM requires depth maps to be processed into our metric-aware 3-channel format.

```python
from defm import preprocess_depth_image

# Depth must be in meters (NumPy array, tensor, or PIL image)
normalized_depth = preprocess_depth_image(metric_depth, target_size=518, patch_size=14)
```
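As a sanity check on the parameters above: with standard ViT patching (an assumption; the original card does not spell this out), `target_size` must be divisible by `patch_size`, so a 518-pixel input gives a 37×37 token grid for a /14 backbone:

```python
target_size, patch_size = 518, 14

# Standard ViT patching splits the image into non-overlapping patches.
assert target_size % patch_size == 0
grid = target_size // patch_size
print(grid)  # 37
```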

### 3. Inference

```python
with torch.no_grad():
    output = model.get_intermediate_layers(
        normalized_depth, n=1, reshape=True, return_class_token=True)

spatial_tokens = output[0][0]  # (B, C, H', W')
class_token = output[0][1]     # (B, C)
```
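For dense downstream heads, the spatial token grid is often upsampled back to the input resolution. A minimal sketch with a dummy feature map (the 1024-channel width is typical for a ViT-L and the 37×37 grid follows from 518/14; both are illustrative assumptions, not values from the card):

```python
import torch
import torch.nn.functional as F

# Dummy spatial tokens: batch 1, 1024 channels, 37x37 patch grid.
spatial_tokens = torch.randn(1, 1024, 37, 37)

# Bilinear upsampling to the 518x518 input resolution for a dense head.
dense = F.interpolate(spatial_tokens, size=(518, 518), mode="bilinear", align_corners=False)
print(tuple(dense.shape))  # (1, 1024, 518, 518)
```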

For more details, visit our [GitHub repository](https://github.com/leggedrobotics/defm).
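The class token can also serve as a compact global descriptor, e.g. for comparing depth frames by cosine similarity. A hedged sketch with stand-in embeddings (the 1024-dim width is an assumption; real descriptors come from the inference step above):

```python
import torch
import torch.nn.functional as F

# Two stand-in class tokens; in practice these come from model inference.
a, b = torch.randn(1, 1024), torch.randn(1, 1024)
sim = F.cosine_similarity(a, b).item()
assert -1.0 <= sim <= 1.0
```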

## 📊 Model Zoo