Update model card with paper link and sample usage snippets

#1
by nielsr - opened
Files changed (1)
  1. README.md +35 -3
README.md CHANGED
@@ -2,13 +2,14 @@
  license: apache-2.0
  pipeline_tag: image-feature-extraction
  ---
+
  # DeFM: Learning Foundation Representations from Depth for Robotics

  <div align="center">

  [![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-green.svg?style=for-the-badge)](https://opensource.org/licenses/Apache-2.0)
  [![GitHub](https://img.shields.io/badge/GitHub-Code-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/leggedrobotics/defm)
- [![Arxiv](https://img.shields.io/badge/arXiv-TODO-B31B1B.svg?style=for-the-badge)](TODO-link)
+ [![Arxiv](https://img.shields.io/badge/arXiv-2601.18923-B31B1B.svg?style=for-the-badge)](https://arxiv.org/abs/2601.18923)
  [![Webpage](https://img.shields.io/badge/Webpage-de--fm.github.io/-yellow.svg?style=for-the-badge&logo=google-chrome&logoColor=white)](https://de-fm.github.io/)
  </div>

@@ -25,9 +26,40 @@ TL;DR - A DINO-style encoder, but for depth image inputs.
  - **Compact efficient models**: We distill our DeFM-ViT-L into a family of smaller efficient CNNs as small as 3M params for robot policy learning.
  - **Robotics Proven**: Our encoder is proven effective for diverse robotic tasks such as navigation, manipulation and locomotion without task-specific fine-tuning.

- ## Usage
+ ## 🚀 Usage
+
+ ### 1. Loading the Model
+ Load via **TorchHub** for easy integration:
+
+ ```python
+ import torch
+
+ # Load the 307M Parameter Foundation Model
+ model = torch.hub.load('leggedrobotics/defm:main', 'defm_vit_l14', pretrained=True)
+ model.eval().to("cuda")
+ ```
+
+ ### 2. Preprocessing
+ DeFM requires depth maps to be processed into our metric-aware 3-channel format.
+
+ ```python
+ from defm import preprocess_depth_image
+
+ # Depth needs to be in meters (numpy array, tensor or PIL image)
+ normalized_depth = preprocess_depth_image(metric_depth, target_size=518, patch_size=14)
+ ```
+
+ ### 3. Inference
+ ```python
+ with torch.no_grad():
+     output = model.get_intermediate_layers(
+         normalized_depth, n=1, reshape=True, return_class_token=True)
+
+ spatial_tokens = output[0][0]  # (B, C, H', W')
+ class_token = output[0][1]     # (B, C)
+ ```

- Visit our [github repo](https://github.com/leggedrobotics/defm) for details on how to use the models.
+ For more details, visit our [GitHub repository](https://github.com/leggedrobotics/defm).

  ## 📊 Model Zoo
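One detail implicit in the added snippets: with `target_size=518` and `patch_size=14`, the `(B, C, H', W')` spatial-token map should come out as a 37 × 37 grid, since a ViT tokenizes a square input into `target_size // patch_size` patches per side. A minimal sketch of that arithmetic (the helper name is hypothetical, not part of the `defm` API):

```python
def token_grid_side(target_size: int, patch_size: int) -> int:
    """Number of patch tokens per image side for a square ViT input."""
    if target_size % patch_size != 0:
        raise ValueError("target_size must be a multiple of patch_size")
    return target_size // patch_size

side = token_grid_side(518, 14)
print(side, side * side)  # 37 patches per side -> a 37 x 37 grid of spatial tokens
```

This is why the preprocessing snippet passes `patch_size` alongside `target_size`: the resized input must be divisible by the patch size for the tokenization to be exact.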