Update model card with paper link and sample usage snippets

#1
by nielsr - opened
Files changed (1)
  1. README.md +35 -3
README.md CHANGED
@@ -2,13 +2,14 @@
  license: apache-2.0
  pipeline_tag: image-feature-extraction
  ---
+
  # DeFM: Learning Foundation Representations from Depth for Robotics

  <div align="center">

  [![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-green.svg?style=for-the-badge)](https://opensource.org/licenses/Apache-2.0)
  [![GitHub](https://img.shields.io/badge/GitHub-Code-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/leggedrobotics/defm)
- [![Arxiv](https://img.shields.io/badge/arXiv-TODO-B31B1B.svg?style=for-the-badge)](TODO-link)
+ [![Arxiv](https://img.shields.io/badge/arXiv-2601.18923-B31B1B.svg?style=for-the-badge)](https://arxiv.org/abs/2601.18923)
  [![Webpage](https://img.shields.io/badge/Webpage-de--fm.github.io/-yellow.svg?style=for-the-badge&logo=google-chrome&logoColor=white)](https://de-fm.github.io/)
  </div>

@@ -25,9 +26,40 @@ TL;DR - A DINO-style encoder, but for depth image inputs.
  - **Compact efficient models**: We distill our DeFM-ViT-L into a family of smaller efficient CNNs as small as 3M params for robot policy learning.
  - **Robotics Proven**: Our encoder is proven effective for diverse robotic tasks such as navigation, manipulation and locomotion without task-specific fine-tuning.

- ## Usage
+ ## 🚀 Usage
+
+ ### 1. Loading the Model
+ Load via **TorchHub** for easy integration:
+
+ ```python
+ import torch
+
+ # Load the 307M Parameter Foundation Model
+ model = torch.hub.load('leggedrobotics/defm:main', 'defm_vit_l14', pretrained=True)
+ model.eval().to("cuda")
+ ```
+
+ ### 2. Preprocessing
+ DeFM requires depth maps to be processed into our metric-aware 3-channel format.
+
+ ```python
+ from defm import preprocess_depth_image
+
+ # Depth needs to be in meters (numpy array, tensor or PIL image)
+ normalized_depth = preprocess_depth_image(metric_depth, target_size=518, patch_size=14)
+ ```
+
+ ### 3. Inference
+ ```python
+ with torch.no_grad():
+     output = model.get_intermediate_layers(
+         normalized_depth, n=1, reshape=True, return_class_token=True)
+
+ spatial_tokens = output[0][0]  # (B, C, H', W')
+ class_token = output[0][1]     # (B, C)
+ ```

- Visit our [github repo](https://github.com/leggedrobotics/defm) for details on how to use the models.
+ For more details, visit our [GitHub repository](https://github.com/leggedrobotics/defm).

  ## 📊 Model Zoo
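One detail implicit in the added snippets: with `target_size=518` and `patch_size=14`, the `(B, C, H', W')` spatial-token map should come out as a 37 × 37 grid, since a ViT tokenizes a square input into `target_size // patch_size` patches per side. A minimal sketch of that arithmetic (the helper name is hypothetical, not part of the `defm` API):

```python
def token_grid_side(target_size: int, patch_size: int) -> int:
    """Number of patch tokens per image side for a square ViT input."""
    if target_size % patch_size != 0:
        raise ValueError("target_size must be a multiple of patch_size")
    return target_size // patch_size

side = token_grid_side(518, 14)
print(side, side * side)  # 37 patches per side -> a 37 x 37 grid of spatial tokens
```

This is why the preprocessing snippet passes `patch_size` alongside `target_size`: the resized input must be divisible by the patch size for the tokenization to be exact.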