Image Feature Extraction
nielsr (HF Staff) committed
Commit 207d01b · verified · 1 Parent(s): cfea7b9

Update model card with paper link and sample usage snippets


Hi! I'm Niels from the Hugging Face community team.

This PR improves the model card for DeFM by:
- Replacing the placeholder `TODO` arXiv link with the correct link to the paper: [DeFM: Learning Foundation Representations from Depth for Robotics](https://huggingface.co/papers/2601.18923).
- Adding a detailed **Usage** section with code snippets for model loading, preprocessing, and inference, based on the documentation in your GitHub repository.

These changes help users get started with the model directly from the Hub. Please let me know if you have any questions!

Files changed (1): README.md (+35 −3)

README.md CHANGED
@@ -2,13 +2,14 @@
 license: apache-2.0
 pipeline_tag: image-feature-extraction
 ---
+
 # DeFM: Learning Foundation Representations from Depth for Robotics
 
 <div align="center">
 
 [![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-green.svg?style=for-the-badge)](https://opensource.org/licenses/Apache-2.0)
 [![GitHub](https://img.shields.io/badge/GitHub-Code-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/leggedrobotics/defm)
-[![Arxiv](https://img.shields.io/badge/arXiv-TODO-B31B1B.svg?style=for-the-badge)](TODO-link)
+[![Arxiv](https://img.shields.io/badge/arXiv-2601.18923-B31B1B.svg?style=for-the-badge)](https://arxiv.org/abs/2601.18923)
 [![Webpage](https://img.shields.io/badge/Webpage-de--fm.github.io/-yellow.svg?style=for-the-badge&logo=google-chrome&logoColor=white)](https://de-fm.github.io/)
 </div>
 
@@ -25,9 +26,40 @@ TL;DR - A DINO-style encoder, but for depth image inputs.
 - **Compact efficient models**: We distill our DeFM-ViT-L into a family of smaller efficient CNNs as small as 3M params for robot policy learning.
 - **Robotics Proven**: Our encoder is proven effective for diverse robotic tasks such as navigation, manipulation and locomotion without task-specific fine-tuning.
 
-## Usage
+## 🚀 Usage
+
+### 1. Loading the Model
+Load via **TorchHub** for easy integration:
+
+```python
+import torch
+
+# Load the 307M Parameter Foundation Model
+model = torch.hub.load('leggedrobotics/defm:main', 'defm_vit_l14', pretrained=True)
+model.eval().to("cuda")
+```
+
+### 2. Preprocessing
+DeFM requires depth maps to be processed into our metric-aware 3-channel format.
+
+```python
+from defm import preprocess_depth_image
+
+# Depth needs to be in meters (numpy array, tensor or PIL image)
+normalized_depth = preprocess_depth_image(metric_depth, target_size=518, patch_size=14)
+```
+
+### 3. Inference
+```python
+with torch.no_grad():
+    output = model.get_intermediate_layers(
+        normalized_depth, n=1, reshape=True, return_class_token=True)
+
+spatial_tokens = output[0][0]  # (B, C, H', W')
+class_token = output[0][1]     # (B, C)
+```
 
-Visit our [github repo](https://github.com/leggedrobotics/defm) for details on how to use the models.
+For more details, visit our [GitHub repository](https://github.com/leggedrobotics/defm).
 
 ## 📊 Model Zoo
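As a complement to the snippets added above, here is a shape-level sketch of how the step-3 outputs are typically consumed downstream. NumPy arrays stand in for the real torch tensors so the sketch runs without the model, and the pool-and-concatenate strategy is an illustrative assumption (a common way to build a per-image feature for a policy head), not part of the DeFM API:

```python
import numpy as np

# Dummy stand-ins for the inference outputs in step 3:
# spatial_tokens has shape (B, C, H', W'), class_token has shape (B, C).
# ViT-L/14 at 518 px input yields 518 / 14 = 37 patches per side and C = 1024.
B, C, Hp, Wp = 2, 1024, 37, 37
spatial_tokens = np.random.rand(B, C, Hp, Wp).astype(np.float32)
class_token = np.random.rand(B, C).astype(np.float32)

# Global-average-pool the spatial tokens over the patch grid, then
# concatenate with the class token to get one feature vector per image.
pooled = spatial_tokens.mean(axis=(2, 3))              # (B, C)
feature = np.concatenate([class_token, pooled], axis=1)  # (B, 2*C)
print(feature.shape)  # (2, 2048)
```

With the real model, the same two lines apply after calling `.cpu().numpy()` on the tensors, or can be done directly in torch with `torch.cat`.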
65