nielsr HF Staff committed on
Commit d0cf2c6 · verified · 1 Parent(s): 6258bd7

Add pipeline tag, license metadata and improve model card

Hi, I'm Niels, part of the community science team at Hugging Face.

I've opened this PR to improve the model card for LingBot-Map. Specifically, I've:
- Added `pipeline_tag: image-to-3d` and `license: apache-2.0` to the metadata for better discoverability.
- Updated the content to include detailed installation and usage instructions from your GitHub repository.
- Linked the model card to the corresponding paper on the Hugging Face Hub.

These changes help researchers more easily find, understand, and cite your work.
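
To check the new metadata locally, one option is to read the YAML front matter straight out of `README.md`. The sketch below is illustrative only: `front_matter` and `has_key` are hypothetical helper names, not part of lingbot-map or any Hugging Face tool, and it assumes the metadata block sits at the very top of the file between two `---` lines, as it does in this diff.

```bash
#!/usr/bin/env bash
# Sketch only: front_matter and has_key are hypothetical helpers.

# Print the lines between the opening and closing '---' of a model card.
front_matter() {
  awk '
    NR == 1 && $0 != "---" { exit }   # file does not start with front matter
    /^---$/ { seen++; next }          # count delimiters, never print them
    seen == 1 { print }               # inside the metadata block
    seen >= 2 { exit }                # past the closing delimiter
  ' "$1"
}

# Succeed if the front matter contains a top-level key named $2.
has_key() {
  front_matter "$1" | grep -q "^$2:"
}
```

For example, `has_key README.md pipeline_tag && echo ok` prints `ok` only if the key is present in the front matter.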

Files changed (1)
  1. README.md +13 -52
README.md CHANGED
@@ -1,3 +1,8 @@
+ ---
+ license: apache-2.0
+ pipeline_tag: image-to-3d
+ ---
+
  <div align="center">
  <img src="assets/teaser.png" width="100%">

@@ -9,7 +14,7 @@ Robbyant Team

  <div align="center">

- [![Paper](https://img.shields.io/static/v1?label=Paper&message=arXiv&color=red&logo=arxiv)](https://arxiv.org/abs/2604.14141)
+ [![Paper](https://img.shields.io/static/v1?label=Paper&message=arXiv&color=red&logo=arxiv)](https://huggingface.co/papers/2604.14141)
  [![PDF](https://img.shields.io/static/v1?label=Paper&message=PDF&color=red&logo=adobeacrobatreader)](lingbot-map_paper.pdf)
  [![Project](https://img.shields.io/badge/Project-Website-blue)](https://technology.robbyant.com/lingbot-map)
  [![HuggingFace](https://img.shields.io/static/v1?label=%F0%9F%A4%97%20Model&message=HuggingFace&color=orange)](https://huggingface.co/robbyant/lingbot-map)
@@ -24,8 +29,9 @@ https://github.com/user-attachments/assets/fe39e095-af2c-4ec9-b68d-a8ba97e505ab

  ### 🗺️ Meet LingBot-Map! We've built a feed-forward 3D foundation model for streaming 3D reconstruction! 🏗️🌍

- LingBot-Map has focused on:
+ LingBot-Map is a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture.

+ Key features include:
  - **Geometric Context Transformer**: Architecturally unifies coordinate grounding, dense geometric cues, and long-range drift correction within a single streaming framework through anchor context, pose-reference window, and trajectory memory.
  - **High-Efficiency Streaming Inference**: A feed-forward architecture with paged KV cache attention, enabling stable inference at ~20 FPS on 518×378 resolution over long sequences exceeding 10,000 frames.
  - **State-of-the-Art Reconstruction**: Superior performance on diverse benchmarks compared to both existing streaming and iterative optimization-based approaches.
@@ -49,8 +55,6 @@ conda activate lingbot-map
  pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128
  ```

- > For other CUDA versions, see [PyTorch Get Started](https://pytorch.org/get-started/locally/).
-
  **3. Install lingbot-map**

  ```bash
@@ -66,21 +70,6 @@ FlashInfer provides paged KV cache attention for efficient streaming inference:
  pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
  ```

- > For other CUDA/PyTorch combinations, see [FlashInfer installation](https://docs.flashinfer.ai/installation.html).
- > If FlashInfer is not installed, the model falls back to SDPA (PyTorch native attention) via `--use_sdpa`.
-
- **5. Visualization dependencies (optional)**
-
- ```bash
- pip install -e ".[vis]"
- ```
-
- # 📦 Model Download
-
- | Model Name | Huggingface Repository | ModelScope Repository | Description |
- | :--- | :--- | :--- | :--- |
- | lingbot-map | [robbyant/lingbot-map](https://huggingface.co/robbyant/lingbot-map) | [Robbyant/lingbot-map](https://www.modelscope.cn/models/Robbyant/lingbot-map) | Base model checkpoint (4.63 GB) |
-
  # 🎬 Demo

  ### Streaming Inference from Images
@@ -99,37 +88,23 @@ python demo.py --model_path /path/to/checkpoint.pt \

  ### Streaming with Keyframe Interval

- Use `--keyframe_interval` to reduce KV cache memory by only keeping every N-th frame as a keyframe. Non-keyframe frames still produce predictions but are not stored in the cache. This is useful for long sequences
- which excesses 320 frames.
+ Use `--keyframe_interval` to reduce KV cache memory by only keeping every N-th frame as a keyframe.

  ```bash
  python demo.py --model_path /path/to/checkpoint.pt \
  --image_folder /path/to/images/ --keyframe_interval 6
  ```

- ### Windowed Inference (for long sequences, >3000 frames)
- ```bash
- python demo.py --model_path /path/to/checkpoint.pt \
- --video_path video.mp4 --fps 10 \
- --mode windowed --window_size 64
- ```
-
-
  ### Sky Masking

- Sky masking uses an ONNX sky segmentation model to filter out sky points from the reconstructed point cloud, which improves visualization quality for outdoor scenes.
+ Sky masking filters out sky points from the reconstructed point cloud.

  **Setup:**

  ```bash
- # Install onnxruntime (required)
- pip install onnxruntime # CPU
- # or
- pip install onnxruntime-gpu # GPU (faster for large image sets)
+ pip install onnxruntime
  ```

- The sky segmentation model (`skyseg.onnx`) will be automatically downloaded from [HuggingFace](https://huggingface.co/JianyuanWang/skyseg/resolve/main/skyseg.onnx) on first use.
-
  **Usage:**

  ```bash
@@ -137,15 +112,6 @@ python demo.py --model_path /path/to/checkpoint.pt \
  --image_folder /path/to/images/ --mask_sky
  ```

- Sky masks are cached in `<image_folder>_sky_masks/` so subsequent runs skip regeneration.
-
- ### Without FlashInfer (SDPA fallback)
-
- ```bash
- python demo.py --model_path /path/to/checkpoint.pt \
- --image_folder /path/to/images/ --use_sdpa
- ```
-
  # 📜 License

  This project is released under the Apache License 2.0. See [LICENSE](LICENSE.txt) file for details.
@@ -163,12 +129,7 @@ This project is released under the Apache License 2.0. See [LICENSE](LICENSE.txt

  # ✨ Acknowledgments

- We thank Shangzhan Zhang, Jianyuan Wang, Yudong Jin, Christian Rupprecht, and Xun Cao for their helpful discussions and support.
-
- This work builds upon several excellent open-source projects:
-
+ This work builds upon several open-source projects:
  - [VGGT](https://github.com/facebookresearch/vggt)
  - [DINOv2](https://github.com/facebookresearch/dinov2)
- - [Flashinfer](https://github.com/flashinfer-ai/flashinfer)
-
- ---
+ - [Flashinfer](https://github.com/flashinfer-ai/flashinfer)