Add pipeline tag, license metadata and improve model card
Hi, I'm Niels, part of the community science team at Hugging Face.
I've opened this PR to improve the model card for LingBot-Map. Specifically, I've:
- Added `pipeline_tag: image-to-3d` and `license: apache-2.0` to the metadata for better discoverability.
- Updated the content to include detailed installation and usage instructions from your GitHub repository.
- Linked the model card to the corresponding paper on the Hugging Face Hub.
These changes help researchers more easily find, understand, and cite your work.
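As a quick sanity check, the added metadata can be read back from the top of the README. The sketch below is a minimal, hypothetical parser (the helper name `read_front_matter` is illustrative and not part of the repository). It handles only flat `key: value` lines, which is all this PR adds, and is not a general YAML parser.

```python
def read_front_matter(readme_text):
    """Return the key/value pairs of a leading '---' delimited block.

    Only flat `key: value` lines are supported (an assumption that
    matches the two fields added in this PR).
    """
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    metadata = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as no front matter


# The first lines of the README as changed by this PR.
README_HEAD = """---
license: apache-2.0
pipeline_tag: image-to-3d
---

<div align="center">
"""

metadata = read_front_matter(README_HEAD)
print(metadata)
```

Running it against the new README head should report both `license` and `pipeline_tag`.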
README.md
CHANGED
|
@@ -1,3 +1,8 @@
| 1 |
<div align="center">
|
| 2 |
<img src="assets/teaser.png" width="100%">
|
| 3 |
|
|
@@ -9,7 +14,7 @@ Robbyant Team
|
|
| 9 |
|
| 10 |
<div align="center">
|
| 11 |
|
| 12 |
-
[](https://
|
| 13 |
[](lingbot-map_paper.pdf)
|
| 14 |
[](https://technology.robbyant.com/lingbot-map)
|
| 15 |
[](https://huggingface.co/robbyant/lingbot-map)
|
|
@@ -24,8 +29,9 @@ https://github.com/user-attachments/assets/fe39e095-af2c-4ec9-b68d-a8ba97e505ab
|
|
| 24 |
|
| 25 |
### Meet LingBot-Map! We've built a feed-forward 3D foundation model for streaming 3D reconstruction!
|
| 26 |
|
| 27 |
-
LingBot-Map
|
| 28 |
|
|
|
|
| 29 |
- **Geometric Context Transformer**: Architecturally unifies coordinate grounding, dense geometric cues, and long-range drift correction within a single streaming framework through anchor context, pose-reference window, and trajectory memory.
|
| 30 |
- **High-Efficiency Streaming Inference**: A feed-forward architecture with paged KV cache attention, enabling stable inference at ~20 FPS on 518×378 resolution over long sequences exceeding 10,000 frames.
|
| 31 |
- **State-of-the-Art Reconstruction**: Superior performance on diverse benchmarks compared to both existing streaming and iterative optimization-based approaches.
|
|
@@ -49,8 +55,6 @@ conda activate lingbot-map
|
|
| 49 |
pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128
|
| 50 |
```
|
| 51 |
|
| 52 |
-
> For other CUDA versions, see [PyTorch Get Started](https://pytorch.org/get-started/locally/).
|
| 53 |
-
|
| 54 |
**3. Install lingbot-map**
|
| 55 |
|
| 56 |
```bash
|
|
@@ -66,21 +70,6 @@ FlashInfer provides paged KV cache attention for efficient streaming inference:
|
|
| 66 |
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
|
| 67 |
```
|
| 68 |
|
| 69 |
-
> For other CUDA/PyTorch combinations, see [FlashInfer installation](https://docs.flashinfer.ai/installation.html).
|
| 70 |
-
> If FlashInfer is not installed, the model falls back to SDPA (PyTorch native attention) via `--use_sdpa`.
|
| 71 |
-
|
| 72 |
-
**5. Visualization dependencies (optional)**
|
| 73 |
-
|
| 74 |
-
```bash
|
| 75 |
-
pip install -e ".[vis]"
|
| 76 |
-
```
|
| 77 |
-
|
| 78 |
-
# Model Download
|
| 79 |
-
|
| 80 |
-
| Model Name | Huggingface Repository | ModelScope Repository | Description |
|
| 81 |
-
| :--- | :--- | :--- | :--- |
|
| 82 |
-
| lingbot-map | [robbyant/lingbot-map](https://huggingface.co/robbyant/lingbot-map) | [Robbyant/lingbot-map](https://www.modelscope.cn/models/Robbyant/lingbot-map) | Base model checkpoint (4.63 GB) |
|
| 83 |
-
|
| 84 |
# Demo
|
| 85 |
|
| 86 |
### Streaming Inference from Images
|
|
@@ -99,37 +88,23 @@ python demo.py --model_path /path/to/checkpoint.pt \
|
|
| 99 |
|
| 100 |
### Streaming with Keyframe Interval
|
| 101 |
|
| 102 |
-
Use `--keyframe_interval` to reduce KV cache memory by only keeping every N-th frame as a keyframe.
|
| 103 |
-
which excesses 320 frames.
|
| 104 |
|
| 105 |
```bash
|
| 106 |
python demo.py --model_path /path/to/checkpoint.pt \
|
| 107 |
--image_folder /path/to/images/ --keyframe_interval 6
|
| 108 |
```
|
| 109 |
|
| 110 |
-
### Windowed Inference (for long sequences, >3000 frames)
|
| 111 |
-
```bash
|
| 112 |
-
python demo.py --model_path /path/to/checkpoint.pt \
|
| 113 |
-
--video_path video.mp4 --fps 10 \
|
| 114 |
-
--mode windowed --window_size 64
|
| 115 |
-
```
|
| 116 |
-
|
| 117 |
-
|
| 118 |
### Sky Masking
|
| 119 |
|
| 120 |
-
Sky masking
|
| 121 |
|
| 122 |
**Setup:**
|
| 123 |
|
| 124 |
```bash
|
| 125 |
-
|
| 126 |
-
pip install onnxruntime # CPU
|
| 127 |
-
# or
|
| 128 |
-
pip install onnxruntime-gpu # GPU (faster for large image sets)
|
| 129 |
```
|
| 130 |
|
| 131 |
-
The sky segmentation model (`skyseg.onnx`) will be automatically downloaded from [HuggingFace](https://huggingface.co/JianyuanWang/skyseg/resolve/main/skyseg.onnx) on first use.
|
| 132 |
-
|
| 133 |
**Usage:**
|
| 134 |
|
| 135 |
```bash
|
|
@@ -137,15 +112,6 @@ python demo.py --model_path /path/to/checkpoint.pt \
|
|
| 137 |
--image_folder /path/to/images/ --mask_sky
|
| 138 |
```
|
| 139 |
|
| 140 |
-
Sky masks are cached in `<image_folder>_sky_masks/` so subsequent runs skip regeneration.
|
| 141 |
-
|
| 142 |
-
### Without FlashInfer (SDPA fallback)
|
| 143 |
-
|
| 144 |
-
```bash
|
| 145 |
-
python demo.py --model_path /path/to/checkpoint.pt \
|
| 146 |
-
--image_folder /path/to/images/ --use_sdpa
|
| 147 |
-
```
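The removed note says the model falls back to SDPA via `--use_sdpa` when FlashInfer is not installed. A launcher script could choose the flag automatically. The sketch below assumes the package is importable as `flashinfer` and uses a hypothetical helper, `build_demo_command`; only flags that appear in this README are used.

```python
import importlib.util


def build_demo_command(model_path, image_folder):
    """Build the demo.py argument list, adding --use_sdpa if needed.

    Assumption: FlashInfer is importable as `flashinfer` when installed.
    """
    cmd = [
        "python", "demo.py",
        "--model_path", model_path,
        "--image_folder", image_folder,
    ]
    # find_spec returns None when the top-level module cannot be found.
    if importlib.util.find_spec("flashinfer") is None:
        cmd.append("--use_sdpa")
    return cmd


cmd = build_demo_command("/path/to/checkpoint.pt", "/path/to/images/")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.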
|
| 148 |
-
|
| 149 |
# License
|
| 150 |
|
| 151 |
This project is released under the Apache License 2.0. See [LICENSE](LICENSE.txt) file for details.
|
|
@@ -163,12 +129,7 @@ This project is released under the Apache License 2.0. See [LICENSE](LICENSE.txt
|
|
| 163 |
|
| 164 |
# Acknowledgments
|
| 165 |
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
This work builds upon several excellent open-source projects:
|
| 169 |
-
|
| 170 |
- [VGGT](https://github.com/facebookresearch/vggt)
|
| 171 |
- [DINOv2](https://github.com/facebookresearch/dinov2)
|
| 172 |
-
- [Flashinfer](https://github.com/flashinfer-ai/flashinfer)
|
| 173 |
-
|
| 174 |
-
---
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
pipeline_tag: image-to-3d
|
| 4 |
+
---
|
| 5 |
+
|
| 6 |
<div align="center">
|
| 7 |
<img src="assets/teaser.png" width="100%">
|
| 8 |
|
|
|
|
| 14 |
|
| 15 |
<div align="center">
|
| 16 |
|
| 17 |
+
[](https://huggingface.co/papers/2604.14141)
|
| 18 |
[](lingbot-map_paper.pdf)
|
| 19 |
[](https://technology.robbyant.com/lingbot-map)
|
| 20 |
[](https://huggingface.co/robbyant/lingbot-map)
|
|
|
|
| 29 |
|
| 30 |
### Meet LingBot-Map! We've built a feed-forward 3D foundation model for streaming 3D reconstruction!
|
| 31 |
|
| 32 |
+
LingBot-Map is a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture.
|
| 33 |
|
| 34 |
+
Key features include:
|
| 35 |
- **Geometric Context Transformer**: Architecturally unifies coordinate grounding, dense geometric cues, and long-range drift correction within a single streaming framework through anchor context, pose-reference window, and trajectory memory.
|
| 36 |
- **High-Efficiency Streaming Inference**: A feed-forward architecture with paged KV cache attention, enabling stable inference at ~20 FPS on 518×378 resolution over long sequences exceeding 10,000 frames.
|
| 37 |
- **State-of-the-Art Reconstruction**: Superior performance on diverse benchmarks compared to both existing streaming and iterative optimization-based approaches.
|
|
|
|
| 55 |
pip install torch==2.9.1 torchvision==0.24.1 --index-url https://download.pytorch.org/whl/cu128
|
| 56 |
```
|
| 57 |
|
| 58 |
**3. Install lingbot-map**
|
| 59 |
|
| 60 |
```bash
|
|
|
|
| 70 |
pip install flashinfer-python -i https://flashinfer.ai/whl/cu128/torch2.9/
|
| 71 |
```
|
| 72 |
|
| 73 |
# Demo
|
| 74 |
|
| 75 |
### Streaming Inference from Images
|
|
|
|
| 88 |
|
| 89 |
### Streaming with Keyframe Interval
|
| 90 |
|
| 91 |
+
Use `--keyframe_interval` to reduce KV cache memory by only keeping every N-th frame as a keyframe.
|
| 92 |
|
| 93 |
```bash
|
| 94 |
python demo.py --model_path /path/to/checkpoint.pt \
|
| 95 |
--image_folder /path/to/images/ --keyframe_interval 6
|
| 96 |
```
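To make the memory effect concrete, the sketch below lists which frames would be kept as keyframes. It assumes frame 0 is a keyframe and every N-th frame after it is kept; that selection rule and the helper name `keyframe_indices` are assumptions for illustration, and the actual logic in `demo.py` may differ.

```python
def keyframe_indices(num_frames, keyframe_interval):
    """Indices of frames kept as keyframes under the assumed rule.

    Assumption: frame 0 is kept, then every `keyframe_interval`-th frame.
    """
    if keyframe_interval < 1:
        raise ValueError("keyframe_interval must be at least 1")
    return list(range(0, num_frames, keyframe_interval))


num_frames = 320
kept = keyframe_indices(num_frames, keyframe_interval=6)
# With 320 frames and an interval of 6, 54 frames remain as keyframes.
print(f"{len(kept)} of {num_frames} frames kept as keyframes")
```

Under this assumption, the number of cached keyframes grows roughly as the frame count divided by the interval.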
|
| 97 |
|
| 98 |
### Sky Masking
|
| 99 |
|
| 100 |
+
Sky masking filters out sky points from the reconstructed point cloud.
|
| 101 |
|
| 102 |
**Setup:**
|
| 103 |
|
| 104 |
```bash
|
| 105 |
+
pip install onnxruntime
|
| 106 |
```
|
| 107 |
|
| 108 |
**Usage:**
|
| 109 |
|
| 110 |
```bash
|
|
|
|
| 112 |
--image_folder /path/to/images/ --mask_sky
|
| 113 |
```
|
| 114 |
|
| 115 |
# License
|
| 116 |
|
| 117 |
This project is released under the Apache License 2.0. See [LICENSE](LICENSE.txt) file for details.
|
|
|
|
| 129 |
|
| 130 |
# Acknowledgments
|
| 131 |
|
| 132 |
+
This work builds upon several open-source projects:
|
| 133 |
- [VGGT](https://github.com/facebookresearch/vggt)
|
| 134 |
- [DINOv2](https://github.com/facebookresearch/dinov2)
|
| 135 |
+
- [Flashinfer](https://github.com/flashinfer-ai/flashinfer)
|