---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
library_name: transformers
license: other
pipeline_tag: keypoint-detection
---

This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration.

This is a LightGlue variant trained with DISK features. It requires `kornia` to be installed and can be used with Transformers as follows:

```python
from transformers import LightGlueForKeypointMatching

model = LightGlueForKeypointMatching.from_pretrained("ETH-CVG/lightglue_disk", trust_remote_code=True)
```
# LightGlue

The LightGlue model was proposed in [LightGlue: Local Feature Matching at Light Speed](http://arxiv.org/abs/2306.13643) by Philipp Lindenberger, Paul-Edouard Sarlin and Marc Pollefeys.

The model matches two sets of interest points detected in a pair of images. Paired with a keypoint detector such as the [SuperPoint model](https://huggingface.co/magic-leap-community/superpoint), it can be used to match two images and estimate the pose between them. This makes it useful for tasks such as image matching and homography estimation.
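As an illustration of the geometry downstream of matching, here is a minimal NumPy sketch of the classic normalized eight-point algorithm, which estimates a fundamental matrix from matched keypoints such as those produced by LightGlue. This helper is not part of the Transformers API; in practice you would typically use a RANSAC-based estimator (e.g. `cv2.findFundamentalMat`) to handle outlier matches.

```python
import numpy as np

def _normalize(pts):
    # Hartley normalization: move centroid to origin, scale mean distance to sqrt(2)
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2) / np.sqrt(((pts - centroid) ** 2).sum(axis=1)).mean()
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    return pts_h @ T.T, T

def eight_point(pts1, pts2):
    """Estimate F such that x2^T F x1 = 0 from >= 8 matched points of shape (N, 2)."""
    x1, T1 = _normalize(np.asarray(pts1, dtype=float))
    x2, T2 = _normalize(np.asarray(pts2, dtype=float))
    # Each correspondence contributes one linear constraint on the 9 entries of F.
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1)),
    ])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # Enforce the rank-2 constraint of a fundamental matrix.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = T2.T @ F @ T1  # undo the normalization
    return F / np.linalg.norm(F)
```

On noise-free correspondences this recovers F up to scale; with real matches, the matching scores returned by LightGlue are a natural way to pre-filter the input pairs.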
The abstract from the paper is the following:

*We introduce LightGlue, a deep neural network that learns to match local features across images. We revisit multiple design decisions of SuperGlue, the state of the art in sparse matching, and derive simple but effective improvements. Cumulatively, they make LightGlue more efficient in terms of both memory and computation, more accurate, and much easier to train. One key property is that LightGlue is adaptive to the difficulty of the problem: the inference is much faster on image pairs that are intuitively easy to match, for example because of a larger visual overlap or limited appearance change. This opens up exciting prospects for deploying deep matchers in latency-sensitive applications like 3D reconstruction. The code and trained models are publicly available at [github.com/cvg/LightGlue](https://github.com/cvg/LightGlue).*
<img src="https://raw.githubusercontent.com/cvg/LightGlue/main/assets/easy_hard.jpg" alt="drawing" width="800"/>

This model was contributed by [stevenbucaille](https://huggingface.co/stevenbucaille).
The original code can be found [here](https://github.com/cvg/LightGlue).

## Demo notebook

A demo notebook showcasing inference and visualization with LightGlue can be found [TBD]().
## Model Details

### Model Description

LightGlue is a neural network that matches two sets of local features by jointly finding correspondences and rejecting non-matchable points. Building on the success of SuperGlue, it can introspect the confidence of its own predictions and adapts the amount of computation to the difficulty of each image pair. Both its depth and width are adaptive:
1. inference can stop at an early layer if all predictions are ready;
2. points deemed non-matchable are discarded early from further steps.

The resulting model is faster, more accurate, and easier to train than the long-unrivaled SuperGlue.

<img src="https://cdn-uploads.huggingface.co/production/uploads/632885ba1558dac67c440aa8/ILpGyHuWwK2M9Bz0LmZLh.png" alt="drawing" width="1000"/>
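The adaptive-depth idea can be illustrated with a toy sketch. This is purely illustrative (hypothetical class and layer structure, not LightGlue's actual architecture): a confidence head is evaluated after every layer, and inference stops as soon as the predictions are deemed confident enough.

```python
import torch
import torch.nn as nn

class ToyAdaptiveMatcher(nn.Module):
    # Illustrative only: a per-layer confidence head decides whether to stop early,
    # mimicking LightGlue's adaptive depth, not its actual layers.
    def __init__(self, dim=32, num_layers=4, exit_threshold=0.95):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))
        self.confidence = nn.Linear(dim, 1)
        self.exit_threshold = exit_threshold

    def forward(self, desc):
        layers_run = 0
        for layer in self.layers:
            desc = torch.relu(layer(desc))
            layers_run += 1
            # Mean confidence over all points; if every prediction looks ready, exit.
            conf = torch.sigmoid(self.confidence(desc)).mean()
            if conf > self.exit_threshold:
                break
        return desc, layers_run
```

Easy pairs trip the exit check after few layers, hard pairs run the full depth, which is where the variable inference time comes from.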
- **Developed by:** ETH Zurich - Computer Vision and Geometry Lab
- **Model type:** Image Matching
- **License:** ACADEMIC OR NON-PROFIT ORGANIZATION NONCOMMERCIAL RESEARCH USE ONLY (implied by the use of SuperPoint as its keypoint detector)

### Model Sources

- **Repository:** https://github.com/cvg/LightGlue
- **Paper:** http://arxiv.org/abs/2306.13643
- **Demo:** https://colab.research.google.com/github/cvg/LightGlue/blob/main/demo.ipynb
## Uses

### Direct Use

LightGlue is designed for feature matching and pose estimation tasks in computer vision. It can be applied to a variety of multiple-view geometry problems and can handle challenging real-world indoor and outdoor environments. However, it may not perform well on tasks that require different types of visual understanding, such as object detection or image classification.
## How to Get Started with the Model

Here is a quick example of using the model. Since this is an image matching model, it requires pairs of images to be matched.
The raw outputs contain the list of keypoints detected by the keypoint detector as well as the list of matches with their corresponding matching scores.

```python
from transformers import AutoImageProcessor, AutoModel
import torch
from PIL import Image
import requests

url_image1 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_98169888_3347710852.jpg"
image1 = Image.open(requests.get(url_image1, stream=True).raw)
url_image2 = "https://raw.githubusercontent.com/magicleap/SuperGluePretrainedNetwork/refs/heads/master/assets/phototourism_sample_images/united_states_capitol_26757027_6717084061.jpg"
image2 = Image.open(requests.get(url_image2, stream=True).raw)

images = [image1, image2]

processor = AutoImageProcessor.from_pretrained("ETH-CVG/lightglue_disk", trust_remote_code=True)
model = AutoModel.from_pretrained("ETH-CVG/lightglue_disk", trust_remote_code=True)

inputs = processor(images, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
```
You can use the `post_process_keypoint_matching` method from the `LightGlueImageProcessor` to get the keypoints and matches in a readable format:

```python
image_sizes = [[(image.height, image.width) for image in images]]
outputs = processor.post_process_keypoint_matching(outputs, image_sizes, threshold=0.2)
for i, output in enumerate(outputs):
    print("For the image pair", i)
    for keypoint0, keypoint1, matching_score in zip(
        output["keypoints0"], output["keypoints1"], output["matching_scores"]
    ):
        print(
            f"Keypoint at coordinate {keypoint0.numpy()} in the first image matches with keypoint at coordinate {keypoint1.numpy()} in the second image with a score of {matching_score}."
        )
```
You can visualize the matches between the images by providing the original images as well as the outputs to this method:

```python
processor.plot_keypoint_matching(images, outputs)
```

![image/png](https://cdn-uploads.huggingface.co/production/uploads/632885ba1558dac67c440aa8/duPp09ty8NRZlMZS18ccP.png)
## Training Details

LightGlue is trained on large annotated datasets for pose estimation, enabling it to learn priors for pose estimation and to reason about the 3D scene.
The training data consists of image pairs with ground-truth correspondences and unmatched keypoints derived from ground-truth poses and depth maps.

LightGlue follows the supervised training setup of SuperGlue. It is first pre-trained with synthetic homographies sampled from 1M images.
Such augmentations provide full and noise-free supervision but require careful tuning. LightGlue is then fine-tuned on the MegaDepth dataset,
which includes 1M crowd-sourced images depicting 196 tourism landmarks, with camera calibration and poses recovered by SfM and
dense depth by multi-view stereo.
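The homography pre-training stage can be sketched as follows. This is a simplified illustration, not the actual augmentation pipeline: sampling a random homography and warping keypoints with it yields noise-free ground-truth correspondences for free.

```python
import numpy as np

def random_homography(rng, max_angle=0.2, max_translation=0.1, max_shear=0.1):
    # Illustrative: compose rotation, shear and translation into a 3x3 homography.
    # The real augmentation pipeline is more elaborate (perspective, scale, etc.).
    angle = rng.uniform(-max_angle, max_angle)
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    Sh = np.eye(3)
    Sh[0, 1] = rng.uniform(-max_shear, max_shear)
    T = np.eye(3)
    T[:2, 2] = rng.uniform(-max_translation, max_translation, 2)
    return T @ R @ Sh

def warp_points(H, pts):
    # Apply a homography to (N, 2) points; the (pts, warped) pairs are
    # exact ground-truth matches for supervising a matcher.
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]
```

Because the warp is known exactly, every keypoint gets an exact match (or is known to fall outside the warped image), which is what makes this supervision full and noise-free.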
#### Training Hyperparameters

- **Training regime:** fp32

#### Speeds, Sizes, Times

LightGlue is designed to be efficient and runs in real time on a modern GPU: a forward pass takes approximately 44 milliseconds (22 FPS) for an image pair.
The model has 13.7 million parameters, making it relatively compact compared to many other deep learning models.
Its inference speed is suitable for real-time applications, and it can be readily integrated into modern Simultaneous Localization and Mapping (SLAM) or Structure-from-Motion (SfM) systems.
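Timings like these depend heavily on hardware and input size, so it is worth measuring on your own setup. A rough sketch is below; `benchmark_forward` is a hypothetical helper, not part of this model card's API, and accurate GPU timing would additionally require `torch.cuda.synchronize()` calls around the timer.

```python
import time
import torch

def benchmark_forward(model, inputs, warmup=3, iters=10):
    # Average forward-pass latency in milliseconds over `iters` runs,
    # after a few warmup runs. Wall-clock only; see the note on GPU timing.
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):
            model(**inputs)
        start = time.perf_counter()
        for _ in range(iters):
            model(**inputs)
    return (time.perf_counter() - start) / iters * 1000.0
```

With the quick-start example above, `benchmark_forward(model, inputs)` would report the per-pair latency on your machine.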
## Citation

**BibTeX:**

```bibtex
@inproceedings{lindenberger2023lightglue,
  author    = {Philipp Lindenberger and
               Paul-Edouard Sarlin and
               Marc Pollefeys},
  title     = {{LightGlue: Local Feature Matching at Light Speed}},
  booktitle = {ICCV},
  year      = {2023}
}
```

## Model Card Authors

[Steven Bucaille](https://github.com/sbucaille)