marstin committed
Commit 39bb9e2 · 1 Parent(s): d425e71

[martin-dev] fix readme

Files changed (1):
  1. README.md +17 -245
README.md CHANGED
@@ -1,245 +1,17 @@
- # <img src="imgs/logo.png" alt="VLM-Lens Logo" height="48" style="vertical-align:middle; margin-right:50px;"/> VLM-Lens
-
- [![python](https://img.shields.io/badge/Python-3.10%2B-blue.svg?logo=python&style=flat-square)](https://www.python.org/downloads/release/python-31012/)
- [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg?style=flat-square)](https://www.apache.org/licenses/LICENSE-2.0)
- [![Documentation](https://img.shields.io/badge/Documentation-Online-green.svg?style=flat-square)](https://compling-wat.github.io/vlm-lens/)
- [![Jupyter Notebook](https://img.shields.io/badge/Jupyter-Notebook-orange.svg?logo=jupyter&style=flat-square)](tutorial-notebooks/guide.ipynb)
- [![Google Colab](https://img.shields.io/badge/Google-Colab-orange?logo=googlecolab&style=flat-square)](https://colab.research.google.com/drive/13WC4HA6syXFotmn7S8WsVz4OmoHsfHV9?usp=sharing)
-
- <p align="center">
-   <img src="imgs/teaser.png" alt="VLM-Lens Teaser" width="100%" />
- </p>
-
- ## Table of Contents
-
- - [Environment Setup](#environment-setup)
- - [Example Usage: Extract Qwen2-VL-2B Embeddings with VLM-Lens](#example-usage-extract-qwen2-vl-2b-embeddings-with-vlm-lens)
-   - [General Command-Line Demo](#general-command-line-demo)
-   - [Run Qwen2-VL-2B Embeddings Extraction](#run-qwen2-vl-2b-embeddings-extraction)
- - [Layers of Interest in a VLM](#layers-of-interest-in-a-vlm)
-   - [Retrieving All Named Modules](#retrieving-all-named-modules)
-   - [Matching Layers](#matching-layers)
- - [Feature Extraction using HuggingFace Datasets](#feature-extraction-using-huggingface-datasets)
-   - [Output Database](#output-database)
- - [Demo: Principal Component Analysis over Primitive Concept](#principal-component-analysis-over-primitive-concept)
- - [Contributing to VLM-Lens](#contributing-to-vlm-lens)
- - [Miscellaneous](#miscellaneous)
-
- ## Environment Setup
- We recommend using a virtual environment to manage your dependencies. You can create and activate one with the following commands:
- ```bash
- virtualenv --no-download "venv/vlm-lens-base" --prompt "vlm-lens-base" # Or "python3.10 -m venv venv/vlm-lens-base"
- source venv/vlm-lens-base/bin/activate
- ```
-
- Then, install the required dependencies:
- ```bash
- pip install --upgrade pip
- pip install -r envs/base/requirements.txt
- ```
-
- Some models require different dependencies, and we recommend creating a separate virtual environment for each of them to avoid conflicts.
- For such models, we offer a separate `requirements.txt` file under `envs/<model_name>/requirements.txt`, which can be installed in the same way as above.
- All model-specific environments are independent of the base environment and can be installed individually.
-
- **Notes**:
- 1. Local constraints (e.g., issues caused by cluster regulations) may cause the above commands to fail. In such cases, you are encouraged to modify them as needed. We welcome issues and pull requests to help us keep the dependencies up to date.
- 2. Due to the resources available at development time, some models may not be fully supported on modern GPUs. While our released environments are tested on L40s GPUs, we recommend following the error messages to adjust the environment setup for your specific hardware.
-
- ## Example Usage: Extract Qwen2-VL-2B Embeddings with VLM-Lens
-
- ### General Command-Line Demo
-
- The general command to run the quick command-line demo is:
- ```bash
- python -m src.main \
-     --config <config-file-path> \
-     --debug
- ```
- with an optional `--debug` flag to see more detailed output.
-
- Note that the config file should be in YAML format, and that any arguments you want to send to the HuggingFace API should be under the `model` key.
- See `configs/models/qwen/qwen-2b.yaml` as an example.
-
- ### Run Qwen2-VL-2B Embeddings Extraction
- The file `configs/models/qwen/qwen-2b.yaml` contains the configuration for running the Qwen2-VL-2B model.
-
- ```yaml
- architecture: qwen # Architecture of the model, see more options in src/models/configs.py
- model_path: Qwen/Qwen2-VL-2B-Instruct # HuggingFace model path
- model: # Model configuration, i.e., arguments to pass to the model
-   - torch_dtype: auto
- output_db: output/qwen.db # Output database file to store embeddings
- input_dir: ./data/ # Directory containing images to process
- prompt: "Describe the color in this image in one word." # Textual prompt
- pooling_method: None # Pooling method for aggregating token embeddings (options: None, mean, max)
- modules: # List of modules to extract embeddings from
-   - lm_head
-   - visual.blocks.31
- ```
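-
- The `pooling_method` option controls how token embeddings are aggregated before they are saved. A minimal sketch of the assumed semantics of the three options (illustrative only, not VLM-Lens's actual code):
- ```python
- # Minimal sketch of the assumed pooling semantics; not VLM-Lens's code.
- # hidden has shape (num_tokens, hidden_dim).
- import torch
-
- hidden = torch.randn(24, 1536)         # e.g., 24 tokens, hidden size 1536
- mean_pooled = hidden.mean(dim=0)       # pooling_method: mean -> (1536,)
- max_pooled = hidden.max(dim=0).values  # pooling_method: max  -> (1536,)
- none_pooled = hidden                   # pooling_method: None -> keep all token embeddings
- ```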
-
- To run the extraction on an available GPU, use the following command:
- ```bash
- python -m src.main --config configs/models/qwen/qwen-2b.yaml --debug
- ```
-
- If no GPU is available, you can run it on the CPU with:
- ```bash
- python -m src.main --config configs/models/qwen/qwen-2b.yaml --device cpu --debug
- ```
-
- ## Layers of Interest in a VLM
- ### Retrieving All Named Modules
- Unfortunately, there is no way to find which layers are available for matching without loading the model, which can take quite a bit of time.
-
- Instead, we offer some cached results under `logs/` for each model, which were generated by including the `-l` or `--log-named-modules` flag when running `python -m src.main`.
-
- When running with this flag, it is not necessary to set the modules or anything besides the architecture and HuggingFace model path.
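-
- For reference, a minimal sketch of listing named modules with plain PyTorch (the model path is only an example, and loading still takes time and memory):
- ```python
- # Minimal sketch: list candidate layer names to match against.
- # AutoModel and the model path here are illustrative; VLM-Lens's own
- # -l/--log-named-modules flag produces the cached logs under logs/.
- from transformers import AutoModel
-
- model = AutoModel.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
- for name, _ in model.named_modules():
-     print(name)  # e.g., "model.layers.0.self_attn.q_proj"
- ```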
-
- ### Matching Layers
- To specify which layers to find/use, one should use Unix-style pattern strings, where `*` denotes a wildcard.
-
- For example, to match the query projection layers of all attention layers in Qwen, simply add the following lines to the `.yaml` file:
- ```yaml
- modules:
-   - model.layers.*.self_attn.q_proj
- ```
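-
- This kind of wildcard matching corresponds to `fnmatch`-style semantics; a minimal sketch (the module list is illustrative, and this is not VLM-Lens's actual implementation):
- ```python
- # Minimal sketch: Unix-style wildcard matching over module names.
- import fnmatch
-
- named_modules = [
-     "model.layers.0.self_attn.q_proj",
-     "model.layers.0.self_attn.k_proj",
-     "model.layers.1.self_attn.q_proj",
- ]
- pattern = "model.layers.*.self_attn.q_proj"
- matched = [m for m in named_modules if fnmatch.fnmatch(m, pattern)]
- print(matched)  # layers 0 and 1 q_proj match; k_proj does not
- ```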
- ## Feature Extraction using HuggingFace Datasets
- To use VLM-Lens with either hosted or local datasets, there are multiple methods you can use depending on the location of the input images.
-
- First, your dataset must be standardized to a format that includes the attributes `prompt`, `label`, and `image_path`. Here is a snippet of the `compling/coco-val2017-obj-qa-categories` dataset, adjusted to these attributes:
-
- | id | prompt | label | image_path |
- |---|---|---|---|
- | 397,133 | Is this A photo of a dining table on the bottom | yes | /path/to/397133.png |
- | 37,777 | Is this A photo of a dining table on the top | no | /path/to/37777.png |
-
- This can be achieved manually or using the helper script in `scripts/map_datasets.py`.
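-
- As an illustration, a manual standardization pass could look like the following sketch with the `datasets` library (the source column names `question`, `answer`, and `file_name` are assumptions and will differ per dataset):
- ```python
- # Minimal sketch: rename source columns to the standardized schema.
- # The source column names here are assumptions, not the dataset's real ones.
- from datasets import load_dataset
-
- ds = load_dataset("compling/coco-val2017-obj-qa-categories", split="val2017")
- ds = ds.rename_column("question", "prompt")
- ds = ds.rename_column("answer", "label")
- ds = ds.rename_column("file_name", "image_path")
- ```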
-
- ### Method 1: Using hosted datasets
- If you are using datasets hosted on a platform such as HuggingFace, you will either use images that are also *hosted*, or ones that are *downloaded locally* with an identifier to map back to the hosted dataset (e.g., a filename).
-
- You must use the `dataset_path` attribute in your configuration file with the appropriate `dataset_split` (if one exists; otherwise leave it out).
-
- #### 1(a): Hosted Dataset with Hosted Images
- ```yaml
- dataset:
-   - dataset_path: compling/coco-val2017-obj-qa-categories
-   - dataset_split: val2017
- ```
-
- #### 1(b): Hosted Dataset with Local Images
-
- > 🚨 **NOTE**: The `image_path` attribute in the dataset must contain either filenames or relative paths, such that a cell value of `train/00023.png` can be joined with `image_dataset_path` to form the full absolute path: `/path/to/local/images/train/00023.png`. If the `image_path` attribute does not require any additional path joining, you can leave out the `image_dataset_path` attribute.
-
- ```yaml
- dataset:
-   - dataset_path: compling/coco-val2017-obj-qa-categories
-   - dataset_split: val2017
-   - image_dataset_path: /path/to/local/images # downloaded using configs/dataset/download-coco.yaml
- ```
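-
- The joining behavior described in the note amounts to a plain path join; a minimal sketch of the assumed semantics:
- ```python
- # Minimal sketch of the assumed image_path + image_dataset_path joining.
- import os
-
- image_dataset_path = "/path/to/local/images"
- image_path = "train/00023.png"  # relative path stored in the dataset
- print(os.path.join(image_dataset_path, image_path))
- # -> /path/to/local/images/train/00023.png
- ```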
-
- ### Method 2: Using local datasets
- #### 2(a): Local Dataset containing Image Files
- ```yaml
- dataset:
-   - local_dataset_path: /path/to/local/CLEVR
-   - dataset_split: train # leave out if unspecified
- ```
-
- #### 2(b): Local Dataset with Separate Input Image Directory
-
- > 🚨 **NOTE**: The `image_path` attribute in the dataset must contain either filenames or relative paths, such that a cell value of `train/00023.png` can be joined with `image_dataset_path` to form the full absolute path: `/path/to/local/images/train/00023.png`. If the `image_path` attribute does not require any additional path joining, you can leave out the `image_dataset_path` attribute.
-
- ```yaml
- dataset:
-   - local_dataset_path: /path/to/local/CLEVR
-   - dataset_split: train # leave out if unspecified
-   - image_dataset_path: /path/to/local/CLEVR/images
- ```
-
- ### Output Database
- Specified by the `-o` or `--output-db` flag, this sets the output database file to write to. Inside it, a single SQL table named `tensors` has the following columns:
- ```
- name, architecture, timestamp, image_path, prompt, label, layer, pooling_method, tensor_dim, tensor
- ```
- where each column contains:
- 1. `name` represents the model path from HuggingFace.
- 2. `architecture` is one of the supported architecture flags above.
- 3. `timestamp` is the specific time that the model was run.
- 4. `image_path` is the absolute path to the image.
- 5. `prompt` stores the prompt used in that instance.
- 6. `label` is an optional cell that stores the "ground-truth" answer, which is helpful in use cases such as classification.
- 7. `layer` is the matched layer from `model.named_modules()`.
- 8. `pooling_method` is the pooling method used for aggregating token embeddings.
- 9. `tensor_dim` is the dimension of the saved tensor.
- 10. `tensor` is the saved embedding.
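-
- As an example, a minimal sketch of reading embeddings back (assuming the `.db` file is SQLite and tensors are stored as raw float32 bytes; the actual serialization may differ):
- ```python
- # Minimal sketch: query saved embeddings from the output database.
- # The SQLite + float32-bytes assumptions may not match the real encoding.
- import sqlite3
-
- import numpy as np
-
- conn = sqlite3.connect("output/qwen.db")
- rows = conn.execute(
-     "SELECT layer, tensor_dim, tensor FROM tensors WHERE name = ?",
-     ("Qwen/Qwen2-VL-2B-Instruct",),
- ).fetchall()
- for layer, tensor_dim, blob in rows:
-     vec = np.frombuffer(blob, dtype=np.float32)
-     print(layer, tensor_dim, vec.shape)
- conn.close()
- ```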
-
- ## Principal Component Analysis over Primitive Concept
-
- ### Data Collection
-
- Download license-free images for primitive concepts (e.g., colors):
-
- ```bash
- pip install -r data/concepts/requirements.txt
- python -m data.concepts.download --config configs/concepts/colors.yaml
- ```
-
- ### Embedding Extraction
-
- Run the LLaVA model to obtain embeddings of the concept images:
-
- ```bash
- python -m src.main --config configs/models/llava-7b/llava-7b-concepts-colors.yaml --device cuda
- ```
-
- Also, run the LLaVA model to obtain embeddings of the test images:
-
- ```bash
- python -m src.main --config configs/models/llava-7b/llava-7b.yaml --device cuda
- ```
-
- ### Run PCA
-
- Several PCA-based analysis scripts are provided:
- ```bash
- pip install -r src/concepts/requirements.txt
- python -m src.concepts.pca
- python -m src.concepts.pca_knn
- python -m src.concepts.pca_separation
- ```
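-
- Conceptually, each of these scripts projects the stored embeddings onto a few principal components; a minimal standalone sketch with scikit-learn (the data here is a random stand-in, not actual VLM-Lens output):
- ```python
- # Minimal sketch: PCA over a batch of embeddings with scikit-learn.
- # X is random stand-in data; in practice it would come from the output DB.
- import numpy as np
- from sklearn.decomposition import PCA
-
- X = np.random.randn(100, 4096)  # e.g., 100 embeddings of hidden size 4096
- pca = PCA(n_components=2)
- X_2d = pca.fit_transform(X)     # project onto the top 2 components
- print(X_2d.shape, pca.explained_variance_ratio_)
- ```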
-
- ## Contributing to VLM-Lens
-
- We welcome contributions to VLM-Lens! If you have suggestions, improvements, or bug fixes, please consider submitting a pull request; we actively review them.
-
- We generally follow the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html) to ensure readability, with a few exceptions stated in `.flake8`.
- We use pre-commit hooks to ensure code quality and consistency. Please make sure to run the following commands before committing:
- ```bash
- pip install pre-commit
- pre-commit install
- ```
-
- ## Miscellaneous
-
- ### Using a Cache
- To use a specific cache, set the `HF_HOME` environment variable as follows:
- ```bash
- HF_HOME=./cache/ python -m src.main --config configs/models/clip/clip.yaml --debug
- ```
-
- ### Using Submodule-Based Models
- Some models, such as Glamm, require separate submodules to be cloned.
- To use these models, please follow the instructions below to download the submodules.
-
- #### Glamm
- For Glamm (GroundingLMM), one needs to clone the separate submodules, which can be done with the following command:
- ```bash
- git submodule update --recursive --init
- ```
-
- See [our documentation](https://compling-wat.github.io/vlm-lens/tutorials/grounding-lmm.html) for details on the installation.
 
+ ---
+ title: VLM-Lens
+ emoji: 👁️
+ colorFrom: blue
+ colorTo: indigo
+ sdk: gradio
+ sdk_version: "4.0.0"
+ app_file: app.py
+ pinned: false
+ ---
+
+ # VLM-Lens 👁️🔍
+
+ A visual lens into the internals of Vision-Language Models.
+ Built with Gradio, this demo lets you explore token-level probabilities, spatial grounding, and interpretability visualizations.
+
+ > Developed by [@marstin](https://huggingface.co/marstin)