shruthib committed on
Commit
2553e7a
·
verified ·
1 Parent(s): b9432bf

Model Release (#1)

- Model Release (ba5c7c8a5203ec65aaec329a9f890b8b780d9301)
- Remove whitespace changes in LICENSE (fd1d2008bcc32388dea23cba1849b39c5ae4ad2c)

README.md CHANGED
@@ -1,5 +1,248 @@
1
- ---
2
- license: other
3
- license_name: msrla
4
- license_link: LICENSE
5
- ---
1
+ ---
2
+ license: other
3
+ license_name: msrla
4
+ license_link: https://huggingface.co/microsoft/maira-2/blob/main/LICENSE
5
+ library_name: transformers
6
+ ---
7
+
8
+ # Model Card for MAIRA-2
9
+
10
+ <!-- Provide a quick summary of what the model is/does. -->
11
+
12
+ MAIRA-2 is a multimodal transformer designed for the generation of grounded or non-grounded radiology reports from chest X-rays. It is described in more detail in [MAIRA-2: Grounded Radiology Report Generation (S. Bannur, K. Bouzid et al., 2024)](https://arxiv.org/abs/2406.04449). MAIRA-2 has been built for research purposes only and is being shared to facilitate comparison and further research.
13
+
14
+ ## Model Details
15
+
16
+ ### Model Description
17
+
18
+ <!-- Provide a longer summary of what this model is. -->
19
+ MAIRA-2 is composed of the image encoder [RAD-DINO-MAIRA-2](https://huggingface.co/microsoft/rad-dino-maira-2) (used frozen), a projection layer (trained from scratch), and the language model [vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) (fully fine-tuned).
20
+
21
+ - **Developed by:** Microsoft Research Health Futures
22
+ - **Model type:** Multimodal transformer
23
+ - **Language(s) (NLP):** English
24
+ - **License:** [MSRLA](./LICENSE)
25
+ - **Finetuned from model:** [vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5), [RAD-DINO-MAIRA-2](https://huggingface.co/microsoft/rad-dino-maira-2)
26
+
27
+ ## Uses
28
+
29
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
30
+ MAIRA-2 is shared for research purposes only. It is **not meant to be used for clinical practice.** MAIRA-2 was not extensively tested for its capabilities and properties, including its accuracy and reliability in application settings, fairness across different demographics and uses, and security and privacy.
31
+
32
+ ### Direct Use
33
+
34
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
35
+
36
+ As inputs, MAIRA-2 takes a frontal chest X-ray, and any of the following:
37
+ - A lateral view from the current study
38
+ - A frontal view from the *prior* study, with accompanying prior report
39
+ - The indication for the current study
40
+ - The technique and comparison sections for the current study
41
+
42
+ MAIRA-2 can generate the _findings_ section of the current study, in one of two forms:
43
+ - Narrative text, without any image annotations (this is the typical report generation scenario).
44
+ - As a grounded report, wherein all described findings are accompanied by zero or more bounding boxes indicating their location on the current frontal image.
45
+
46
+ MAIRA-2 can also perform phrase grounding. In this case, it must also be provided with an input phrase. It will then repeat the phrase and generate a bounding box localising the finding described in the phrase.
47
+
48
+ These use-cases are illustrated with [sample code below](README.md#use-case-3-phrase-grounding).
49
+
50
+ ### Out-of-Scope Use
51
+
52
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
53
+
54
+ MAIRA-2 was trained on chest X-rays from adults with English language reports only, and is not expected to work on any other imaging modality or anatomy. Variations in the input prompt (e.g. changing the instruction) are likely to degrade performance, as this model was *not* optimised for arbitrary user inputs.
55
+
56
+ As above, this is a research model which should not be used in any real clinical or production scenario.
57
+
58
+ ## Bias, Risks, and Limitations
59
+
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
+
62
+ ### Data biases
63
+ MAIRA-2 was trained on chest X-ray report datasets from Spain (translated from the original Spanish to English) and the USA, listed below. Reporting styles, patient demographics and disease prevalence, and image acquisition protocols can vary across health systems and regions. These factors will impact the generalisability of the model.
64
+
65
+ ### Model errors (fabrication, omission)
66
+
67
+ This model does not perform perfectly on its tasks, as outlined in more detail in the [MAIRA-2 report](https://arxiv.org/abs/2406.04449). Hence, errors can be present in the generated (grounded) reports.
68
+
69
+ ## How to Get Started with the Model
70
+
71
+ We demonstrate below how to run inference with MAIRA-2 for its three capabilities: findings generation with and without grounding, and phrase grounding.
72
+
73
+ ### Setup
74
+
75
+ First, initialise the model and put it in eval mode.
76
+ ```python
77
+ from transformers import AutoModelForCausalLM, AutoProcessor
78
+ from pathlib import Path
79
+ import torch
80
+
81
+ model = AutoModelForCausalLM.from_pretrained("microsoft/maira-2", trust_remote_code=True)
82
+ processor = AutoProcessor.from_pretrained("microsoft/maira-2", trust_remote_code=True)
83
+
84
+ device = torch.device("cuda")
85
+ model = model.eval()
86
+ model = model.to(device)
87
+ ```
88
+
89
+ We need to get some data to demonstrate the forward pass.
90
+ For this example, we'll collect an example from the IU X-ray dataset, which has a permissive license.
91
+
+ ```python
+ import requests
+ from PIL import Image
+
+ def get_sample_data() -> dict[str, Image.Image | str]:
+     """
+     Download chest X-rays from IU-Xray, which we didn't train MAIRA-2 on. License is CC.
+     We modified this function from the Rad-DINO repository on Huggingface.
+     """
+     frontal_image_url = "https://openi.nlm.nih.gov/imgs/512/145/145/CXR145_IM-0290-1001.png"
+     lateral_image_url = "https://openi.nlm.nih.gov/imgs/512/145/145/CXR145_IM-0290-2001.png"
+     headers = {"User-Agent": "MAIRA-2"}
+     frontal_response = requests.get(frontal_image_url, headers=headers, stream=True)
+     frontal_image = Image.open(frontal_response.raw)
+     lateral_response = requests.get(lateral_image_url, headers=headers, stream=True)
+     lateral_image = Image.open(lateral_response.raw)
+
+     sample_data = {
+         "frontal": frontal_image,
+         "lateral": lateral_image,
+         "indication": "Dyspnea.",
+         "comparison": "None.",
+         "technique": "PA and lateral views of the chest.",
+         "phrase": "Pleural effusion.",  # For the phrase grounding example. This patient has pleural effusion.
+     }
+     return sample_data
+
+ sample_data = get_sample_data()
+ ```
121
+
122
+ ### Use-case 1 and 2: Findings generation with or without grounding
123
+
124
+ We can toggle whether MAIRA-2 generates a grounded report based on how we preprocess the inputs, as it uses a different prompt. Let's start without grounding (`get_grounding=False`). While generating, for non-grounded reporting use `max_new_tokens=300`, and for grounded reporting use `max_new_tokens=450` to accommodate additional box and object tokens.
+ ```python
+ processed_inputs = processor.format_and_preprocess_reporting_input(
+     current_frontal=sample_data["frontal"],
+     current_lateral=sample_data["lateral"],
+     prior_frontal=None,  # Our example has no prior
+     indication=sample_data["indication"],
+     technique=sample_data["technique"],
+     comparison=sample_data["comparison"],
+     prior_report=None,  # Our example has no prior
+     return_tensors="pt",
+     get_grounding=False,  # For this example we generate a non-grounded report
+ )
+
+ processed_inputs = processed_inputs.to(device)
+ with torch.no_grad():
+     output_decoding = model.generate(
+         **processed_inputs,
+         max_new_tokens=300,  # Set to 450 for grounded reporting.
+         use_cache=True,
+     )
+ prompt_length = processed_inputs["input_ids"].shape[-1]
+ decoded_text = processor.decode(output_decoding[0][prompt_length:], skip_special_tokens=True)
+ decoded_text = decoded_text.lstrip()  # Findings generation completions have a single leading space
+
+ print("Parsed prediction:", processor.convert_output_to_plaintext_or_grounded_sequence(decoded_text))
+ ```
151
+
152
+ We get something that looks like this:
153
+ > "There is a large right pleural effusion with associated right basilar atelectasis. The left lung is clear. No pneumothorax is identified. The cardiomediastinal silhouette and hilar contours are normal. There is no free air under the diaphragm. Surgical clips are noted in the right upper quadrant of the abdomen."
154
+
155
+ If we had set `get_grounding=True`, MAIRA-2 would generate a grounded report. For this example, that looks like this:
156
+ ```python
157
+ ('There is a large right pleural effusion.', [(0.055, 0.275, 0.445, 0.665)]),
158
+ ('The left lung is clear.', None),
159
+ ('No pneumothorax is identified.', None),
160
+ ('The cardiomediastinal silhouette is within normal limits.', None),
161
+ ('The visualized osseous structures are unremarkable.', None)
162
+ ```
163
+ The generated bounding box coordinates are the `(x, y)` coordinates of the top left and bottom right corners of the box, i.e. `(x_topleft, y_topleft, x_bottomright, y_bottomright)`. These are relative to the _cropped_ image (that is, the image that MAIRA-2 ultimately got as input), so be careful while visualising. The processor provides a method `adjust_box_for_original_image_size` to get boxes relative to the original image shape.
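As a rough illustration of the coordinate convention described above, the sketch below scales a relative box to integer pixel coordinates. The helper `box_to_pixels` is hypothetical and not part of the MAIRA-2 processor, and the 1000 x 1000 image size is an arbitrary assumption chosen for readability. It applies to whichever image the box is relative to, so for the original image the box should first go through the processor's own `adjust_box_for_original_image_size`.

```python
def box_to_pixels(box, width, height):
    """Scale a relative (x_topleft, y_topleft, x_bottomright, y_bottomright) box to pixels."""
    x0, y0, x1, y1 = box
    if not (0.0 <= x0 <= x1 <= 1.0 and 0.0 <= y0 <= y1 <= 1.0):
        raise ValueError(f"Expected ordered relative coordinates in [0, 1], got {box}")
    # x values scale with the image width, y values with the image height.
    return (round(x0 * width), round(y0 * height), round(x1 * width), round(y1 * height))


# Box copied from the grounded report example; the image size is an assumption.
pixel_box = box_to_pixels((0.055, 0.275, 0.445, 0.665), width=1000, height=1000)
print(pixel_box)
```

The resulting tuple can be passed to a drawing routine such as `PIL.ImageDraw.Draw(image).rectangle(pixel_box)`.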
164
+
165
+ Note that MAIRA-2 generates slightly different reports for grounded and non-grounded reporting scenarios, a side-effect of its grounded reporting training data coming from a different data distribution.
166
+
167
+ ### Use-case 3: Phrase Grounding
168
+
169
+ Here the input is different as we provide the model with a phrase to ground in the image. Recall (`get_sample_data`) that our phrase here is just "Pleural effusion", which we already know is present in this image.
170
+
+ ```python
+ processed_inputs = processor.format_and_preprocess_phrase_grounding_input(
+     frontal_image=sample_data["frontal"],
+     phrase=sample_data["phrase"],
+     return_tensors="pt",
+ )
+
+ processed_inputs = processed_inputs.to(device)
+ with torch.no_grad():
+     output_decoding = model.generate(
+         **processed_inputs,
+         max_new_tokens=150,
+         use_cache=True,
+     )
+ prompt_length = processed_inputs["input_ids"].shape[-1]
+ decoded_text = processor.decode(output_decoding[0][prompt_length:], skip_special_tokens=True)
+
+ print("Parsed prediction:", processor.convert_output_to_plaintext_or_grounded_sequence(decoded_text))
+ ```
190
+ This gives us something like this:
191
+ ```python
192
+ ('Pleural effusion.', [(0.025, 0.345, 0.425, 0.575)])
193
+ ```
194
+ As with grounded reporting, the bounding box coordinates are relative to the cropped image seen by MAIRA-2. Use `processor.adjust_box_for_original_image_size` to get boxes adjusted for the original image shape.
195
+
196
+ ## Training details
197
+
198
+ We did not originally train MAIRA-2 using the exact model class provided here; however, we have checked that its behaviour is the same. We provide this class to facilitate research re-use and inference.
199
+
200
+ ### Training data
201
+
202
+ MAIRA-2 was trained on a mix of public and private chest X-ray datasets. Each example comprises one or more CXR images and associated report text, with or without grounding (spatial annotations). The model is trained to generate the _findings_ section of the report, with or without grounding.
203
+
204
+ | Dataset | Country | # examples (ungrounded) | # examples (grounded) |
205
+ | ----- | ------ |------- | ----- |
206
+ | [MIMIC-CXR](https://www.nature.com/articles/s41597-019-0322-0) | USA | 55 218 | 595* |
207
+ | [PadChest](https://www.sciencedirect.com/science/article/abs/pii/S1361841520301614) | Spain | 52 828 | 3 122 |
208
+ | USMix (Private) | USA | 118 031 | 53 613 |
209
+
210
+ *We use the [MS-CXR](https://physionet.org/content/ms-cxr/) phrase grounding dataset to provide 'grounding' examples from MIMIC-CXR.
211
+
212
+ ## Environmental Impact
213
+
214
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
215
+
216
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
217
+
218
+ - **Hardware Type:** NVIDIA A100 GPUs
219
+ - **Hours used:** 1432
220
+ - **Cloud Provider:** Azure
221
+ - **Compute Region:** West US 2
222
+ - **Carbon Emitted:** 107.4 CO₂ eq _(ostensibly offset by this provider)_
223
+
224
+ ## Citation
225
+
226
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
227
+
228
+ **BibTeX:**
229
+
230
+ ```
231
+ @article{Bannur2024MAIRA2GR,
232
+ title={MAIRA-2: Grounded Radiology Report Generation},
233
+ author={Shruthi Bannur and Kenza Bouzid and Daniel C. Castro and Anton Schwaighofer and Anja Thieme and Sam Bond-Taylor and Maximilian Ilse and Fernando P{\'e}rez-Garc{\'i}a and Valentina Salvatelli and Harshita Sharma and Felix Meissen and Mercy Prasanna Ranjit and Shaury Srivastav and Julia Gong and Noel C. F. Codella and Fabian Falck and Ozan Oktay and Matthew P. Lungren and Maria T. A. Wetscherek and Javier Alvarez-Valle and Stephanie L. Hyland},
234
+ journal={arXiv},
235
+ year={2024},
236
+ volume={abs/2406.04449},
237
+ url={https://arxiv.org/abs/2406.04449}
238
+ }
239
+ ```
240
+
241
+ **APA:**
242
+
243
+ > Bannur*, S., Bouzid*, K., Castro, D. C., Schwaighofer, A., Thieme, A., Bond-Taylor, S., Ilse, M., Pérez-García, F., Salvatelli, V., Sharma, H., Meissen, F., Ranjit, M.P., Srivastav, S., Gong, J., Codella, N.C.F., Falck, F., Oktay, O., Lungren, M.P., Wetscherek, M.T., Alvarez-Valle, J., & Hyland, S. L. (2024). *MAIRA-2: Grounded Radiology Report Generation*. arXiv preprint arXiv:2406.04449.
244
+
245
+ ## Model Card Contact
246
+
247
+ - Stephanie Hyland ([`stephanie.hyland@microsoft.com`](mailto:stephanie.hyland@microsoft.com))
248
+ - Shruthi Bannur ([`shruthi.bannur@microsoft.com`](mailto:shruthi.bannur@microsoft.com))
added_tokens.json ADDED
@@ -0,0 +1,209 @@
1
+ {
2
+ "</box>": 32203,
3
+ "</obj>": 32001,
4
+ "<box>": 32202,
5
+ "<image>": 32204,
6
+ "<lat_image>": 32206,
7
+ "<obj>": 32000,
8
+ "<prev_im>": 32205,
9
+ "<x0>": 32002,
10
+ "<x10>": 32012,
11
+ "<x11>": 32013,
12
+ "<x12>": 32014,
13
+ "<x13>": 32015,
14
+ "<x14>": 32016,
15
+ "<x15>": 32017,
16
+ "<x16>": 32018,
17
+ "<x17>": 32019,
18
+ "<x18>": 32020,
19
+ "<x19>": 32021,
20
+ "<x1>": 32003,
21
+ "<x20>": 32022,
22
+ "<x21>": 32023,
23
+ "<x22>": 32024,
24
+ "<x23>": 32025,
25
+ "<x24>": 32026,
26
+ "<x25>": 32027,
27
+ "<x26>": 32028,
28
+ "<x27>": 32029,
29
+ "<x28>": 32030,
30
+ "<x29>": 32031,
31
+ "<x2>": 32004,
32
+ "<x30>": 32032,
33
+ "<x31>": 32033,
34
+ "<x32>": 32034,
35
+ "<x33>": 32035,
36
+ "<x34>": 32036,
37
+ "<x35>": 32037,
38
+ "<x36>": 32038,
39
+ "<x37>": 32039,
40
+ "<x38>": 32040,
41
+ "<x39>": 32041,
42
+ "<x3>": 32005,
43
+ "<x40>": 32042,
44
+ "<x41>": 32043,
45
+ "<x42>": 32044,
46
+ "<x43>": 32045,
47
+ "<x44>": 32046,
48
+ "<x45>": 32047,
49
+ "<x46>": 32048,
50
+ "<x47>": 32049,
51
+ "<x48>": 32050,
52
+ "<x49>": 32051,
53
+ "<x4>": 32006,
54
+ "<x50>": 32052,
55
+ "<x51>": 32053,
56
+ "<x52>": 32054,
57
+ "<x53>": 32055,
58
+ "<x54>": 32056,
59
+ "<x55>": 32057,
60
+ "<x56>": 32058,
61
+ "<x57>": 32059,
62
+ "<x58>": 32060,
63
+ "<x59>": 32061,
64
+ "<x5>": 32007,
65
+ "<x60>": 32062,
66
+ "<x61>": 32063,
67
+ "<x62>": 32064,
68
+ "<x63>": 32065,
69
+ "<x64>": 32066,
70
+ "<x65>": 32067,
71
+ "<x66>": 32068,
72
+ "<x67>": 32069,
73
+ "<x68>": 32070,
74
+ "<x69>": 32071,
75
+ "<x6>": 32008,
76
+ "<x70>": 32072,
77
+ "<x71>": 32073,
78
+ "<x72>": 32074,
79
+ "<x73>": 32075,
80
+ "<x74>": 32076,
81
+ "<x75>": 32077,
82
+ "<x76>": 32078,
83
+ "<x77>": 32079,
84
+ "<x78>": 32080,
85
+ "<x79>": 32081,
86
+ "<x7>": 32009,
87
+ "<x80>": 32082,
88
+ "<x81>": 32083,
89
+ "<x82>": 32084,
90
+ "<x83>": 32085,
91
+ "<x84>": 32086,
92
+ "<x85>": 32087,
93
+ "<x86>": 32088,
94
+ "<x87>": 32089,
95
+ "<x88>": 32090,
96
+ "<x89>": 32091,
97
+ "<x8>": 32010,
98
+ "<x90>": 32092,
99
+ "<x91>": 32093,
100
+ "<x92>": 32094,
101
+ "<x93>": 32095,
102
+ "<x94>": 32096,
103
+ "<x95>": 32097,
104
+ "<x96>": 32098,
105
+ "<x97>": 32099,
106
+ "<x98>": 32100,
107
+ "<x99>": 32101,
108
+ "<x9>": 32011,
109
+ "<y0>": 32102,
110
+ "<y10>": 32112,
111
+ "<y11>": 32113,
112
+ "<y12>": 32114,
113
+ "<y13>": 32115,
114
+ "<y14>": 32116,
115
+ "<y15>": 32117,
116
+ "<y16>": 32118,
117
+ "<y17>": 32119,
118
+ "<y18>": 32120,
119
+ "<y19>": 32121,
120
+ "<y1>": 32103,
121
+ "<y20>": 32122,
122
+ "<y21>": 32123,
123
+ "<y22>": 32124,
124
+ "<y23>": 32125,
125
+ "<y24>": 32126,
126
+ "<y25>": 32127,
127
+ "<y26>": 32128,
128
+ "<y27>": 32129,
129
+ "<y28>": 32130,
130
+ "<y29>": 32131,
131
+ "<y2>": 32104,
132
+ "<y30>": 32132,
133
+ "<y31>": 32133,
134
+ "<y32>": 32134,
135
+ "<y33>": 32135,
136
+ "<y34>": 32136,
137
+ "<y35>": 32137,
138
+ "<y36>": 32138,
139
+ "<y37>": 32139,
140
+ "<y38>": 32140,
141
+ "<y39>": 32141,
142
+ "<y3>": 32105,
143
+ "<y40>": 32142,
144
+ "<y41>": 32143,
145
+ "<y42>": 32144,
146
+ "<y43>": 32145,
147
+ "<y44>": 32146,
148
+ "<y45>": 32147,
149
+ "<y46>": 32148,
150
+ "<y47>": 32149,
151
+ "<y48>": 32150,
152
+ "<y49>": 32151,
153
+ "<y4>": 32106,
154
+ "<y50>": 32152,
155
+ "<y51>": 32153,
156
+ "<y52>": 32154,
157
+ "<y53>": 32155,
158
+ "<y54>": 32156,
159
+ "<y55>": 32157,
160
+ "<y56>": 32158,
161
+ "<y57>": 32159,
162
+ "<y58>": 32160,
163
+ "<y59>": 32161,
164
+ "<y5>": 32107,
165
+ "<y60>": 32162,
166
+ "<y61>": 32163,
167
+ "<y62>": 32164,
168
+ "<y63>": 32165,
169
+ "<y64>": 32166,
170
+ "<y65>": 32167,
171
+ "<y66>": 32168,
172
+ "<y67>": 32169,
173
+ "<y68>": 32170,
174
+ "<y69>": 32171,
175
+ "<y6>": 32108,
176
+ "<y70>": 32172,
177
+ "<y71>": 32173,
178
+ "<y72>": 32174,
179
+ "<y73>": 32175,
180
+ "<y74>": 32176,
181
+ "<y75>": 32177,
182
+ "<y76>": 32178,
183
+ "<y77>": 32179,
184
+ "<y78>": 32180,
185
+ "<y79>": 32181,
186
+ "<y7>": 32109,
187
+ "<y80>": 32182,
188
+ "<y81>": 32183,
189
+ "<y82>": 32184,
190
+ "<y83>": 32185,
191
+ "<y84>": 32186,
192
+ "<y85>": 32187,
193
+ "<y86>": 32188,
194
+ "<y87>": 32189,
195
+ "<y88>": 32190,
196
+ "<y89>": 32191,
197
+ "<y8>": 32110,
198
+ "<y90>": 32192,
199
+ "<y91>": 32193,
200
+ "<y92>": 32194,
201
+ "<y93>": 32195,
202
+ "<y94>": 32196,
203
+ "<y95>": 32197,
204
+ "<y96>": 32198,
205
+ "<y97>": 32199,
206
+ "<y98>": 32200,
207
+ "<y99>": 32201,
208
+ "<y9>": 32111
209
+ }
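The coordinate tokens in `added_tokens.json` follow a regular pattern: `<xK>` maps to ID 32002 + K and `<yK>` to ID 32102 + K, for K from 0 to 99. The sketch below reproduces that mapping. The helper names are hypothetical, and `bin_centre` is only an assumption about how a bin index might relate to a relative coordinate. That assumption is consistent with README values such as 0.055 but is not confirmed by this diff.

```python
import re

X_BASE = 32002   # ID of "<x0>" in added_tokens.json
Y_BASE = 32102   # ID of "<y0>" in added_tokens.json
NUM_BINS = 100   # "<x0>".."<x99>" and "<y0>".."<y99>"


def coord_token_id(token: str) -> int:
    """Return the vocabulary ID of a coordinate token such as "<x10>" or "<y9>"."""
    match = re.fullmatch(r"<([xy])(\d+)>", token)
    if match is None:
        raise ValueError(f"Not a coordinate token: {token!r}")
    axis, index = match.group(1), int(match.group(2))
    if index >= NUM_BINS:
        raise ValueError(f"Bin index out of range: {index}")
    return (X_BASE if axis == "x" else Y_BASE) + index


def bin_centre(index: int) -> float:
    """Assumed decoding: the centre of bin `index` on a 100-bin grid over [0, 1]."""
    return (index + 0.5) / NUM_BINS


print(coord_token_id("<x10>"), coord_token_id("<y9>"))
```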
chat_template.json ADDED
@@ -0,0 +1,3 @@
1
+ {
2
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}You are an expert radiology assistant tasked with interpreting a chest X-ray study. {% for message in messages %}{% if message[\"role\"] == \"user\" %}USER: {% else %}ASSISTANT: {% endif %}{% for item in message[\"content\"] %}{% if item[\"type\"] == \"text\" %}{{ item[\"text\"] }}{% elif item[\"type\"] == \"image\" %}<image>{% endif %}{% endfor %}{% if message[\"role\"] == \"user\" %} {% else %}{{eos_token}}{% endif %}{% endfor %}{% if add_generation_prompt %}ASSISTANT: {% endif %}"
3
+ }
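To make the Jinja template above easier to follow, here is a hand-written plain-Python sketch of the same logic. It is not how `transformers` renders chat templates, the function name `render_chat` is hypothetical, and the default `eos_token` of `"</s>"` is an assumption based on the Llama-style tokenizer. The example message text is also made up.

```python
SYSTEM_PREAMBLE = (
    "You are an expert radiology assistant tasked with interpreting a chest X-ray study. "
)


def render_chat(messages, eos_token="</s>", add_generation_prompt=False):
    """Mirror the template: preamble, then USER/ASSISTANT turns, then an optional prompt."""
    parts = [SYSTEM_PREAMBLE]
    for message in messages:
        is_user = message["role"] == "user"
        parts.append("USER: " if is_user else "ASSISTANT: ")
        for item in message["content"]:
            if item["type"] == "text":
                parts.append(item["text"])
            elif item["type"] == "image":
                parts.append("<image>")  # placeholder token for an image
        # User turns end with a space, all other turns with the EOS token.
        parts.append(" " if is_user else eos_token)
    if add_generation_prompt:
        parts.append("ASSISTANT: ")
    return "".join(parts)


example = [{"role": "user", "content": [{"type": "text", "text": "Describe the image."}, {"type": "image"}]}]
print(render_chat(example, add_generation_prompt=True))
```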
config.json ADDED
@@ -0,0 +1,212 @@
1
+ {
2
+ "architectures": [
3
+ "Maira2ForConditionalGeneration"
4
+ ],
5
+ "auto_map": {
6
+ "AutoConfig": "configuration_maira2.Maira2Config",
7
+ "AutoModelForCausalLM": "modeling_maira2.Maira2ForConditionalGeneration",
8
+ "AutoModelForVision2Seq": "modeling_maira2.Maira2ForConditionalGeneration"
9
+ },
10
+ "hidden_size": 4096,
11
+ "ignore_index": -100,
12
+ "image_seq_length": 576,
13
+ "image_token_index": 32204,
14
+ "model_type": "maira2",
15
+ "pad_token_id": 0,
16
+ "projector_hidden_act": "gelu",
17
+ "projector_n_layers": 4,
18
+ "text_config": {
19
+ "_name_or_path": "lmsys/vicuna-7b-v1.5",
20
+ "add_cross_attention": false,
21
+ "architectures": [
22
+ "LlamaForCausalLM"
23
+ ],
24
+ "attention_bias": false,
25
+ "attention_dropout": 0.0,
26
+ "bad_words_ids": null,
27
+ "begin_suppress_tokens": null,
28
+ "bos_token_id": 1,
29
+ "chunk_size_feed_forward": 0,
30
+ "cross_attention_hidden_size": null,
31
+ "decoder_start_token_id": null,
32
+ "diversity_penalty": 0.0,
33
+ "do_sample": false,
34
+ "early_stopping": false,
35
+ "encoder_no_repeat_ngram_size": 0,
36
+ "eos_token_id": 2,
37
+ "exponential_decay_length_penalty": null,
38
+ "finetuning_task": null,
39
+ "forced_bos_token_id": null,
40
+ "forced_eos_token_id": null,
41
+ "head_dim": 128,
42
+ "hidden_act": "silu",
43
+ "hidden_size": 4096,
44
+ "id2label": {
45
+ "0": "LABEL_0",
46
+ "1": "LABEL_1"
47
+ },
48
+ "initializer_range": 0.02,
49
+ "intermediate_size": 11008,
50
+ "is_decoder": false,
51
+ "is_encoder_decoder": false,
52
+ "label2id": {
53
+ "LABEL_0": 0,
54
+ "LABEL_1": 1
55
+ },
56
+ "length_penalty": 1.0,
57
+ "max_length": 20,
58
+ "max_position_embeddings": 4096,
59
+ "min_length": 0,
60
+ "mlp_bias": false,
61
+ "model_type": "llama",
62
+ "no_repeat_ngram_size": 0,
63
+ "num_attention_heads": 32,
64
+ "num_beam_groups": 1,
65
+ "num_beams": 1,
66
+ "num_hidden_layers": 32,
67
+ "num_key_value_heads": 32,
68
+ "num_return_sequences": 1,
69
+ "output_attentions": false,
70
+ "output_hidden_states": false,
71
+ "output_scores": false,
72
+ "pad_token_id": 0,
73
+ "prefix": null,
74
+ "pretraining_tp": 1,
75
+ "problem_type": null,
76
+ "pruned_heads": {},
77
+ "remove_invalid_values": false,
78
+ "repetition_penalty": 1.0,
79
+ "return_dict": true,
80
+ "return_dict_in_generate": false,
81
+ "rms_norm_eps": 1e-05,
82
+ "rope_scaling": {
83
+ "factor": 1.5,
84
+ "rope_type": "linear"
85
+ },
86
+ "rope_theta": 10000.0,
87
+ "sep_token_id": null,
88
+ "suppress_tokens": null,
89
+ "task_specific_params": null,
90
+ "temperature": 1.0,
91
+ "tf_legacy_loss": false,
92
+ "tie_encoder_decoder": false,
93
+ "tie_word_embeddings": false,
94
+ "tokenizer_class": null,
95
+ "top_k": 50,
96
+ "top_p": 1.0,
97
+ "torch_dtype": "bfloat16",
98
+ "torchscript": false,
99
+ "typical_p": 1.0,
100
+ "use_bfloat16": false,
101
+ "use_cache": true,
102
+ "vocab_size": 32207
103
+ },
104
+ "torch_dtype": "float32",
105
+ "transformers_version": "4.46.0.dev0",
106
+ "vision_config": {
107
+ "_name_or_path": "",
108
+ "add_cross_attention": false,
109
+ "apply_layernorm": true,
110
+ "architectures": [
111
+ "Dinov2Model"
112
+ ],
113
+ "attention_probs_dropout_prob": 0.0,
114
+ "bad_words_ids": null,
115
+ "begin_suppress_tokens": null,
116
+ "bos_token_id": null,
117
+ "chunk_size_feed_forward": 0,
118
+ "cross_attention_hidden_size": null,
119
+ "decoder_start_token_id": null,
120
+ "diversity_penalty": 0.0,
121
+ "do_sample": false,
122
+ "drop_path_rate": 0.0,
123
+ "early_stopping": false,
124
+ "encoder_no_repeat_ngram_size": 0,
125
+ "eos_token_id": null,
126
+ "exponential_decay_length_penalty": null,
127
+ "finetuning_task": null,
128
+ "forced_bos_token_id": null,
129
+ "forced_eos_token_id": null,
130
+ "hidden_act": "gelu",
131
+ "hidden_dropout_prob": 0.0,
132
+ "hidden_size": 768,
133
+ "id2label": {
134
+ "0": "LABEL_0",
135
+ "1": "LABEL_1"
136
+ },
137
+ "image_size": 518,
138
+ "initializer_range": 0.02,
139
+ "is_decoder": false,
140
+ "is_encoder_decoder": false,
141
+ "label2id": {
142
+ "LABEL_0": 0,
143
+ "LABEL_1": 1
144
+ },
145
+ "layer_norm_eps": 1e-06,
146
+ "layerscale_value": 1.0,
147
+ "length_penalty": 1.0,
148
+ "max_length": 20,
149
+ "min_length": 0,
150
+ "mlp_ratio": 4,
151
+ "model_type": "dinov2",
152
+ "no_repeat_ngram_size": 0,
153
+ "num_attention_heads": 12,
154
+ "num_beam_groups": 1,
155
+ "num_beams": 1,
156
+ "num_channels": 3,
157
+ "num_hidden_layers": 12,
158
+ "num_return_sequences": 1,
159
+ "out_features": [
160
+ "stage12"
161
+ ],
162
+ "out_indices": [
163
+ 12
164
+ ],
165
+ "output_attentions": false,
166
+ "output_hidden_states": false,
167
+ "output_scores": false,
168
+ "pad_token_id": null,
169
+ "patch_size": 14,
170
+ "prefix": null,
171
+ "problem_type": null,
172
+ "pruned_heads": {},
173
+ "qkv_bias": true,
174
+ "remove_invalid_values": false,
175
+ "repetition_penalty": 1.0,
176
+ "reshape_hidden_states": false,
177
+ "return_dict": true,
178
+ "return_dict_in_generate": false,
179
+ "sep_token_id": null,
180
+ "stage_names": [
181
+ "stem",
182
+ "stage1",
183
+ "stage2",
184
+ "stage3",
185
+ "stage4",
186
+ "stage5",
187
+ "stage6",
188
+ "stage7",
189
+ "stage8",
190
+ "stage9",
191
+ "stage10",
192
+ "stage11",
193
+ "stage12"
194
+ ],
195
+ "suppress_tokens": null,
196
+ "task_specific_params": null,
197
+ "temperature": 1.0,
198
+ "tf_legacy_loss": false,
199
+ "tie_encoder_decoder": false,
200
+ "tie_word_embeddings": true,
201
+ "tokenizer_class": null,
202
+ "top_k": 50,
203
+ "top_p": 1.0,
204
+ "torch_dtype": "float32",
205
+ "torchscript": false,
206
+ "typical_p": 1.0,
207
+ "use_bfloat16": false,
208
+ "use_swiglu_ffn": false
209
+ },
210
+ "vision_feature_layer": -1,
211
+ "vision_feature_select_strategy": "default"
212
+ }
configuration_maira2.py ADDED
@@ -0,0 +1,32 @@
+ # Copyright 2024 Microsoft. All rights reserved.
+ # Licensed under the MSRLA License. See LICENSE in the repo root for license information.
+
+
+ from typing import Any
+
+ from transformers import LlavaConfig
+
+
+ class Maira2Config(LlavaConfig):
+     """
+     This is the configuration class to store the configuration of a `Maira2ForConditionalGeneration` model. It is
+     used to instantiate a MAIRA-2 model according to the specified arguments, defining the model architecture.
+
+     It inherits from `LlavaConfig`. In addition to the inherited attributes, it adds the
+     ability to customize the multimodal projector through the following attributes:
+
+     Args:
+         projector_n_layers (`int`, *optional*, defaults to 4):
+             Number of layers in the multimodal projector.
+     """
+
+     model_type = "maira2"
+
+     def __init__(
+         self,
+         projector_n_layers: int = 4,
+         **kwargs: Any,
+     ) -> None:
+         super().__init__(**kwargs)
+         self.hidden_size = self.text_config.hidden_size
+         self.projector_n_layers = projector_n_layers
generation_config.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 1,
4
+ "eos_token_id": 2,
5
+ "max_length": 4096,
6
+ "max_new_tokens": 450,
7
+ "pad_token_id": 0,
8
+ "transformers_version": "4.46.0.dev0"
9
+ }
model-00001-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e899381d0b4def093d86a599663831282240a57676bd0f6b9646c37c83dce682
3
+ size 4955289768
model-00002-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1309c85392505e2fc421c741b7f3b56f4146f24662e26837220f32d53401aa80
3
+ size 4857207664
model-00003-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:95ef725a26f95751c4e181fa44230b5793a297fc84928532797b0c51947e29e7
3
+ size 4857207704
model-00004-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:735d55ba903a0a5733af227fea50da1588588a9bf6512b3747acda91184a96df
3
+ size 4857207704
model-00005-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:38a7879c501f697864a36ca338866be4d45dfb2fc9b8d4f995d65cc0e984bac5
3
+ size 4857207704
model-00006-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9131ce3ccd96fb1c2654048ba8fb8e4bb9eb131b4395d06c5c1330d0d608fb72
3
+ size 3136688192
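The six `.safetensors` entries above are Git LFS pointer files, not the weights themselves. Each records a spec version, a SHA-256 object ID, and a size in bytes. A minimal sketch of reading one pointer and totalling the shard sizes listed in this commit follows; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the "key value" lines of a Git LFS pointer file into a dictionary."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer for model-00001-of-00006.safetensors, copied from the diff.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:e899381d0b4def093d86a599663831282240a57676bd0f6b9646c37c83dce682\n"
    "size 4955289768\n"
)

# Sizes in bytes of all six shards, in order, as listed in the pointers above.
shard_sizes = [4955289768, 4857207664, 4857207704, 4857207704, 4857207704, 3136688192]
total_bytes = sum(shard_sizes)
print(pointer["algorithm"], pointer["size"], total_bytes)
```

The total is roughly 27.5 GB, which is in line with a model of about 7 billion parameters stored in `float32`.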
model.safetensors.index.json ADDED
@@ -0,0 +1,529 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 27520742400
4
+ },
5
+ "weight_map": {
6
+ "language_model.lm_head.weight": "model-00006-of-00006.safetensors",
7
+ "language_model.model.embed_tokens.weight": "model-00001-of-00006.safetensors",
8
+ "language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00006.safetensors",
9
+ "language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
10
+ "language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
11
+ "language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
12
+ "language_model.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
13
+ "language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
14
+ "language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
15
+ "language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
16
+ "language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
17
+ "language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00006.safetensors",
18
+ "language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
19
+ "language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
20
+ "language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
21
+ "language_model.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
22
+ "language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
23
+ "language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
24
+ "language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
25
+ "language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
26
+ "language_model.model.layers.10.input_layernorm.weight": "model-00003-of-00006.safetensors",
27
+ "language_model.model.layers.10.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
28
+ "language_model.model.layers.10.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
29
+ "language_model.model.layers.10.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
30
+ "language_model.model.layers.10.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
31
+ "language_model.model.layers.10.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
32
+ "language_model.model.layers.10.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
33
+ "language_model.model.layers.10.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
34
+ "language_model.model.layers.10.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
35
+ "language_model.model.layers.11.input_layernorm.weight": "model-00003-of-00006.safetensors",
36
+ "language_model.model.layers.11.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
37
+ "language_model.model.layers.11.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
38
+ "language_model.model.layers.11.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
39
+ "language_model.model.layers.11.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
40
+ "language_model.model.layers.11.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
41
+ "language_model.model.layers.11.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
42
+ "language_model.model.layers.11.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
43
+ "language_model.model.layers.11.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
44
+ "language_model.model.layers.12.input_layernorm.weight": "model-00003-of-00006.safetensors",
45
+ "language_model.model.layers.12.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
46
+ "language_model.model.layers.12.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
47
+ "language_model.model.layers.12.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
48
+ "language_model.model.layers.12.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
49
+ "language_model.model.layers.12.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
50
+ "language_model.model.layers.12.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
51
+ "language_model.model.layers.12.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
52
+ "language_model.model.layers.12.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
53
+ "language_model.model.layers.13.input_layernorm.weight": "model-00003-of-00006.safetensors",
54
+ "language_model.model.layers.13.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
55
+ "language_model.model.layers.13.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
56
+ "language_model.model.layers.13.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
57
+ "language_model.model.layers.13.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
58
+ "language_model.model.layers.13.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
59
+ "language_model.model.layers.13.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
60
+ "language_model.model.layers.13.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
61
+ "language_model.model.layers.13.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
62
+ "language_model.model.layers.14.input_layernorm.weight": "model-00003-of-00006.safetensors",
63
+ "language_model.model.layers.14.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
64
+ "language_model.model.layers.14.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
65
+ "language_model.model.layers.14.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
66
+ "language_model.model.layers.14.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
67
+ "language_model.model.layers.14.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
68
+ "language_model.model.layers.14.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
69
+ "language_model.model.layers.14.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
70
+ "language_model.model.layers.14.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
71
+ "language_model.model.layers.15.input_layernorm.weight": "model-00003-of-00006.safetensors",
72
+ "language_model.model.layers.15.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
73
+ "language_model.model.layers.15.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
74
+ "language_model.model.layers.15.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
75
+ "language_model.model.layers.15.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
76
+ "language_model.model.layers.15.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
77
+ "language_model.model.layers.15.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
78
+ "language_model.model.layers.15.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
79
+ "language_model.model.layers.15.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
80
+ "language_model.model.layers.16.input_layernorm.weight": "model-00004-of-00006.safetensors",
81
+ "language_model.model.layers.16.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
82
+ "language_model.model.layers.16.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
83
+ "language_model.model.layers.16.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
84
+ "language_model.model.layers.16.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
85
+ "language_model.model.layers.16.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
86
+ "language_model.model.layers.16.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
87
+ "language_model.model.layers.16.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
88
+ "language_model.model.layers.16.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
89
+ "language_model.model.layers.17.input_layernorm.weight": "model-00004-of-00006.safetensors",
90
+ "language_model.model.layers.17.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
91
+ "language_model.model.layers.17.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
92
+ "language_model.model.layers.17.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
93
+ "language_model.model.layers.17.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
94
+ "language_model.model.layers.17.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
95
+ "language_model.model.layers.17.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
96
+ "language_model.model.layers.17.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
97
+ "language_model.model.layers.17.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
98
+ "language_model.model.layers.18.input_layernorm.weight": "model-00004-of-00006.safetensors",
99
+ "language_model.model.layers.18.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
100
+ "language_model.model.layers.18.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
101
+ "language_model.model.layers.18.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
102
+ "language_model.model.layers.18.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
103
+ "language_model.model.layers.18.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
104
+ "language_model.model.layers.18.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
105
+ "language_model.model.layers.18.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
106
+ "language_model.model.layers.18.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
107
+ "language_model.model.layers.19.input_layernorm.weight": "model-00004-of-00006.safetensors",
108
+ "language_model.model.layers.19.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
109
+ "language_model.model.layers.19.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
110
+ "language_model.model.layers.19.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
111
+ "language_model.model.layers.19.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
112
+ "language_model.model.layers.19.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
113
+ "language_model.model.layers.19.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
114
+ "language_model.model.layers.19.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
115
+ "language_model.model.layers.19.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
116
+ "language_model.model.layers.2.input_layernorm.weight": "model-00001-of-00006.safetensors",
117
+ "language_model.model.layers.2.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
118
+ "language_model.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
119
+ "language_model.model.layers.2.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
120
+ "language_model.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
121
+ "language_model.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
122
+ "language_model.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
123
+ "language_model.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
124
+ "language_model.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
125
+ "language_model.model.layers.20.input_layernorm.weight": "model-00004-of-00006.safetensors",
126
+ "language_model.model.layers.20.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
127
+ "language_model.model.layers.20.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
128
+ "language_model.model.layers.20.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
129
+ "language_model.model.layers.20.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
130
+ "language_model.model.layers.20.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
131
+ "language_model.model.layers.20.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
132
+ "language_model.model.layers.20.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
133
+ "language_model.model.layers.20.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
134
+ "language_model.model.layers.21.input_layernorm.weight": "model-00004-of-00006.safetensors",
135
+ "language_model.model.layers.21.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
136
+ "language_model.model.layers.21.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
137
+ "language_model.model.layers.21.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
138
+ "language_model.model.layers.21.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
139
+ "language_model.model.layers.21.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
140
+ "language_model.model.layers.21.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
141
+ "language_model.model.layers.21.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
142
+ "language_model.model.layers.21.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
143
+ "language_model.model.layers.22.input_layernorm.weight": "model-00005-of-00006.safetensors",
144
+ "language_model.model.layers.22.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
145
+ "language_model.model.layers.22.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
146
+ "language_model.model.layers.22.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
147
+ "language_model.model.layers.22.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
148
+ "language_model.model.layers.22.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
149
+ "language_model.model.layers.22.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
150
+ "language_model.model.layers.22.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
151
+ "language_model.model.layers.22.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
152
+ "language_model.model.layers.23.input_layernorm.weight": "model-00005-of-00006.safetensors",
153
+ "language_model.model.layers.23.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
154
+ "language_model.model.layers.23.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
155
+ "language_model.model.layers.23.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
156
+ "language_model.model.layers.23.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
157
+ "language_model.model.layers.23.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
158
+ "language_model.model.layers.23.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
159
+ "language_model.model.layers.23.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
160
+ "language_model.model.layers.23.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
161
+ "language_model.model.layers.24.input_layernorm.weight": "model-00005-of-00006.safetensors",
162
+ "language_model.model.layers.24.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
163
+ "language_model.model.layers.24.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
164
+ "language_model.model.layers.24.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
165
+ "language_model.model.layers.24.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
166
+ "language_model.model.layers.24.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
167
+ "language_model.model.layers.24.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
168
+ "language_model.model.layers.24.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
169
+ "language_model.model.layers.24.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
170
+ "language_model.model.layers.25.input_layernorm.weight": "model-00005-of-00006.safetensors",
171
+ "language_model.model.layers.25.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
172
+ "language_model.model.layers.25.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
173
+ "language_model.model.layers.25.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
174
+ "language_model.model.layers.25.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
175
+ "language_model.model.layers.25.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
176
+ "language_model.model.layers.25.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
177
+ "language_model.model.layers.25.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
178
+ "language_model.model.layers.25.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
179
+ "language_model.model.layers.26.input_layernorm.weight": "model-00005-of-00006.safetensors",
180
+ "language_model.model.layers.26.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
181
+ "language_model.model.layers.26.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
182
+ "language_model.model.layers.26.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
183
+ "language_model.model.layers.26.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
184
+ "language_model.model.layers.26.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
185
+ "language_model.model.layers.26.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
186
+ "language_model.model.layers.26.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
187
+ "language_model.model.layers.26.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
188
+ "language_model.model.layers.27.input_layernorm.weight": "model-00005-of-00006.safetensors",
189
+ "language_model.model.layers.27.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
190
+ "language_model.model.layers.27.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
191
+ "language_model.model.layers.27.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
192
+ "language_model.model.layers.27.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
193
+ "language_model.model.layers.27.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
194
+ "language_model.model.layers.27.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
195
+ "language_model.model.layers.27.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
196
+ "language_model.model.layers.27.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
197
+ "language_model.model.layers.28.input_layernorm.weight": "model-00006-of-00006.safetensors",
198
+ "language_model.model.layers.28.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
199
+ "language_model.model.layers.28.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
200
+ "language_model.model.layers.28.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
201
+ "language_model.model.layers.28.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
202
+ "language_model.model.layers.28.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
203
+ "language_model.model.layers.28.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
204
+ "language_model.model.layers.28.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
205
+ "language_model.model.layers.28.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
206
+ "language_model.model.layers.29.input_layernorm.weight": "model-00006-of-00006.safetensors",
207
+ "language_model.model.layers.29.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
208
+ "language_model.model.layers.29.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
209
+ "language_model.model.layers.29.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
210
+ "language_model.model.layers.29.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
211
+ "language_model.model.layers.29.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
212
+ "language_model.model.layers.29.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
213
+ "language_model.model.layers.29.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
214
+ "language_model.model.layers.29.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
215
+ "language_model.model.layers.3.input_layernorm.weight": "model-00001-of-00006.safetensors",
216
+ "language_model.model.layers.3.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
217
+ "language_model.model.layers.3.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
218
+ "language_model.model.layers.3.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
219
+ "language_model.model.layers.3.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
220
+ "language_model.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
221
+ "language_model.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
222
+ "language_model.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
223
+ "language_model.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
224
+ "language_model.model.layers.30.input_layernorm.weight": "model-00006-of-00006.safetensors",
225
+ "language_model.model.layers.30.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
226
+ "language_model.model.layers.30.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
227
+ "language_model.model.layers.30.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
228
+ "language_model.model.layers.30.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
229
+ "language_model.model.layers.30.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
230
+ "language_model.model.layers.30.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
231
+ "language_model.model.layers.30.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
232
+ "language_model.model.layers.30.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
233
+ "language_model.model.layers.31.input_layernorm.weight": "model-00006-of-00006.safetensors",
234
+ "language_model.model.layers.31.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
235
+ "language_model.model.layers.31.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
236
+ "language_model.model.layers.31.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
237
+ "language_model.model.layers.31.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
238
+ "language_model.model.layers.31.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
239
+ "language_model.model.layers.31.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
240
+ "language_model.model.layers.31.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
241
+ "language_model.model.layers.31.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
242
+ "language_model.model.layers.4.input_layernorm.weight": "model-00002-of-00006.safetensors",
243
+ "language_model.model.layers.4.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
244
+ "language_model.model.layers.4.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
245
+ "language_model.model.layers.4.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
246
+ "language_model.model.layers.4.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
247
+ "language_model.model.layers.4.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
248
+ "language_model.model.layers.4.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
249
+ "language_model.model.layers.4.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
250
+ "language_model.model.layers.4.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
251
+ "language_model.model.layers.5.input_layernorm.weight": "model-00002-of-00006.safetensors",
252
+ "language_model.model.layers.5.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
253
+ "language_model.model.layers.5.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
254
+ "language_model.model.layers.5.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
255
+ "language_model.model.layers.5.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
256
+ "language_model.model.layers.5.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
257
+ "language_model.model.layers.5.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
258
+ "language_model.model.layers.5.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
259
+ "language_model.model.layers.5.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
260
+ "language_model.model.layers.6.input_layernorm.weight": "model-00002-of-00006.safetensors",
261
+ "language_model.model.layers.6.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
262
+ "language_model.model.layers.6.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
263
+ "language_model.model.layers.6.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
264
+ "language_model.model.layers.6.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
265
+ "language_model.model.layers.6.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
266
+ "language_model.model.layers.6.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
267
+ "language_model.model.layers.6.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
268
+ "language_model.model.layers.6.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
269
+ "language_model.model.layers.7.input_layernorm.weight": "model-00002-of-00006.safetensors",
270
+ "language_model.model.layers.7.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
271
+ "language_model.model.layers.7.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
272
+ "language_model.model.layers.7.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
273
+ "language_model.model.layers.7.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
274
+ "language_model.model.layers.7.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
275
+ "language_model.model.layers.7.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
276
+ "language_model.model.layers.7.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
277
+ "language_model.model.layers.7.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
278
+ "language_model.model.layers.8.input_layernorm.weight": "model-00002-of-00006.safetensors",
279
+ "language_model.model.layers.8.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
280
+ "language_model.model.layers.8.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
281
+ "language_model.model.layers.8.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
282
+ "language_model.model.layers.8.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
283
+ "language_model.model.layers.8.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
284
+ "language_model.model.layers.8.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
285
+ "language_model.model.layers.8.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
286
+ "language_model.model.layers.8.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
287
+ "language_model.model.layers.9.input_layernorm.weight": "model-00002-of-00006.safetensors",
288
+ "language_model.model.layers.9.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
289
+ "language_model.model.layers.9.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
290
+ "language_model.model.layers.9.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
291
+ "language_model.model.layers.9.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
292
+ "language_model.model.layers.9.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
293
+ "language_model.model.layers.9.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
294
+ "language_model.model.layers.9.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
295
+ "language_model.model.layers.9.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
296
+ "language_model.model.norm.weight": "model-00006-of-00006.safetensors",
297
+ "multi_modal_projector.layers.0.bias": "model-00001-of-00006.safetensors",
298
+ "multi_modal_projector.layers.0.weight": "model-00001-of-00006.safetensors",
299
+ "multi_modal_projector.layers.2.bias": "model-00001-of-00006.safetensors",
300
+ "multi_modal_projector.layers.2.weight": "model-00001-of-00006.safetensors",
301
+ "multi_modal_projector.layers.4.bias": "model-00001-of-00006.safetensors",
302
+ "multi_modal_projector.layers.4.weight": "model-00001-of-00006.safetensors",
303
+ "multi_modal_projector.layers.6.bias": "model-00001-of-00006.safetensors",
304
+ "multi_modal_projector.layers.6.weight": "model-00001-of-00006.safetensors",
305
+ "vision_tower.embeddings.cls_token": "model-00001-of-00006.safetensors",
306
+ "vision_tower.embeddings.mask_token": "model-00001-of-00006.safetensors",
307
+ "vision_tower.embeddings.patch_embeddings.projection.bias": "model-00001-of-00006.safetensors",
308
+ "vision_tower.embeddings.patch_embeddings.projection.weight": "model-00001-of-00006.safetensors",
309
+ "vision_tower.embeddings.position_embeddings": "model-00001-of-00006.safetensors",
310
+ "vision_tower.encoder.layer.0.attention.attention.key.bias": "model-00001-of-00006.safetensors",
311
+ "vision_tower.encoder.layer.0.attention.attention.key.weight": "model-00001-of-00006.safetensors",
312
+ "vision_tower.encoder.layer.0.attention.attention.query.bias": "model-00001-of-00006.safetensors",
313
+ "vision_tower.encoder.layer.0.attention.attention.query.weight": "model-00001-of-00006.safetensors",
314
+ "vision_tower.encoder.layer.0.attention.attention.value.bias": "model-00001-of-00006.safetensors",
315
+ "vision_tower.encoder.layer.0.attention.attention.value.weight": "model-00001-of-00006.safetensors",
316
+ "vision_tower.encoder.layer.0.attention.output.dense.bias": "model-00001-of-00006.safetensors",
317
+ "vision_tower.encoder.layer.0.attention.output.dense.weight": "model-00001-of-00006.safetensors",
318
+ "vision_tower.encoder.layer.0.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
319
+ "vision_tower.encoder.layer.0.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
320
+ "vision_tower.encoder.layer.0.mlp.fc1.bias": "model-00001-of-00006.safetensors",
321
+ "vision_tower.encoder.layer.0.mlp.fc1.weight": "model-00001-of-00006.safetensors",
322
+ "vision_tower.encoder.layer.0.mlp.fc2.bias": "model-00001-of-00006.safetensors",
323
+ "vision_tower.encoder.layer.0.mlp.fc2.weight": "model-00001-of-00006.safetensors",
324
+ "vision_tower.encoder.layer.0.norm1.bias": "model-00001-of-00006.safetensors",
325
+ "vision_tower.encoder.layer.0.norm1.weight": "model-00001-of-00006.safetensors",
326
+ "vision_tower.encoder.layer.0.norm2.bias": "model-00001-of-00006.safetensors",
327
+ "vision_tower.encoder.layer.0.norm2.weight": "model-00001-of-00006.safetensors",
328
+ "vision_tower.encoder.layer.1.attention.attention.key.bias": "model-00001-of-00006.safetensors",
329
+ "vision_tower.encoder.layer.1.attention.attention.key.weight": "model-00001-of-00006.safetensors",
330
+ "vision_tower.encoder.layer.1.attention.attention.query.bias": "model-00001-of-00006.safetensors",
331
+ "vision_tower.encoder.layer.1.attention.attention.query.weight": "model-00001-of-00006.safetensors",
332
+ "vision_tower.encoder.layer.1.attention.attention.value.bias": "model-00001-of-00006.safetensors",
333
+ "vision_tower.encoder.layer.1.attention.attention.value.weight": "model-00001-of-00006.safetensors",
334
+ "vision_tower.encoder.layer.1.attention.output.dense.bias": "model-00001-of-00006.safetensors",
335
+ "vision_tower.encoder.layer.1.attention.output.dense.weight": "model-00001-of-00006.safetensors",
336
+ "vision_tower.encoder.layer.1.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
337
+ "vision_tower.encoder.layer.1.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
338
+ "vision_tower.encoder.layer.1.mlp.fc1.bias": "model-00001-of-00006.safetensors",
339
+ "vision_tower.encoder.layer.1.mlp.fc1.weight": "model-00001-of-00006.safetensors",
340
+ "vision_tower.encoder.layer.1.mlp.fc2.bias": "model-00001-of-00006.safetensors",
341
+ "vision_tower.encoder.layer.1.mlp.fc2.weight": "model-00001-of-00006.safetensors",
342
+ "vision_tower.encoder.layer.1.norm1.bias": "model-00001-of-00006.safetensors",
343
+ "vision_tower.encoder.layer.1.norm1.weight": "model-00001-of-00006.safetensors",
344
+ "vision_tower.encoder.layer.1.norm2.bias": "model-00001-of-00006.safetensors",
345
+ "vision_tower.encoder.layer.1.norm2.weight": "model-00001-of-00006.safetensors",
346
+ "vision_tower.encoder.layer.10.attention.attention.key.bias": "model-00001-of-00006.safetensors",
347
+ "vision_tower.encoder.layer.10.attention.attention.key.weight": "model-00001-of-00006.safetensors",
348
+ "vision_tower.encoder.layer.10.attention.attention.query.bias": "model-00001-of-00006.safetensors",
349
+ "vision_tower.encoder.layer.10.attention.attention.query.weight": "model-00001-of-00006.safetensors",
350
+ "vision_tower.encoder.layer.10.attention.attention.value.bias": "model-00001-of-00006.safetensors",
351
+ "vision_tower.encoder.layer.10.attention.attention.value.weight": "model-00001-of-00006.safetensors",
352
+ "vision_tower.encoder.layer.10.attention.output.dense.bias": "model-00001-of-00006.safetensors",
353
+ "vision_tower.encoder.layer.10.attention.output.dense.weight": "model-00001-of-00006.safetensors",
354
+ "vision_tower.encoder.layer.10.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
355
+ "vision_tower.encoder.layer.10.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
356
+ "vision_tower.encoder.layer.10.mlp.fc1.bias": "model-00001-of-00006.safetensors",
357
+ "vision_tower.encoder.layer.10.mlp.fc1.weight": "model-00001-of-00006.safetensors",
358
+ "vision_tower.encoder.layer.10.mlp.fc2.bias": "model-00001-of-00006.safetensors",
359
+ "vision_tower.encoder.layer.10.mlp.fc2.weight": "model-00001-of-00006.safetensors",
360
+ "vision_tower.encoder.layer.10.norm1.bias": "model-00001-of-00006.safetensors",
361
+ "vision_tower.encoder.layer.10.norm1.weight": "model-00001-of-00006.safetensors",
362
+ "vision_tower.encoder.layer.10.norm2.bias": "model-00001-of-00006.safetensors",
363
+ "vision_tower.encoder.layer.10.norm2.weight": "model-00001-of-00006.safetensors",
364
+ "vision_tower.encoder.layer.11.attention.attention.key.bias": "model-00001-of-00006.safetensors",
365
+ "vision_tower.encoder.layer.11.attention.attention.key.weight": "model-00001-of-00006.safetensors",
366
+ "vision_tower.encoder.layer.11.attention.attention.query.bias": "model-00001-of-00006.safetensors",
367
+ "vision_tower.encoder.layer.11.attention.attention.query.weight": "model-00001-of-00006.safetensors",
368
+ "vision_tower.encoder.layer.11.attention.attention.value.bias": "model-00001-of-00006.safetensors",
369
+ "vision_tower.encoder.layer.11.attention.attention.value.weight": "model-00001-of-00006.safetensors",
370
+ "vision_tower.encoder.layer.11.attention.output.dense.bias": "model-00001-of-00006.safetensors",
371
+ "vision_tower.encoder.layer.11.attention.output.dense.weight": "model-00001-of-00006.safetensors",
372
+ "vision_tower.encoder.layer.11.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
373
+ "vision_tower.encoder.layer.11.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
374
+ "vision_tower.encoder.layer.11.mlp.fc1.bias": "model-00001-of-00006.safetensors",
375
+ "vision_tower.encoder.layer.11.mlp.fc1.weight": "model-00001-of-00006.safetensors",
376
+ "vision_tower.encoder.layer.11.mlp.fc2.bias": "model-00001-of-00006.safetensors",
377
+ "vision_tower.encoder.layer.11.mlp.fc2.weight": "model-00001-of-00006.safetensors",
378
+ "vision_tower.encoder.layer.11.norm1.bias": "model-00001-of-00006.safetensors",
379
+ "vision_tower.encoder.layer.11.norm1.weight": "model-00001-of-00006.safetensors",
380
+ "vision_tower.encoder.layer.11.norm2.bias": "model-00001-of-00006.safetensors",
381
+ "vision_tower.encoder.layer.11.norm2.weight": "model-00001-of-00006.safetensors",
382
+ "vision_tower.encoder.layer.2.attention.attention.key.bias": "model-00001-of-00006.safetensors",
383
+ "vision_tower.encoder.layer.2.attention.attention.key.weight": "model-00001-of-00006.safetensors",
384
+ "vision_tower.encoder.layer.2.attention.attention.query.bias": "model-00001-of-00006.safetensors",
385
+ "vision_tower.encoder.layer.2.attention.attention.query.weight": "model-00001-of-00006.safetensors",
386
+ "vision_tower.encoder.layer.2.attention.attention.value.bias": "model-00001-of-00006.safetensors",
387
+ "vision_tower.encoder.layer.2.attention.attention.value.weight": "model-00001-of-00006.safetensors",
388
+ "vision_tower.encoder.layer.2.attention.output.dense.bias": "model-00001-of-00006.safetensors",
389
+ "vision_tower.encoder.layer.2.attention.output.dense.weight": "model-00001-of-00006.safetensors",
390
+ "vision_tower.encoder.layer.2.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
391
+ "vision_tower.encoder.layer.2.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
392
+ "vision_tower.encoder.layer.2.mlp.fc1.bias": "model-00001-of-00006.safetensors",
393
+ "vision_tower.encoder.layer.2.mlp.fc1.weight": "model-00001-of-00006.safetensors",
394
+ "vision_tower.encoder.layer.2.mlp.fc2.bias": "model-00001-of-00006.safetensors",
395
+ "vision_tower.encoder.layer.2.mlp.fc2.weight": "model-00001-of-00006.safetensors",
396
+ "vision_tower.encoder.layer.2.norm1.bias": "model-00001-of-00006.safetensors",
397
+ "vision_tower.encoder.layer.2.norm1.weight": "model-00001-of-00006.safetensors",
398
+ "vision_tower.encoder.layer.2.norm2.bias": "model-00001-of-00006.safetensors",
399
+ "vision_tower.encoder.layer.2.norm2.weight": "model-00001-of-00006.safetensors",
400
+ "vision_tower.encoder.layer.3.attention.attention.key.bias": "model-00001-of-00006.safetensors",
401
+ "vision_tower.encoder.layer.3.attention.attention.key.weight": "model-00001-of-00006.safetensors",
402
+ "vision_tower.encoder.layer.3.attention.attention.query.bias": "model-00001-of-00006.safetensors",
403
+ "vision_tower.encoder.layer.3.attention.attention.query.weight": "model-00001-of-00006.safetensors",
404
+ "vision_tower.encoder.layer.3.attention.attention.value.bias": "model-00001-of-00006.safetensors",
405
+ "vision_tower.encoder.layer.3.attention.attention.value.weight": "model-00001-of-00006.safetensors",
406
+ "vision_tower.encoder.layer.3.attention.output.dense.bias": "model-00001-of-00006.safetensors",
407
+ "vision_tower.encoder.layer.3.attention.output.dense.weight": "model-00001-of-00006.safetensors",
408
+ "vision_tower.encoder.layer.3.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
409
+ "vision_tower.encoder.layer.3.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
410
+ "vision_tower.encoder.layer.3.mlp.fc1.bias": "model-00001-of-00006.safetensors",
411
+ "vision_tower.encoder.layer.3.mlp.fc1.weight": "model-00001-of-00006.safetensors",
412
+ "vision_tower.encoder.layer.3.mlp.fc2.bias": "model-00001-of-00006.safetensors",
413
+ "vision_tower.encoder.layer.3.mlp.fc2.weight": "model-00001-of-00006.safetensors",
414
+ "vision_tower.encoder.layer.3.norm1.bias": "model-00001-of-00006.safetensors",
415
+ "vision_tower.encoder.layer.3.norm1.weight": "model-00001-of-00006.safetensors",
416
+ "vision_tower.encoder.layer.3.norm2.bias": "model-00001-of-00006.safetensors",
417
+ "vision_tower.encoder.layer.3.norm2.weight": "model-00001-of-00006.safetensors",
418
+ "vision_tower.encoder.layer.4.attention.attention.key.bias": "model-00001-of-00006.safetensors",
419
+ "vision_tower.encoder.layer.4.attention.attention.key.weight": "model-00001-of-00006.safetensors",
420
+ "vision_tower.encoder.layer.4.attention.attention.query.bias": "model-00001-of-00006.safetensors",
421
+ "vision_tower.encoder.layer.4.attention.attention.query.weight": "model-00001-of-00006.safetensors",
422
+ "vision_tower.encoder.layer.4.attention.attention.value.bias": "model-00001-of-00006.safetensors",
423
+ "vision_tower.encoder.layer.4.attention.attention.value.weight": "model-00001-of-00006.safetensors",
424
+ "vision_tower.encoder.layer.4.attention.output.dense.bias": "model-00001-of-00006.safetensors",
425
+ "vision_tower.encoder.layer.4.attention.output.dense.weight": "model-00001-of-00006.safetensors",
426
+ "vision_tower.encoder.layer.4.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
427
+ "vision_tower.encoder.layer.4.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
428
+ "vision_tower.encoder.layer.4.mlp.fc1.bias": "model-00001-of-00006.safetensors",
429
+ "vision_tower.encoder.layer.4.mlp.fc1.weight": "model-00001-of-00006.safetensors",
430
+ "vision_tower.encoder.layer.4.mlp.fc2.bias": "model-00001-of-00006.safetensors",
431
+ "vision_tower.encoder.layer.4.mlp.fc2.weight": "model-00001-of-00006.safetensors",
432
+ "vision_tower.encoder.layer.4.norm1.bias": "model-00001-of-00006.safetensors",
433
+ "vision_tower.encoder.layer.4.norm1.weight": "model-00001-of-00006.safetensors",
434
+ "vision_tower.encoder.layer.4.norm2.bias": "model-00001-of-00006.safetensors",
435
+ "vision_tower.encoder.layer.4.norm2.weight": "model-00001-of-00006.safetensors",
436
+ "vision_tower.encoder.layer.5.attention.attention.key.bias": "model-00001-of-00006.safetensors",
437
+ "vision_tower.encoder.layer.5.attention.attention.key.weight": "model-00001-of-00006.safetensors",
438
+ "vision_tower.encoder.layer.5.attention.attention.query.bias": "model-00001-of-00006.safetensors",
439
+ "vision_tower.encoder.layer.5.attention.attention.query.weight": "model-00001-of-00006.safetensors",
440
+ "vision_tower.encoder.layer.5.attention.attention.value.bias": "model-00001-of-00006.safetensors",
441
+ "vision_tower.encoder.layer.5.attention.attention.value.weight": "model-00001-of-00006.safetensors",
442
+ "vision_tower.encoder.layer.5.attention.output.dense.bias": "model-00001-of-00006.safetensors",
443
+ "vision_tower.encoder.layer.5.attention.output.dense.weight": "model-00001-of-00006.safetensors",
444
+ "vision_tower.encoder.layer.5.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
445
+ "vision_tower.encoder.layer.5.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
446
+ "vision_tower.encoder.layer.5.mlp.fc1.bias": "model-00001-of-00006.safetensors",
447
+ "vision_tower.encoder.layer.5.mlp.fc1.weight": "model-00001-of-00006.safetensors",
448
+ "vision_tower.encoder.layer.5.mlp.fc2.bias": "model-00001-of-00006.safetensors",
449
+ "vision_tower.encoder.layer.5.mlp.fc2.weight": "model-00001-of-00006.safetensors",
450
+ "vision_tower.encoder.layer.5.norm1.bias": "model-00001-of-00006.safetensors",
451
+ "vision_tower.encoder.layer.5.norm1.weight": "model-00001-of-00006.safetensors",
452
+ "vision_tower.encoder.layer.5.norm2.bias": "model-00001-of-00006.safetensors",
453
+ "vision_tower.encoder.layer.5.norm2.weight": "model-00001-of-00006.safetensors",
454
+ "vision_tower.encoder.layer.6.attention.attention.key.bias": "model-00001-of-00006.safetensors",
455
+ "vision_tower.encoder.layer.6.attention.attention.key.weight": "model-00001-of-00006.safetensors",
456
+ "vision_tower.encoder.layer.6.attention.attention.query.bias": "model-00001-of-00006.safetensors",
457
+ "vision_tower.encoder.layer.6.attention.attention.query.weight": "model-00001-of-00006.safetensors",
458
+ "vision_tower.encoder.layer.6.attention.attention.value.bias": "model-00001-of-00006.safetensors",
459
+ "vision_tower.encoder.layer.6.attention.attention.value.weight": "model-00001-of-00006.safetensors",
460
+ "vision_tower.encoder.layer.6.attention.output.dense.bias": "model-00001-of-00006.safetensors",
461
+ "vision_tower.encoder.layer.6.attention.output.dense.weight": "model-00001-of-00006.safetensors",
462
+ "vision_tower.encoder.layer.6.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
463
+ "vision_tower.encoder.layer.6.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
464
+ "vision_tower.encoder.layer.6.mlp.fc1.bias": "model-00001-of-00006.safetensors",
465
+ "vision_tower.encoder.layer.6.mlp.fc1.weight": "model-00001-of-00006.safetensors",
466
+ "vision_tower.encoder.layer.6.mlp.fc2.bias": "model-00001-of-00006.safetensors",
467
+ "vision_tower.encoder.layer.6.mlp.fc2.weight": "model-00001-of-00006.safetensors",
468
+ "vision_tower.encoder.layer.6.norm1.bias": "model-00001-of-00006.safetensors",
469
+ "vision_tower.encoder.layer.6.norm1.weight": "model-00001-of-00006.safetensors",
470
+ "vision_tower.encoder.layer.6.norm2.bias": "model-00001-of-00006.safetensors",
471
+ "vision_tower.encoder.layer.6.norm2.weight": "model-00001-of-00006.safetensors",
472
+ "vision_tower.encoder.layer.7.attention.attention.key.bias": "model-00001-of-00006.safetensors",
473
+ "vision_tower.encoder.layer.7.attention.attention.key.weight": "model-00001-of-00006.safetensors",
474
+ "vision_tower.encoder.layer.7.attention.attention.query.bias": "model-00001-of-00006.safetensors",
475
+ "vision_tower.encoder.layer.7.attention.attention.query.weight": "model-00001-of-00006.safetensors",
476
+ "vision_tower.encoder.layer.7.attention.attention.value.bias": "model-00001-of-00006.safetensors",
477
+ "vision_tower.encoder.layer.7.attention.attention.value.weight": "model-00001-of-00006.safetensors",
478
+ "vision_tower.encoder.layer.7.attention.output.dense.bias": "model-00001-of-00006.safetensors",
479
+ "vision_tower.encoder.layer.7.attention.output.dense.weight": "model-00001-of-00006.safetensors",
480
+ "vision_tower.encoder.layer.7.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
481
+ "vision_tower.encoder.layer.7.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
482
+ "vision_tower.encoder.layer.7.mlp.fc1.bias": "model-00001-of-00006.safetensors",
483
+ "vision_tower.encoder.layer.7.mlp.fc1.weight": "model-00001-of-00006.safetensors",
484
+ "vision_tower.encoder.layer.7.mlp.fc2.bias": "model-00001-of-00006.safetensors",
485
+ "vision_tower.encoder.layer.7.mlp.fc2.weight": "model-00001-of-00006.safetensors",
486
+ "vision_tower.encoder.layer.7.norm1.bias": "model-00001-of-00006.safetensors",
487
+ "vision_tower.encoder.layer.7.norm1.weight": "model-00001-of-00006.safetensors",
488
+ "vision_tower.encoder.layer.7.norm2.bias": "model-00001-of-00006.safetensors",
489
+ "vision_tower.encoder.layer.7.norm2.weight": "model-00001-of-00006.safetensors",
490
+ "vision_tower.encoder.layer.8.attention.attention.key.bias": "model-00001-of-00006.safetensors",
491
+ "vision_tower.encoder.layer.8.attention.attention.key.weight": "model-00001-of-00006.safetensors",
492
+ "vision_tower.encoder.layer.8.attention.attention.query.bias": "model-00001-of-00006.safetensors",
493
+ "vision_tower.encoder.layer.8.attention.attention.query.weight": "model-00001-of-00006.safetensors",
494
+ "vision_tower.encoder.layer.8.attention.attention.value.bias": "model-00001-of-00006.safetensors",
495
+ "vision_tower.encoder.layer.8.attention.attention.value.weight": "model-00001-of-00006.safetensors",
496
+ "vision_tower.encoder.layer.8.attention.output.dense.bias": "model-00001-of-00006.safetensors",
497
+ "vision_tower.encoder.layer.8.attention.output.dense.weight": "model-00001-of-00006.safetensors",
498
+ "vision_tower.encoder.layer.8.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
499
+ "vision_tower.encoder.layer.8.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
500
+ "vision_tower.encoder.layer.8.mlp.fc1.bias": "model-00001-of-00006.safetensors",
501
+ "vision_tower.encoder.layer.8.mlp.fc1.weight": "model-00001-of-00006.safetensors",
502
+ "vision_tower.encoder.layer.8.mlp.fc2.bias": "model-00001-of-00006.safetensors",
503
+ "vision_tower.encoder.layer.8.mlp.fc2.weight": "model-00001-of-00006.safetensors",
504
+ "vision_tower.encoder.layer.8.norm1.bias": "model-00001-of-00006.safetensors",
505
+ "vision_tower.encoder.layer.8.norm1.weight": "model-00001-of-00006.safetensors",
506
+ "vision_tower.encoder.layer.8.norm2.bias": "model-00001-of-00006.safetensors",
507
+ "vision_tower.encoder.layer.8.norm2.weight": "model-00001-of-00006.safetensors",
508
+ "vision_tower.encoder.layer.9.attention.attention.key.bias": "model-00001-of-00006.safetensors",
509
+ "vision_tower.encoder.layer.9.attention.attention.key.weight": "model-00001-of-00006.safetensors",
510
+ "vision_tower.encoder.layer.9.attention.attention.query.bias": "model-00001-of-00006.safetensors",
511
+ "vision_tower.encoder.layer.9.attention.attention.query.weight": "model-00001-of-00006.safetensors",
512
+ "vision_tower.encoder.layer.9.attention.attention.value.bias": "model-00001-of-00006.safetensors",
513
+ "vision_tower.encoder.layer.9.attention.attention.value.weight": "model-00001-of-00006.safetensors",
514
+ "vision_tower.encoder.layer.9.attention.output.dense.bias": "model-00001-of-00006.safetensors",
515
+ "vision_tower.encoder.layer.9.attention.output.dense.weight": "model-00001-of-00006.safetensors",
516
+ "vision_tower.encoder.layer.9.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
517
+ "vision_tower.encoder.layer.9.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
518
+ "vision_tower.encoder.layer.9.mlp.fc1.bias": "model-00001-of-00006.safetensors",
519
+ "vision_tower.encoder.layer.9.mlp.fc1.weight": "model-00001-of-00006.safetensors",
520
+ "vision_tower.encoder.layer.9.mlp.fc2.bias": "model-00001-of-00006.safetensors",
521
+ "vision_tower.encoder.layer.9.mlp.fc2.weight": "model-00001-of-00006.safetensors",
522
+ "vision_tower.encoder.layer.9.norm1.bias": "model-00001-of-00006.safetensors",
523
+ "vision_tower.encoder.layer.9.norm1.weight": "model-00001-of-00006.safetensors",
524
+ "vision_tower.encoder.layer.9.norm2.bias": "model-00001-of-00006.safetensors",
525
+ "vision_tower.encoder.layer.9.norm2.weight": "model-00001-of-00006.safetensors",
526
+ "vision_tower.layernorm.bias": "model-00001-of-00006.safetensors",
527
+ "vision_tower.layernorm.weight": "model-00001-of-00006.safetensors"
528
+ }
529
+ }
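The mapping above pairs each tensor name with the shard file that stores it. The following is a minimal sketch of grouping tensor names by shard. It assumes the index parses to a dict with a top-level `weight_map` key, which is the usual layout for sharded checkpoints but is not visible in this excerpt. The three entries are copied from the listing, and `tensors_by_shard` is a hypothetical helper, not part of the repository.

```python
import json
from collections import defaultdict

# A tiny excerpt of the index, assuming the usual {"weight_map": {...}} layout.
INDEX_JSON = """
{
  "weight_map": {
    "vision_tower.encoder.layer.0.norm2.weight": "model-00001-of-00006.safetensors",
    "vision_tower.layernorm.bias": "model-00001-of-00006.safetensors",
    "vision_tower.layernorm.weight": "model-00001-of-00006.safetensors"
  }
}
"""


def tensors_by_shard(index: dict) -> dict[str, list[str]]:
    """Group tensor names by the shard file that stores them (hypothetical helper)."""
    shards: dict[str, list[str]] = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    # Sort the names so the result does not depend on key order in the file.
    return {shard: sorted(names) for shard, names in shards.items()}


index = json.loads(INDEX_JSON)
grouped = tensors_by_shard(index)
```

In this excerpt every vision tower tensor resolves to the first of the six shards, so `grouped` has a single key.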
modeling_maira2.py ADDED
@@ -0,0 +1,88 @@
1
+ # Copyright 2024 Microsoft. All rights reserved.
2
+ # Licensed under the MSRLA License. See LICENSE in the repo root for license information.
3
+
4
+
5
+ import torch
6
+ from torch.nn import Linear, Module, Sequential
7
+ from transformers import AutoBackbone, AutoModelForCausalLM, LlavaForConditionalGeneration, LlavaPreTrainedModel
8
+ from transformers.activations import ACT2FN
9
+ from transformers.utils import check_min_version
10
+
11
+ from .configuration_maira2 import Maira2Config
12
+
13
+
14
+ class Maira2MultiModalProjector(Module):
15
+ """
16
+ This class implements the multimodal projector for the MAIRA-2 model. It projects the image features to the text
17
+ hidden size via a series of linear layers (4 layers in MAIRA-2).
18
+ """
19
+
20
+ def __init__(self, config: Maira2Config):
21
+ super().__init__()
22
+
23
+ n_layers = config.projector_n_layers
24
+ if n_layers < 1:
25
+ raise ValueError(f"Number of layers should be at least 1, got {n_layers=}")
26
+ text_hidden_size = config.text_config.hidden_size
27
+ vision_hidden_size = config.vision_config.hidden_size
28
+ _layers = [Linear(vision_hidden_size, text_hidden_size, bias=True)]
29
+ for _ in range(n_layers - 1):
30
+ _layers.append(ACT2FN[config.projector_hidden_act])
31
+ _layers.append(Linear(text_hidden_size, text_hidden_size, bias=True))
32
+
33
+ self.layers = Sequential(*_layers)
34
+
35
+ def forward(self, image_features: torch.Tensor) -> torch.FloatTensor:
36
+ hidden_states = self.layers(image_features)
37
+ return hidden_states # type: ignore[no-any-return]
38
+
39
+
40
+ class Maira2ForConditionalGeneration(LlavaForConditionalGeneration):
41
+ """
42
+ This model implements the multimodal model MAIRA-2. It consists of a vision backbone, a multimodal projector, and a
43
+ language model. The model can be used for grounded and ungrounded report generation tasks as well as phrase grounding.
44
+ This class inherits from `LlavaForConditionalGeneration`, defining a custom multimodal projector and changing image
45
+ feature selection.
46
+ """
47
+
48
+ config_class = Maira2Config
49
+
50
+ def __init__(self, config: Maira2Config) -> None:
51
+
52
+ # Check transformers version is at least 4.46.0.dev0 otherwise the model fails
53
+ # silently since get_image_features is not called in the forward pass
54
+ check_min_version("4.46.0.dev0")
55
+
56
+ super(LlavaPreTrainedModel, self).__init__(config)
57
+ self.vision_tower = AutoBackbone.from_config(config.vision_config)
58
+
59
+ self.multi_modal_projector = Maira2MultiModalProjector(config)
60
+ self.vocab_size = config.text_config.vocab_size
61
+ self.language_model = AutoModelForCausalLM.from_config(
62
+ config.text_config,
63
+ attn_implementation=config._attn_implementation,
64
+ )
65
+ self.pad_token_id = self.config.pad_token_id if self.config.pad_token_id is not None else -1
66
+ self.post_init()
67
+
68
+ def get_image_features(
69
+ self, pixel_values: torch.FloatTensor, vision_feature_layer: int, vision_feature_select_strategy: str
70
+ ) -> torch.Tensor:
71
+ """
72
+ This method extracts the image features from the vision backbone using the specified feature layer and
73
+ selection strategy. This is custom to the MAIRA-2 model since we want to use the `feature_maps` from the Dinov2Backbone
74
+ class instead of the `hidden_states` which are used in the default implementation of `get_image_features` in LlavaForConditionalGeneration.
75
+ The feature_maps returned by Dinov2Backbone are the hidden_states with a layernorm applied to them.
76
+ """
77
+ image_outputs = self.vision_tower(pixel_values, output_hidden_states=True)
78
+ selected_image_feature = image_outputs.feature_maps[vision_feature_layer]
79
+
80
+ if vision_feature_select_strategy == "default":
81
+ selected_image_feature = selected_image_feature[:, 1:]
82
+ elif vision_feature_select_strategy == "full":
83
+ selected_image_feature = selected_image_feature
84
+ else:
85
+ raise ValueError(f"Unexpected select feature strategy: {vision_feature_select_strategy}")
86
+
87
+ image_features = self.multi_modal_projector(selected_image_feature)
88
+ return image_features # type: ignore[no-any-return]
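The two pieces this file customizes can be sketched in plain Python without loading any weights. The sketch assumes nothing beyond the logic shown above: the projector is one vision-to-text linear layer followed by `n_layers - 1` activation/linear pairs, and the `"default"` strategy drops the first token of the sequence (typically the class token in a ViT-style encoder). The names `projector_layout` and `select_tokens` are hypothetical. The sizes 768 and 4096 and the activation `"gelu"` are illustrative values, not read from the config.

```python
def projector_layout(n_layers: int, vision_hidden: int, text_hidden: int, act: str) -> list[tuple]:
    """Describe the module sequence the projector builds (hypothetical helper)."""
    if n_layers < 1:
        raise ValueError(f"Number of layers should be at least 1, got {n_layers=}")
    # First layer maps vision features into the text hidden size.
    layers: list[tuple] = [("linear", vision_hidden, text_hidden)]
    # Every further layer is an activation followed by a square linear layer.
    for _ in range(n_layers - 1):
        layers.append(("act", act))
        layers.append(("linear", text_hidden, text_hidden))
    return layers


def select_tokens(tokens: list, strategy: str) -> list:
    """Mimic the feature selection on a single token sequence (hypothetical helper)."""
    if strategy == "default":
        return tokens[1:]  # drop the first token
    if strategy == "full":
        return tokens  # keep every token
    raise ValueError(f"Unexpected select feature strategy: {strategy}")


layout = projector_layout(4, 768, 4096, "gelu")
patches = select_tokens(["cls", "p0", "p1"], "default")
```

With four layers the layout has seven entries (four linear layers and three activations), which matches the "4 layers in MAIRA-2" noted in the projector docstring.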
preprocessor_config.json ADDED
@@ -0,0 +1,28 @@
1
+ {
2
+ "crop_size": {
3
+ "height": 518,
4
+ "width": 518
5
+ },
6
+ "do_center_crop": true,
7
+ "do_convert_rgb": true,
8
+ "do_normalize": true,
9
+ "do_rescale": true,
10
+ "do_resize": true,
11
+ "image_mean": [
12
+ 0.5307,
13
+ 0.5307,
14
+ 0.5307
15
+ ],
16
+ "image_processor_type": "BitImageProcessor",
17
+ "image_std": [
18
+ 0.2583,
19
+ 0.2583,
20
+ 0.2583
21
+ ],
22
+ "processor_class": "Maira2Processor",
23
+ "resample": 3,
24
+ "rescale_factor": 0.00392156862745098,
25
+ "size": {
26
+ "shortest_edge": 518
27
+ }
28
+ }
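The numeric fields in this config can be checked by hand. The sketch below assumes the processor applies rescaling and then normalization per pixel, as standard Transformers image processors do; resizing and center cropping are left out. `rescale_and_normalize` is a hypothetical helper, and only the three constants come from the config above.

```python
import math

RESCALE_FACTOR = 0.00392156862745098  # "rescale_factor" from the config; numerically 1/255
IMAGE_MEAN = 0.5307  # "image_mean", identical for all three channels
IMAGE_STD = 0.2583  # "image_std", identical for all three channels


def rescale_and_normalize(pixel: int) -> float:
    """Map an 8-bit pixel value to the normalized value (hypothetical helper)."""
    return (pixel * RESCALE_FACTOR - IMAGE_MEAN) / IMAGE_STD


darkest = rescale_and_normalize(0)
brightest = rescale_and_normalize(255)
```

Under these assumptions, 8-bit inputs land in a range of roughly -2.05 to 1.82 after normalization.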
processing_maira2.py ADDED
@@ -0,0 +1,646 @@
1
+ # Copyright 2024 Microsoft. All rights reserved.
2
+ # Licensed under the MSRLA License. See LICENSE in the repo root for license information.
3
+
4
+
5
+ import re
6
+ from typing import Any, TypeAlias
7
+
8
+ import numpy as np
9
+ from PIL import Image
10
+ from transformers import BaseImageProcessor, LlavaProcessor, PreTrainedTokenizer
11
+ from transformers.feature_extraction_utils import BatchFeature
12
+
13
+ SingleChatMessageType: TypeAlias = dict[str, str | int | None]
14
+ ChatMessageListType: TypeAlias = list[dict[str, str | list[SingleChatMessageType]]]
15
+ BoxType: TypeAlias = tuple[float, float, float, float]
16
+
17
+
18
+ class Maira2Processor(LlavaProcessor):
19
+ """
20
+ Constructs a Maira2 processor similar to LlavaProcessor but with additional arguments and functions to support
21
+ multi-image grounded and non-grounded radiology report generation.
22
+
23
+ In addition to the arguments of LlavaProcessor, Maira2Processor has the following extra arguments:
24
+
25
+ Args:
26
+ phrase_start_token (`str`, *optional*, defaults to `"<obj>"`):
27
+ Special token used to denote the start of a grounded phrase (with or without box).
28
+ phrase_end_token (`str`, *optional*, defaults to `"</obj>"`):
29
+ Special token used to denote the end of a grounded phrase.
30
+ box_start_token (`str`, *optional*, defaults to `"<box>"`):
31
+ Special token used to denote the start of a bounding box.
32
+ box_end_token (`str`, *optional*, defaults to `"</box>"`):
33
+ Special token used to denote the end of a bounding box.
34
+ num_box_coord_bins (`int`, *optional*, defaults to `100`):
35
+ Number of bins used to represent the bounding box coordinates.
36
+ """
37
+
38
+ valid_kwargs = [
39
+ "chat_template",
40
+ "patch_size",
41
+ "vision_feature_select_strategy",
42
+ "image_token",
43
+ "phrase_start_token",
44
+ "phrase_end_token",
45
+ "box_start_token",
46
+ "box_end_token",
47
+ "num_box_coord_bins",
48
+ ]
49
+
50
+ def __init__(
51
+ self,
52
+ image_processor: BaseImageProcessor = None,
53
+ tokenizer: PreTrainedTokenizer = None,
54
+ patch_size: int | None = None,
55
+ vision_feature_select_strategy: str | None = None,
56
+ chat_template: str | None = None,
57
+ image_token: str = "<image>",
58
+ phrase_start_token: str = "<obj>",
59
+ phrase_end_token: str = "</obj>",
60
+ box_start_token: str = "<box>",
61
+ box_end_token: str = "</box>",
62
+ num_box_coord_bins: int = 100,
63
+ **kwargs: Any,
64
+ ) -> None:
65
+ super().__init__(
66
+ image_processor=image_processor,
67
+ tokenizer=tokenizer,
68
+ patch_size=patch_size,
69
+ vision_feature_select_strategy=vision_feature_select_strategy,
70
+ chat_template=chat_template,
71
+ image_token=image_token,
72
+ **kwargs,
73
+ )
74
+
75
+ self.phrase_start_token = phrase_start_token
76
+ self.phrase_end_token = phrase_end_token
77
+ self.box_start_token = box_start_token
78
+ self.box_end_token = box_end_token
79
+ self.num_box_coord_bins = num_box_coord_bins
80
+
81
+ @staticmethod
82
+ def _normalize_image(image: Image.Image) -> Image.Image:
83
+ """
84
+ This function normalizes the input image to have pixel values in the range [0, 255].
85
+
86
+ Args:
87
+ image (Image.Image):
88
+ The input image to be normalized.
89
+
90
+ Returns:
91
+ Image.Image: The normalized image in grayscale.
92
+ """
93
+ image_np = np.array(image.convert("L"))
94
+ image_np = image_np.astype(float)
95
+ image_np -= image_np.min()
96
+ image_np /= image_np.max() or 1.0  # avoid division by zero for constant images
97
+ image_np *= 255
98
+ image_np = image_np.astype(np.uint8)
99
+
100
+ return Image.fromarray(image_np).convert("L")
101
+
102
+ def _normalize_and_stack_images(
103
+ self,
104
+ current_frontal: Image.Image,
105
+ current_lateral: Image.Image | None,
106
+ prior_frontal: Image.Image | None,
107
+ ) -> list[Image.Image]:
108
+ """
109
+ This function normalizes the input images and stacks them together. The images are stacked in the order of
110
+ current_frontal, current_lateral, and prior_frontal. The order of images is important, since it must match the
111
+ order of the images in the prompt, which is frontal, then lateral, then prior.
112
+
113
+ Args:
114
+ current_frontal (Image.Image):
115
+ The current frontal image.
116
+ current_lateral (Image.Image | None):
117
+ The current lateral image.
118
+ prior_frontal (Image.Image | None):
119
+ The prior frontal image.
120
+
121
+ Returns:
122
+ list[Image.Image]: The normalized images stacked together.
123
+ """
124
+ images = [self._normalize_image(current_frontal)]
125
+ if current_lateral is not None:
126
+ images.append(self._normalize_image(current_lateral))
127
+ if prior_frontal is not None:
128
+ images.append(self._normalize_image(prior_frontal))
129
+ return images
130
+
131
+ @staticmethod
132
+ def _get_section_text_or_missing_text(section: str | None) -> str:
133
+ """
134
+ This function returns the input section text if it is not None and not empty, otherwise it returns a missing
135
+ section text "N/A".
136
+
137
+ Args:
138
+ section (str | None):
139
+ The input section text.
140
+
141
+ Returns:
142
+ str: The section text if it is not None and not empty, otherwise "N/A".
143
+ """
144
+ missing_section_text = "N/A"
145
+ if not isinstance(section, str) or len(section) == 0:
146
+ return missing_section_text
147
+ return section
148
+
149
+ @staticmethod
150
+ def _construct_image_chat_messages_for_reporting(has_prior: bool, has_lateral: bool) -> list[SingleChatMessageType]:
151
+ """
152
+ This function constructs user chat messages based on the presence of the prior and lateral images.
153
+
154
+ Args:
155
+ has_prior (bool):
156
+ A boolean indicating whether the prior image is present.
157
+ has_lateral (bool):
158
+ A boolean indicating whether the lateral image is present.
159
+
160
+ Returns:
161
+ list[SingleChatMessageType]: The image prompt messages in the form of a list of dictionaries.
162
+
163
+ Example:
164
+
165
+ ```python
166
+ >>> _construct_image_chat_messages_for_reporting(has_prior=True, has_lateral=True)
167
+ >>> # [
168
+ >>> # {"index": None, "text": "Given the current frontal image", "type": "text"},
169
+ >>> # {"index": 0, "text": None, "type": "image"},
170
+ >>> # {"index": None, "text": " the current lateral image", "type": "text"},
171
+ >>> # {"index": 1, "text": None, "type": "image"},
172
+ >>> # {"index": None, "text": " and the prior frontal image", "type": "text"},
173
+ >>> # {"index": 2, "text": None, "type": "image"},
174
+ >>> # ]
175
+ ```
176
+ """
177
+
178
+ def _add_single_image_to_chat_messages(prompt_text: str, image_index: int) -> None:
179
+ image_prompt.extend(
180
+ [
181
+ {"index": None, "text": prompt_text, "type": "text"},
182
+ {"index": image_index, "text": None, "type": "image"},
183
+ ]
184
+ )
185
+
186
+ image_prompt: list[SingleChatMessageType] = []
187
+ image_index = 0
188
+ if not has_prior and not has_lateral:
189
+ _add_single_image_to_chat_messages("Given the current frontal image only", image_index)
190
+ else:
191
+ _add_single_image_to_chat_messages("Given the current frontal image", image_index)
192
+ image_index += 1
193
+ if has_prior:
194
+ if has_lateral:
195
+ _add_single_image_to_chat_messages(" the current lateral image", image_index)
196
+ image_index += 1
197
+ _add_single_image_to_chat_messages(" and the prior frontal image", image_index)
198
+ else:
199
+ if has_lateral:
200
+ _add_single_image_to_chat_messages(" and the current lateral image", image_index)
201
+ return image_prompt
202
+
203
+ def _construct_chat_messages_reporting(
204
+ self,
205
+ has_prior: bool,
206
+ has_lateral: bool,
207
+ indication: str | None,
208
+ technique: str | None,
209
+ comparison: str | None,
210
+ prior_report: str | None,
211
+ get_grounding: bool = False,
212
+ assistant_text: str | None = None,
213
+ ) -> ChatMessageListType:
214
+ """
215
+ This function constructs the chat messages for reporting used in the grounded and non-grounded reporting tasks.
216
+
217
+ Args:
218
+ has_prior (bool):
219
+ A boolean indicating whether the prior image is present.
220
+ has_lateral (bool):
221
+ A boolean indicating whether the lateral image is present.
222
+ indication (str | None):
223
+ The indication section text.
224
+ technique (str | None):
225
+ The technique section text.
226
+ comparison (str | None):
227
+ The comparison section text.
228
+ prior_report (str | None):
229
+ The prior report section text.
230
+ get_grounding (bool):
231
+ A boolean indicating whether to get the grounding information.
232
+ assistant_text (str | None):
233
+ The assistant text (can be set to None for ordinary inference).
234
+
235
+ Returns:
236
+ ChatMessageListType: The chat messages for reporting in the form of a list of dictionaries.
237
+
238
+ Example:
239
+
240
+ ```python
241
+ >>> _construct_chat_messages_reporting(
242
+ >>> has_prior=True,
243
+ >>> has_lateral=True,
244
+ >>> indication="indication text from report goes here",
245
+ >>> technique="technique text from report goes here",
246
+ >>> comparison="comparison text from report goes here",
247
+ >>> prior_report="prior reporting text goes here",
248
+ >>> get_grounding=False,
249
+ >>> assistant_text=None,
250
+ >>> )
251
+ >>> # [
252
+ >>> # {"index": None, "text": "Given the current frontal image", "type": "text"},
253
+ >>> # {"index": 0, "text": None, "type": "image"},
254
+ >>> # {"index": None, "text": " the current lateral image", "type": "text"},
255
+ >>> # {"index": 1, "text": None, "type": "image"},
256
+ >>> # {"index": None, "text": " and the prior frontal image", "type": "text"},
257
+ >>> # {"index": 2, "text": None, "type": "image"},
258
+ >>> # {"index": None, "text": " PRIOR_REPORT: prior reporting text goes here", "type": "text"},
259
+ >>> # {"index": None, "text": " Provide a description of the findings in the radiology study in comparison to the "
260
+ >>> # "prior frontal image. INDICATION: indication text from report goes here TECHNIQUE: technique text from report "
261
+ >>> # "goes here COMPARISON: comparison text from report goes here", "type": "text"},
262
+ >>> # ]
263
+ ```
264
+ """
265
+ indication = self._get_section_text_or_missing_text(indication)
266
+ technique = self._get_section_text_or_missing_text(technique)
267
+ comparison = self._get_section_text_or_missing_text(comparison)
268
+ prior_report = self._get_section_text_or_missing_text(prior_report)
269
+
270
+ prompt = self._construct_image_chat_messages_for_reporting(has_prior=has_prior, has_lateral=has_lateral)
271
+
272
+ if has_prior:
273
+ prompt.append({"index": None, "text": f" PRIOR_REPORT: {prior_report}", "type": "text"})
274
+
275
+ if get_grounding:
276
+ prompt.append(
277
+ {
278
+ "index": None,
279
+ "text": " Provide a description of the findings in the radiology study in comparison to the "
280
+ "prior frontal image. Each finding should be described as a self-contained plain-text sentence."
281
+ " If the finding is groundable, locate the finding in the current frontal chest X-ray image, "
282
+ "with bounding boxes indicating all locations where it can be seen in the current frontal "
283
+ "image. Otherwise, generate just the ungrounded finding without bounding boxes. INDICATION: "
284
+ f"{indication} TECHNIQUE: {technique} COMPARISON: {comparison}",
285
+ "type": "text",
286
+ }
287
+ )
288
+ else:
289
+ prompt.append(
290
+ {
291
+ "index": None,
292
+ "text": " Provide a description of the findings in the radiology study in comparison to the "
293
+ f"prior frontal image. INDICATION: {indication} TECHNIQUE: {technique} COMPARISON: "
294
+ f"{comparison}",
295
+ "type": "text",
296
+ }
297
+ )
298
+ messages: ChatMessageListType = [{"content": prompt, "role": "user"}]
299
+ if assistant_text is not None:
300
+ messages.append({"content": [{"index": None, "text": assistant_text, "type": "text"}], "role": "assistant"})
301
+ return messages
302
+
303
+ def _construct_chat_messages_phrase_grounding(
304
+ self, phrase: str, assistant_text: str | None = None
305
+ ) -> ChatMessageListType:
306
+ """
307
+ This function constructs the chat messages for phrase grounding used in the phrase grounding task.
308
+
309
+ Args:
310
+ phrase (str):
311
+ The phrase to be grounded.
312
+ assistant_text (str | None):
313
+ The assistant text (can be set to None for ordinary inference).
314
+
315
+ Returns:
316
+ ChatMessageListType: The chat messages for phrase grounding in the form of a list of dictionaries.
317
+ """
318
+ prompt: list[SingleChatMessageType] = [
319
+ {"index": None, "text": "Given the current frontal image", "type": "text"},
320
+ {"index": 0, "text": None, "type": "image"},
321
+ {
322
+ "index": None,
323
+ "text": f" Repeat the following finding as a grounded phrase with bounding boxes indicating all "
324
+ f"locations where it can be seen in the given chest X-ray image. Finding: {phrase}",
325
+ "type": "text",
326
+ },
327
+ ]
328
+ messages: ChatMessageListType = [{"content": prompt, "role": "user"}]
329
+ if assistant_text is not None:
330
+ messages.append({"content": [{"index": None, "text": assistant_text, "type": "text"}], "role": "assistant"})
331
+ return messages
332
+
333
+ def format_reporting_input(
334
+ self,
335
+ current_frontal: Image.Image,
336
+ current_lateral: Image.Image | None,
337
+ prior_frontal: Image.Image | None,
338
+ indication: str | None,
339
+ technique: str | None,
340
+ comparison: str | None,
341
+ prior_report: str | None,
342
+ get_grounding: bool = False,
343
+ assistant_text: str | None = None,
344
+ ) -> tuple[str, list[Image.Image]]:
345
+ """
346
+ This function formats the reporting prompt for the grounded and non-grounded reporting tasks from the given
347
+ input images and text sections. The images are normalized and stacked together in the right order.
348
+
349
+ Args:
350
+ current_frontal (Image.Image):
351
+ The current frontal image.
352
+ current_lateral (Image.Image | None):
353
+ The current lateral image.
354
+ prior_frontal (Image.Image | None):
355
+ The prior frontal image.
356
+ indication (str | None):
357
+ The indication section text.
358
+ technique (str | None):
359
+ The technique section text.
360
+ comparison (str | None):
361
+ The comparison section text.
362
+ prior_report (str | None):
363
+ The prior report section text.
364
+ get_grounding (bool):
365
+ A boolean indicating whether to construct the prompt for grounded or non-grounded reporting.
366
+ assistant_text (str | None): The assistant text (can be set to None for ordinary inference).
367
+
368
+ Returns:
369
+ tuple[str, list[Image.Image]]: The formatted prompt text and the normalized images stacked in the right order.
370
+ """
371
+ images = self._normalize_and_stack_images(
372
+ current_frontal=current_frontal,
373
+ current_lateral=current_lateral,
374
+ prior_frontal=prior_frontal,
375
+ )
376
+ messages = self._construct_chat_messages_reporting(
377
+ has_prior=prior_frontal is not None,
378
+ has_lateral=current_lateral is not None,
379
+ indication=indication,
380
+ technique=technique,
381
+ comparison=comparison,
382
+ prior_report=prior_report,
383
+ get_grounding=get_grounding,
384
+ assistant_text=assistant_text,
385
+ )
386
+ add_generation_prompt = assistant_text is None
387
+ text = self.tokenizer.apply_chat_template(messages, add_generation_prompt=add_generation_prompt, tokenize=False)
388
+ return text, images
389
+
390
+ def format_phrase_grounding_input(
391
+ self,
392
+ frontal_image: Image.Image,
393
+ phrase: str,
394
+ assistant_text: str | None = None,
395
+ ) -> tuple[str, list[Image.Image]]:
396
+ """
397
+ This function formats the phrase grounding prompt for the phrase grounding task from the given input
398
+ image and phrase.
399
+
400
+ Args:
401
+ frontal_image (Image.Image):
402
+ The frontal image.
403
+ phrase (str):
404
+ The phrase to be grounded.
405
+ assistant_text (str | None):
406
+ The assistant text (can be set to None for ordinary inference).
407
+
408
+ Returns:
409
+ tuple[str, list[Image.Image]]: The formatted phrase grounding prompt text and the normalized image.
410
+ """
411
+ images = self._normalize_and_stack_images(
412
+ current_frontal=frontal_image,
413
+ current_lateral=None,
414
+ prior_frontal=None,
415
+ )
416
+ messages = self._construct_chat_messages_phrase_grounding(phrase, assistant_text=assistant_text)
417
+ add_generation_prompt = assistant_text is None
418
+ text = self.tokenizer.apply_chat_template(messages, add_generation_prompt=add_generation_prompt, tokenize=False)
419
+ return text, images
420
+
421
+ def format_and_preprocess_reporting_input(
422
+ self,
423
+ current_frontal: Image.Image,
424
+ current_lateral: Image.Image | None,
425
+ prior_frontal: Image.Image | None,
426
+ indication: str | None,
427
+ technique: str | None,
428
+ comparison: str | None,
429
+ prior_report: str | None,
430
+ get_grounding: bool = False,
431
+ assistant_text: str | None = None,
432
+ **kwargs: Any,
433
+ ) -> BatchFeature:
434
+ """
435
+ This function formats and then preprocesses the input for the grounded and non-grounded reporting tasks from
436
+ the given input images and text sections and returns the batch feature for the model. It calls format_reporting_input
437
+ internally to format the input prompt and stack the images together in the right order.
438
+
439
+ Args:
440
+ current_frontal (Image.Image):
441
+ The current frontal image.
442
+ current_lateral (Image.Image | None):
443
+ The current lateral image.
444
+ prior_frontal (Image.Image | None):
445
+ The prior frontal image.
446
+ indication (str | None):
447
+ The indication section text.
448
+ technique (str | None):
449
+ The technique section text.
450
+ comparison (str | None):
451
+ The comparison section text.
452
+ prior_report (str | None):
453
+ The prior report section text.
454
+ get_grounding (bool):
455
+ A boolean indicating whether to preprocess the input for grounded or non-grounded reporting.
456
+ assistant_text (str | None):
457
+ The assistant text (can be set to None for ordinary inference).
458
+
459
+ Returns:
460
+ BatchFeature: The batch feature for the model, ready to be passed to the model.
461
+
462
+ """
463
+ text, images = self.format_reporting_input(
464
+ current_frontal=current_frontal,
465
+ current_lateral=current_lateral,
466
+ prior_frontal=prior_frontal,
467
+ indication=indication,
468
+ technique=technique,
469
+ comparison=comparison,
470
+ prior_report=prior_report,
471
+ get_grounding=get_grounding,
472
+ assistant_text=assistant_text,
473
+ )
474
+ return self(text=text, images=images, **kwargs)
475
+
476
+ def format_and_preprocess_phrase_grounding_input(
477
+ self,
478
+ frontal_image: Image.Image,
479
+ phrase: str,
480
+ assistant_text: str | None = None,
481
+ **kwargs: Any,
482
+ ) -> BatchFeature:
483
+ """
484
+ This function formats and then processes the input for the phrase grounding task from the given input image and
485
+ phrase and returns the batch feature for the model. It calls format_phrase_grounding_input internally to format
486
+ the input prompt and normalize the image.
487
+
488
+ Args:
489
+ frontal_image (Image.Image):
490
+ The frontal image.
491
+ phrase (str):
492
+ The phrase to be grounded.
493
+ assistant_text (str | None):
494
+ The assistant text (can be set to None for ordinary inference).
495
+
496
+ Returns:
497
+ BatchFeature: The batch feature for the model, ready to be passed to the model.
498
+ """
499
+ text, images = self.format_phrase_grounding_input(
500
+ frontal_image=frontal_image,
501
+ phrase=phrase,
502
+ assistant_text=assistant_text,
503
+ )
504
+ return self(text=text, images=images, **kwargs)
505
+
506
+ def _get_text_between_delimiters(self, text: str, begin_token: str, end_token: str) -> list[str]:
507
+ """
508
+ This function splits the input text into a list of substrings based on the given begin and end tokens.
509
+
510
+ Args:
511
+ text (str):
512
+ The input text to be split.
513
+ begin_token (str):
514
+ The begin token.
515
+ end_token (str):
516
+ The end token.
517
+
518
+ Returns:
519
+ list[str]: The list of substrings between the given begin and end tokens.
520
+
521
+ Example:
522
+
523
+ ```python
524
+ >>> _get_text_between_delimiters("<obj>This is a grounded phrase</obj><obj>This is another grounded phrase</obj>", "<obj>", "</obj>")
525
+ >>> # ["This is a grounded phrase", "This is another grounded phrase"]
526
+
527
+ >>> _get_text_between_delimiters("<box><x10><y20><x30><y40></box><box><x50><y60><x70><y80></box>", "<box>", "</box>")
528
+ >>> # ["<x10><y20><x30><y40>", "<x50><y60><x70><y80>"]
529
+ ```
530
+ """
531
+ split_text = []
532
+ while begin_token in text:
533
+ assert text.startswith(begin_token)
534
+ end_index = text.find(end_token)
535
+ assert end_index != -1
536
+ split_text.append(text[len(begin_token) : end_index])
537
+ text = text[end_index + len(end_token) :]
538
+ assert len(text) == 0
539
+ return split_text
540
+
541
+ def convert_output_to_plaintext_or_grounded_sequence(
542
+ self, text: str
543
+ ) -> str | list[tuple[str, list[BoxType] | None]]:
544
+ """
545
+ This function converts the input text to a grounded sequence by extracting the grounded phrases and bounding
546
+ boxes from the text. If the text is plaintext without any grounded phrases, it returns the text as is.
547
+
548
+ Args:
549
+ text (str):
550
+ The input text to be converted.
551
+
552
+ Returns:
553
+ str | list[tuple[str, list[BoxType] | None]]: The grounded sequence.
554
+
555
+ Example:
556
+
557
+ ```python
558
+ >>> convert_output_to_plaintext_or_grounded_sequence("<obj>grounded phrase <box><x55><y45><x70><y56></box></obj><obj>ungrounded phrase</obj>")
559
+ >>> # [
560
+ >>> # ("grounded phrase", [(0.555, 0.455, 0.705, 0.565)]),
561
+ >>> # ("ungrounded phrase", None),
562
+ >>> # ]
563
+
564
+ >>> convert_output_to_plaintext_or_grounded_sequence("plain text")
565
+ >>> # "plain text"
566
+ ```
567
+ """
568
+ text = text.strip()
569
+
570
+ # Plain text
571
+ if not any(
572
+ [
573
+ self.phrase_start_token in text,
574
+ self.phrase_end_token in text,
575
+ self.box_start_token in text,
576
+ self.box_end_token in text,
577
+ ]
578
+ ):
579
+ return text
580
+
581
+ # One or more grounded phrases
582
+ grounded_phrase_texts = self._get_text_between_delimiters(text, self.phrase_start_token, self.phrase_end_token)
583
+ grounded_phrases: list[tuple[str, list[BoxType] | None]] = []
584
+ for grounded_phrase_text in grounded_phrase_texts:
585
+ if self.box_start_token in grounded_phrase_text or self.box_end_token in grounded_phrase_text:
586
+ first_box_start_index = grounded_phrase_text.find(self.box_start_token)
587
+ phrase_text = grounded_phrase_text[:first_box_start_index].strip()
588
+ boxes_text = grounded_phrase_text[first_box_start_index:]
589
+ boxes_text_list = self._get_text_between_delimiters(
590
+ boxes_text, self.box_start_token, self.box_end_token
591
+ )
592
+ boxes: list[BoxType] = []
593
+ for box_text in boxes_text_list:
594
+ # extract from <x_><y_><x_><y_>
595
+ regex = r"<x(\d+?)><y(\d+?)><x(\d+?)><y(\d+?)>"
596
+ match = re.search(regex, box_text)
597
+ if match:
598
+ x_min, y_min, x_max, y_max = match.groups()
599
+ box: BoxType = tuple( # type: ignore[assignment]
600
+ (int(coord) + 0.5) / self.num_box_coord_bins for coord in (x_min, y_min, x_max, y_max)
601
+ )
602
+ assert all(0 <= coord <= 1 for coord in box), f"Invalid box coordinates: {box}"
603
+ boxes.append(box)
604
+ else:
605
+ raise ValueError(f"Invalid box coordinates: {box_text} not matching regex {regex}")
606
+ grounded_phrases.append((phrase_text, boxes))
607
+ else:
608
+ grounded_phrases.append((grounded_phrase_text.lstrip(), None))
609
+ return grounded_phrases
610
+
611
+ @staticmethod
612
+ def adjust_box_for_original_image_size(box: BoxType, width: int, height: int) -> BoxType:
613
+ """
614
+ This function adjusts the bounding boxes from the MAIRA-2 model output to account for the image processor
615
+ cropping the image to be square prior to the model forward pass. The box coordinates are adjusted to be
616
+ relative to the original shape of the image assuming the image processor cropped the image based on the length
617
+ of the shortest side.
618
+
619
+ Args:
620
+ box (BoxType):
621
+ The box to be adjusted, normalised to (0, 1).
622
+ width (int):
623
+ Original width of the image, in pixels.
624
+ height (int):
625
+ Original height of the image, in pixels.
626
+
627
+ Returns:
628
+ BoxType: The box normalised relative to the original size of the image.
629
+ """
630
+ crop_width = crop_height = min(width, height)
631
+ x_offset = (width - crop_width) // 2
632
+ y_offset = (height - crop_height) // 2
633
+
634
+ norm_x_min, norm_y_min, norm_x_max, norm_y_max = box
635
+
636
+ abs_x_min = int(norm_x_min * crop_width + x_offset)
637
+ abs_x_max = int(norm_x_max * crop_width + x_offset)
638
+ abs_y_min = int(norm_y_min * crop_height + y_offset)
639
+ abs_y_max = int(norm_y_max * crop_height + y_offset)
640
+
641
+ adjusted_norm_x_min = abs_x_min / width
642
+ adjusted_norm_x_max = abs_x_max / width
643
+ adjusted_norm_y_min = abs_y_min / height
644
+ adjusted_norm_y_max = abs_y_max / height
645
+
646
+ return (adjusted_norm_x_min, adjusted_norm_y_min, adjusted_norm_x_max, adjusted_norm_y_max)
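As a worked illustration of the box adjustment above, the following standalone sketch re-implements the centre-crop correction as a free function and then scales the result to pixels. The first function mirrors the static method `adjust_box_for_original_image_size`. `to_pixel_box` is a hypothetical helper added for illustration only and is not part of the processor.

```python
BoxType = tuple[float, float, float, float]


def adjust_box_for_original_image_size(box: BoxType, width: int, height: int) -> BoxType:
    """Map a box normalised to the centre-cropped square back onto the full image."""
    crop_size = min(width, height)
    x_offset = (width - crop_size) // 2
    y_offset = (height - crop_size) // 2
    x_min, y_min, x_max, y_max = box
    # Convert to absolute pixels inside the crop, then shift by the crop offset.
    abs_x_min = int(x_min * crop_size + x_offset)
    abs_x_max = int(x_max * crop_size + x_offset)
    abs_y_min = int(y_min * crop_size + y_offset)
    abs_y_max = int(y_max * crop_size + y_offset)
    # Re-normalise against the original (uncropped) width and height.
    return (abs_x_min / width, abs_y_min / height, abs_x_max / width, abs_y_max / height)


def to_pixel_box(box: BoxType, width: int, height: int) -> tuple[int, int, int, int]:
    """Hypothetical helper: scale a box normalised to the full image into pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (round(x_min * width), round(y_min * height), round(x_max * width), round(y_max * height))


# A 1000x500 landscape image is cropped to the central 500x500 square,
# so the crop starts 250 pixels in from the left edge.
adjusted = adjust_box_for_original_image_size((0.5, 0.5, 1.0, 1.0), width=1000, height=500)
pixels = to_pixel_box(adjusted, width=1000, height=500)
```

For a landscape image only the x coordinates change, because the crop removes columns on the left and right but keeps the full height.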
processor_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "auto_map": {
3
+ "AutoProcessor": "processing_maira2.Maira2Processor"
4
+ },
5
+ "box_end_token": "</box>",
6
+ "box_start_token": "<box>",
7
+ "image_token": "<image>",
8
+ "num_box_coord_bins": 100,
9
+ "patch_size": 14,
10
+ "phrase_end_token": "</obj>",
11
+ "phrase_start_token": "<obj>",
12
+ "processor_class": "Maira2Processor",
13
+ "vision_feature_select_strategy": "default"
14
+ }
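To make the role of `num_box_coord_bins` concrete, here is a minimal sketch of how a box string maps to normalised coordinates. `decode_box` follows the bin-centre formula used in `convert_output_to_plaintext_or_grounded_sequence`. `encode_box` is an assumed inverse written for illustration; the quantisation actually used to build training targets is not shown in this commit.

```python
import re

NUM_BOX_COORD_BINS = 100  # value of "num_box_coord_bins" in processor_config.json


def decode_box(box_text: str, num_bins: int = NUM_BOX_COORD_BINS) -> tuple[float, ...]:
    """Decode "<x..><y..><x..><y..>" into bin-centre coordinates in [0, 1]."""
    match = re.search(r"<x(\d+)><y(\d+)><x(\d+)><y(\d+)>", box_text)
    if match is None:
        raise ValueError(f"Invalid box text: {box_text}")
    # Each token index names a bin; the decoded value is the centre of that bin.
    return tuple((int(coord) + 0.5) / num_bins for coord in match.groups())


def encode_box(box: tuple[float, ...], num_bins: int = NUM_BOX_COORD_BINS) -> str:
    """Assumed inverse (not in the source): quantise each coordinate to its bin index."""
    x_min, y_min, x_max, y_max = (min(int(coord * num_bins), num_bins - 1) for coord in box)
    return f"<x{x_min}><y{y_min}><x{x_max}><y{y_max}>"


decoded = decode_box("<x55><y45><x70><y56>")
round_trip = encode_box(decoded)
```

With 100 bins, every decoded coordinate lies between 0.005 and 0.995, so a decoded box never touches the exact image border.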
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "</s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "<unk>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "unk_token": {
24
+ "content": "<unk>",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ }
30
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9e556afd44213b6bd1be2b850ebbbd98f5481437a8021afaf58ee7fb1818d347
3
+ size 499723
tokenizer_config.json ADDED
@@ -0,0 +1,1701 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "add_prefix_space": true,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<unk>",
8
+ "lstrip": false,
9
+ "normalized": false,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "</s>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": false,
27
+ "single_word": false,
28
+ "special": true
29
+ },
30
+ "32000": {
31
+ "content": "<obj>",
32
+ "lstrip": false,
33
+ "normalized": true,
34
+ "rstrip": false,
35
+ "single_word": false,
36
+ "special": false
37
+ },
38
+ "32001": {
39
+ "content": "</obj>",
40
+ "lstrip": false,
41
+ "normalized": true,
42
+ "rstrip": false,
43
+ "single_word": false,
44
+ "special": false
45
+ },
46
+ "32002": {
47
+ "content": "<x0>",
48
+ "lstrip": false,
49
+ "normalized": true,
50
+ "rstrip": false,
51
+ "single_word": false,
52
+ "special": false
53
+ },
54
+ "32003": {
55
+ "content": "<x1>",
56
+ "lstrip": false,
57
+ "normalized": true,
58
+ "rstrip": false,
59
+ "single_word": false,
60
+ "special": false
61
+ },
62
+ "32004": {
63
+ "content": "<x2>",
64
+ "lstrip": false,
65
+ "normalized": true,
66
+ "rstrip": false,
67
+ "single_word": false,
68
+ "special": false
69
+ },
70
+ "32005": {
71
+ "content": "<x3>",
72
+ "lstrip": false,
73
+ "normalized": true,
74
+ "rstrip": false,
75
+ "single_word": false,
76
+ "special": false
77
+ },
78
+ "32006": {
79
+ "content": "<x4>",
80
+ "lstrip": false,
81
+ "normalized": true,
82
+ "rstrip": false,
83
+ "single_word": false,
84
+ "special": false
85
+ },
86
+ "32007": {
87
+ "content": "<x5>",
88
+ "lstrip": false,
89
+ "normalized": true,
90
+ "rstrip": false,
91
+ "single_word": false,
92
+ "special": false
93
+ },
94
+ "32008": {
95
+ "content": "<x6>",
96
+ "lstrip": false,
97
+ "normalized": true,
98
+ "rstrip": false,
99
+ "single_word": false,
100
+ "special": false
101
+ },
102
+ "32009": {
103
+ "content": "<x7>",
104
+ "lstrip": false,
105
+ "normalized": true,
106
+ "rstrip": false,
107
+ "single_word": false,
108
+ "special": false
109
+ },
110
+ "32010": {
111
+ "content": "<x8>",
112
+ "lstrip": false,
113
+ "normalized": true,
114
+ "rstrip": false,
115
+ "single_word": false,
116
+ "special": false
117
+ },
118
+ "32011": {
119
+ "content": "<x9>",
120
+ "lstrip": false,
121
+ "normalized": true,
122
+ "rstrip": false,
123
+ "single_word": false,
124
+ "special": false
125
+ },
126
+ "32012": {
127
+ "content": "<x10>",
128
+ "lstrip": false,
129
+ "normalized": true,
130
+ "rstrip": false,
131
+ "single_word": false,
132
+ "special": false
133
+ },
134
+ "32013": {
135
+ "content": "<x11>",
136
+ "lstrip": false,
137
+ "normalized": true,
138
+ "rstrip": false,
139
+ "single_word": false,
140
+ "special": false
141
+ },
142
+ "32014": {
143
+ "content": "<x12>",
144
+ "lstrip": false,
145
+ "normalized": true,
146
+ "rstrip": false,
147
+ "single_word": false,
148
+ "special": false
149
+ },
150
+ "32015": {
151
+ "content": "<x13>",
152
+ "lstrip": false,
153
+ "normalized": true,
154
+ "rstrip": false,
155
+ "single_word": false,
156
+ "special": false
157
+ },
158
+ "32016": {
159
+ "content": "<x14>",
160
+ "lstrip": false,
161
+ "normalized": true,
162
+ "rstrip": false,
163
+ "single_word": false,
164
+ "special": false
165
+ },
166
+ "32017": {
167
+ "content": "<x15>",
168
+ "lstrip": false,
169
+ "normalized": true,
170
+ "rstrip": false,
171
+ "single_word": false,
172
+ "special": false
173
+ },
174
+ "32018": {
175
+ "content": "<x16>",
176
+ "lstrip": false,
177
+ "normalized": true,
178
+ "rstrip": false,
179
+ "single_word": false,
180
+ "special": false
181
+ },
182
+ "32019": {
183
+ "content": "<x17>",
184
+ "lstrip": false,
185
+ "normalized": true,
186
+ "rstrip": false,
187
+ "single_word": false,
188
+ "special": false
189
+ },
190
+ "32020": {
191
+ "content": "<x18>",
192
+ "lstrip": false,
193
+ "normalized": true,
194
+ "rstrip": false,
195
+ "single_word": false,
196
+ "special": false
197
+ },
198
+ "32021": {
199
+ "content": "<x19>",
200
+ "lstrip": false,
201
+ "normalized": true,
202
+ "rstrip": false,
203
+ "single_word": false,
204
+ "special": false
205
+ },
206
+ "32022": {
207
+ "content": "<x20>",
208
+ "lstrip": false,
209
+ "normalized": true,
210
+ "rstrip": false,
211
+ "single_word": false,
212
+ "special": false
213
+ },
214
+ "32023": {
215
+ "content": "<x21>",
216
+ "lstrip": false,
217
+ "normalized": true,
218
+ "rstrip": false,
219
+ "single_word": false,
220
+ "special": false
221
+ },
222
+ "32024": {
223
+ "content": "<x22>",
224
+ "lstrip": false,
225
+ "normalized": true,
226
+ "rstrip": false,
227
+ "single_word": false,
228
+ "special": false
229
+ },
230
+ "32025": {
231
+ "content": "<x23>",
232
+ "lstrip": false,
233
+ "normalized": true,
234
+ "rstrip": false,
235
+ "single_word": false,
236
+ "special": false
237
+ },
238
+ "32026": {
239
+ "content": "<x24>",
240
+ "lstrip": false,
241
+ "normalized": true,
242
+ "rstrip": false,
243
+ "single_word": false,
244
+ "special": false
245
+ },
246
+ "32027": {
247
+ "content": "<x25>",
248
+ "lstrip": false,
249
+ "normalized": true,
250
+ "rstrip": false,
251
+ "single_word": false,
252
+ "special": false
253
+ },
254
+ "32028": {
255
+ "content": "<x26>",
256
+ "lstrip": false,
257
+ "normalized": true,
258
+ "rstrip": false,
259
+ "single_word": false,
260
+ "special": false
261
+ },
262
+ "32029": {
263
+ "content": "<x27>",
264
+ "lstrip": false,
265
+ "normalized": true,
266
+ "rstrip": false,
267
+ "single_word": false,
268
+ "special": false
269
+ },
270
+ "32030": {
271
+ "content": "<x28>",
272
+ "lstrip": false,
273
+ "normalized": true,
274
+ "rstrip": false,
275
+ "single_word": false,
276
+ "special": false
277
+ },
278
+ "32031": {
279
+ "content": "<x29>",
280
+ "lstrip": false,
281
+ "normalized": true,
282
+ "rstrip": false,
283
+ "single_word": false,
284
+ "special": false
285
+ },
286
+ "32032": {
287
+ "content": "<x30>",
288
+ "lstrip": false,
289
+ "normalized": true,
290
+ "rstrip": false,
291
+ "single_word": false,
292
+ "special": false
293
+ },
294
+ "32033": {
295
+ "content": "<x31>",
296
+ "lstrip": false,
297
+ "normalized": true,
298
+ "rstrip": false,
299
+ "single_word": false,
300
+ "special": false
301
+ },
302
+ "32034": {
303
+ "content": "<x32>",
304
+ "lstrip": false,
305
+ "normalized": true,
306
+ "rstrip": false,
307
+ "single_word": false,
308
+ "special": false
309
+ },
310
+ "32035": {
311
+ "content": "<x33>",
312
+ "lstrip": false,
313
+ "normalized": true,
314
+ "rstrip": false,
315
+ "single_word": false,
316
+ "special": false
317
+ },
318
+ "32036": {
319
+ "content": "<x34>",
320
+ "lstrip": false,
321
+ "normalized": true,
322
+ "rstrip": false,
323
+ "single_word": false,
324
+ "special": false
325
+ },
326
+ "32037": {
327
+ "content": "<x35>",
328
+ "lstrip": false,
329
+ "normalized": true,
330
+ "rstrip": false,
331
+ "single_word": false,
332
+ "special": false
333
+ },
334
+ "32038": {
335
+ "content": "<x36>",
336
+ "lstrip": false,
337
+ "normalized": true,
338
+ "rstrip": false,
339
+ "single_word": false,
340
+ "special": false
341
+ },
342
+ "32039": {
343
+ "content": "<x37>",
344
+ "lstrip": false,
345
+ "normalized": true,
346
+ "rstrip": false,
347
+ "single_word": false,
348
+ "special": false
349
+ },
350
+ "32040": {
351
+ "content": "<x38>",
352
+ "lstrip": false,
353
+ "normalized": true,
354
+ "rstrip": false,
355
+ "single_word": false,
356
+ "special": false
357
+ },
358
+ "32041": {
359
+ "content": "<x39>",
360
+ "lstrip": false,
361
+ "normalized": true,
362
+ "rstrip": false,
363
+ "single_word": false,
364
+ "special": false
365
+ },
366
+ "32042": {
367
+ "content": "<x40>",
368
+ "lstrip": false,
369
+ "normalized": true,
370
+ "rstrip": false,
371
+ "single_word": false,
372
+ "special": false
373
+ },
374
+ "32043": {
375
+ "content": "<x41>",
376
+ "lstrip": false,
377
+ "normalized": true,
378
+ "rstrip": false,
379
+ "single_word": false,
380
+ "special": false
381
+ },
382
+ "32044": {
383
+ "content": "<x42>",
384
+ "lstrip": false,
385
+ "normalized": true,
386
+ "rstrip": false,
387
+ "single_word": false,
388
+ "special": false
389
+ },
390
+ "32045": {
391
+ "content": "<x43>",
392
+ "lstrip": false,
393
+ "normalized": true,
394
+ "rstrip": false,
395
+ "single_word": false,
396
+ "special": false
397
+ },
398
+ "32046": {
399
+ "content": "<x44>",
400
+ "lstrip": false,
401
+ "normalized": true,
402
+ "rstrip": false,
403
+ "single_word": false,
404
+ "special": false
405
+ },
406
+ "32047": {
407
+ "content": "<x45>",
408
+ "lstrip": false,
409
+ "normalized": true,
410
+ "rstrip": false,
411
+ "single_word": false,
412
+ "special": false
413
+ },
414
+ "32048": {
415
+ "content": "<x46>",
416
+ "lstrip": false,
417
+ "normalized": true,
418
+ "rstrip": false,
419
+ "single_word": false,
420
+ "special": false
421
+ },
422
+ "32049": {
423
+ "content": "<x47>",
424
+ "lstrip": false,
425
+ "normalized": true,
426
+ "rstrip": false,
427
+ "single_word": false,
428
+ "special": false
429
+ },
430
+ "32050": {
431
+ "content": "<x48>",
432
+ "lstrip": false,
433
+ "normalized": true,
434
+ "rstrip": false,
435
+ "single_word": false,
436
+ "special": false
437
+ },
438
+ "32051": {
439
+ "content": "<x49>",
440
+ "lstrip": false,
441
+ "normalized": true,
442
+ "rstrip": false,
443
+ "single_word": false,
444
+ "special": false
445
+ },
446
+ "32052": {
447
+ "content": "<x50>",
448
+ "lstrip": false,
449
+ "normalized": true,
450
+ "rstrip": false,
451
+ "single_word": false,
452
+ "special": false
453
+ },
454
+ "32053": {
455
+ "content": "<x51>",
456
+ "lstrip": false,
457
+ "normalized": true,
458
+ "rstrip": false,
459
+ "single_word": false,
460
+ "special": false
461
+ },
462
+ "32054": {
463
+ "content": "<x52>",
464
+ "lstrip": false,
465
+ "normalized": true,
466
+ "rstrip": false,
467
+ "single_word": false,
468
+ "special": false
469
+ },
470
+ "32055": {
471
+ "content": "<x53>",
472
+ "lstrip": false,
473
+ "normalized": true,
474
+ "rstrip": false,
475
+ "single_word": false,
476
+ "special": false
477
+ },
478
+ "32056": {
479
+ "content": "<x54>",
480
+ "lstrip": false,
481
+ "normalized": true,
482
+ "rstrip": false,
483
+ "single_word": false,
484
+ "special": false
485
+ },
486
+ "32057": {
487
+ "content": "<x55>",
488
+ "lstrip": false,
489
+ "normalized": true,
490
+ "rstrip": false,
491
+ "single_word": false,
492
+ "special": false
493
+ },
494
+ "32058": {
495
+ "content": "<x56>",
496
+ "lstrip": false,
497
+ "normalized": true,
498
+ "rstrip": false,
499
+ "single_word": false,
500
+ "special": false
501
+ },
502
+ "32059": {
503
+ "content": "<x57>",
504
+ "lstrip": false,
505
+ "normalized": true,
506
+ "rstrip": false,
507
+ "single_word": false,
508
+ "special": false
509
+ },
510
+ "32060": {
511
+ "content": "<x58>",
512
+ "lstrip": false,
513
+ "normalized": true,
514
+ "rstrip": false,
515
+ "single_word": false,
516
+ "special": false
517
+ },
518
+ "32061": {
519
+ "content": "<x59>",
520
+ "lstrip": false,
521
+ "normalized": true,
522
+ "rstrip": false,
523
+ "single_word": false,
524
+ "special": false
525
+ },
526
+ "32062": {
527
+ "content": "<x60>",
528
+ "lstrip": false,
529
+ "normalized": true,
530
+ "rstrip": false,
531
+ "single_word": false,
532
+ "special": false
533
+ },
534
+ "32063": {
535
+ "content": "<x61>",
536
+ "lstrip": false,
537
+ "normalized": true,
538
+ "rstrip": false,
539
+ "single_word": false,
540
+ "special": false
541
+ },
542
+ "32064": {
543
+ "content": "<x62>",
544
+ "lstrip": false,
545
+ "normalized": true,
546
+ "rstrip": false,
547
+ "single_word": false,
548
+ "special": false
549
+ },
550
+ "32065": {
551
+ "content": "<x63>",
552
+ "lstrip": false,
553
+ "normalized": true,
554
+ "rstrip": false,
555
+ "single_word": false,
556
+ "special": false
557
+ },
558
+ "32066": {
559
+ "content": "<x64>",
560
+ "lstrip": false,
561
+ "normalized": true,
562
+ "rstrip": false,
563
+ "single_word": false,
564
+ "special": false
565
+ },
566
+ "32067": {
567
+ "content": "<x65>",
568
+ "lstrip": false,
569
+ "normalized": true,
570
+ "rstrip": false,
571
+ "single_word": false,
572
+ "special": false
573
+ },
574
+ "32068": {
575
+ "content": "<x66>",
576
+ "lstrip": false,
577
+ "normalized": true,
578
+ "rstrip": false,
579
+ "single_word": false,
580
+ "special": false
581
+ },
582
+ "32069": {
583
+ "content": "<x67>",
584
+ "lstrip": false,
585
+ "normalized": true,
586
+ "rstrip": false,
587
+ "single_word": false,
588
+ "special": false
589
+ },
590
+ "32070": {
591
+ "content": "<x68>",
592
+ "lstrip": false,
593
+ "normalized": true,
594
+ "rstrip": false,
595
+ "single_word": false,
596
+ "special": false
597
+ },
598
+ "32071": {
599
+ "content": "<x69>",
600
+ "lstrip": false,
601
+ "normalized": true,
602
+ "rstrip": false,
603
+ "single_word": false,
604
+ "special": false
605
+ },
606
+ "32072": {
607
+ "content": "<x70>",
608
+ "lstrip": false,
609
+ "normalized": true,
610
+ "rstrip": false,
611
+ "single_word": false,
612
+ "special": false
613
+ },
614
+ "32073": {
615
+ "content": "<x71>",
616
+ "lstrip": false,
617
+ "normalized": true,
618
+ "rstrip": false,
619
+ "single_word": false,
620
+ "special": false
621
+ },
622
+ "32074": {
623
+ "content": "<x72>",
624
+ "lstrip": false,
625
+ "normalized": true,
626
+ "rstrip": false,
627
+ "single_word": false,
628
+ "special": false
629
+ },
630
+ "32075": {
631
+ "content": "<x73>",
632
+ "lstrip": false,
633
+ "normalized": true,
634
+ "rstrip": false,
635
+ "single_word": false,
636
+ "special": false
637
+ },
638
+ "32076": {
639
+ "content": "<x74>",
640
+ "lstrip": false,
641
+ "normalized": true,
642
+ "rstrip": false,
643
+ "single_word": false,
644
+ "special": false
645
+ },
646
+ "32077": {
647
+ "content": "<x75>",
648
+ "lstrip": false,
649
+ "normalized": true,
650
+ "rstrip": false,
651
+ "single_word": false,
652
+ "special": false
653
+ },
654
+ "32078": {
655
+ "content": "<x76>",
656
+ "lstrip": false,
657
+ "normalized": true,
658
+ "rstrip": false,
659
+ "single_word": false,
660
+ "special": false
661
+ },
662
+ "32079": {
663
+ "content": "<x77>",
664
+ "lstrip": false,
665
+ "normalized": true,
666
+ "rstrip": false,
667
+ "single_word": false,
668
+ "special": false
669
+ },
670
+ "32080": {
671
+ "content": "<x78>",
672
+ "lstrip": false,
673
+ "normalized": true,
674
+ "rstrip": false,
675
+ "single_word": false,
676
+ "special": false
677
+ },
678
+ "32081": {
679
+ "content": "<x79>",
680
+ "lstrip": false,
681
+ "normalized": true,
682
+ "rstrip": false,
683
+ "single_word": false,
684
+ "special": false
685
+ },
686
+ "32082": {
687
+ "content": "<x80>",
688
+ "lstrip": false,
689
+ "normalized": true,
690
+ "rstrip": false,
691
+ "single_word": false,
692
+ "special": false
693
+ },
694
+ "32083": {
695
+ "content": "<x81>",
696
+ "lstrip": false,
697
+ "normalized": true,
698
+ "rstrip": false,
699
+ "single_word": false,
700
+ "special": false
701
+ },
702
+ "32084": {
703
+ "content": "<x82>",
704
+ "lstrip": false,
705
+ "normalized": true,
706
+ "rstrip": false,
707
+ "single_word": false,
708
+ "special": false
709
+ },
710
+ "32085": {
711
+ "content": "<x83>",
712
+ "lstrip": false,
713
+ "normalized": true,
714
+ "rstrip": false,
715
+ "single_word": false,
716
+ "special": false
717
+ },
718
+ "32086": {
719
+ "content": "<x84>",
720
+ "lstrip": false,
721
+ "normalized": true,
722
+ "rstrip": false,
723
+ "single_word": false,
724
+ "special": false
725
+ },
726
+ "32087": {
727
+ "content": "<x85>",
728
+ "lstrip": false,
729
+ "normalized": true,
730
+ "rstrip": false,
731
+ "single_word": false,
732
+ "special": false
733
+ },
734
+ "32088": {
735
+ "content": "<x86>",
736
+ "lstrip": false,
737
+ "normalized": true,
738
+ "rstrip": false,
739
+ "single_word": false,
740
+ "special": false
741
+ },
742
+ "32089": {
743
+ "content": "<x87>",
744
+ "lstrip": false,
745
+ "normalized": true,
746
+ "rstrip": false,
747
+ "single_word": false,
748
+ "special": false
749
+ },
750
+ "32090": {
751
+ "content": "<x88>",
752
+ "lstrip": false,
753
+ "normalized": true,
754
+ "rstrip": false,
755
+ "single_word": false,
756
+ "special": false
757
+ },
758
+ "32091": {
759
+ "content": "<x89>",
760
+ "lstrip": false,
761
+ "normalized": true,
762
+ "rstrip": false,
763
+ "single_word": false,
764
+ "special": false
765
+ },
766
+ "32092": {
767
+ "content": "<x90>",
768
+ "lstrip": false,
769
+ "normalized": true,
770
+ "rstrip": false,
771
+ "single_word": false,
772
+ "special": false
773
+ },
774
+ "32093": {
775
+ "content": "<x91>",
776
+ "lstrip": false,
777
+ "normalized": true,
778
+ "rstrip": false,
779
+ "single_word": false,
780
+ "special": false
781
+ },
782
+ "32094": {
783
+ "content": "<x92>",
784
+ "lstrip": false,
785
+ "normalized": true,
786
+ "rstrip": false,
787
+ "single_word": false,
788
+ "special": false
789
+ },
790
+ "32095": {
791
+ "content": "<x93>",
792
+ "lstrip": false,
793
+ "normalized": true,
794
+ "rstrip": false,
795
+ "single_word": false,
796
+ "special": false
797
+ },
798
+ "32096": {
799
+ "content": "<x94>",
800
+ "lstrip": false,
801
+ "normalized": true,
802
+ "rstrip": false,
803
+ "single_word": false,
804
+ "special": false
805
+ },
806
+ "32097": {
807
+ "content": "<x95>",
808
+ "lstrip": false,
809
+ "normalized": true,
810
+ "rstrip": false,
811
+ "single_word": false,
812
+ "special": false
813
+ },
814
+ "32098": {
815
+ "content": "<x96>",
816
+ "lstrip": false,
817
+ "normalized": true,
818
+ "rstrip": false,
819
+ "single_word": false,
820
+ "special": false
821
+ },
822
+ "32099": {
823
+ "content": "<x97>",
824
+ "lstrip": false,
825
+ "normalized": true,
826
+ "rstrip": false,
827
+ "single_word": false,
828
+ "special": false
829
+ },
830
+ "32100": {
831
+ "content": "<x98>",
832
+ "lstrip": false,
833
+ "normalized": true,
834
+ "rstrip": false,
835
+ "single_word": false,
836
+ "special": false
837
+ },
838
+ "32101": {
839
+ "content": "<x99>",
840
+ "lstrip": false,
841
+ "normalized": true,
842
+ "rstrip": false,
843
+ "single_word": false,
844
+ "special": false
845
+ },
846
+ "32102": {
847
+ "content": "<y0>",
848
+ "lstrip": false,
849
+ "normalized": true,
850
+ "rstrip": false,
851
+ "single_word": false,
852
+ "special": false
853
+ },
854
+ "32103": {
855
+ "content": "<y1>",
856
+ "lstrip": false,
857
+ "normalized": true,
858
+ "rstrip": false,
859
+ "single_word": false,
860
+ "special": false
861
+ },
862
+ "32104": {
863
+ "content": "<y2>",
864
+ "lstrip": false,
865
+ "normalized": true,
866
+ "rstrip": false,
867
+ "single_word": false,
868
+ "special": false
869
+ },
870
+ "32105": {
871
+ "content": "<y3>",
872
+ "lstrip": false,
873
+ "normalized": true,
874
+ "rstrip": false,
875
+ "single_word": false,
876
+ "special": false
877
+ },
878
+ "32106": {
879
+ "content": "<y4>",
880
+ "lstrip": false,
881
+ "normalized": true,
882
+ "rstrip": false,
883
+ "single_word": false,
884
+ "special": false
885
+ },
886
+ "32107": {
887
+ "content": "<y5>",
888
+ "lstrip": false,
889
+ "normalized": true,
890
+ "rstrip": false,
891
+ "single_word": false,
892
+ "special": false
893
+ },
894
+ "32108": {
895
+ "content": "<y6>",
896
+ "lstrip": false,
897
+ "normalized": true,
898
+ "rstrip": false,
899
+ "single_word": false,
900
+ "special": false
901
+ },
902
+ "32109": {
903
+ "content": "<y7>",
904
+ "lstrip": false,
905
+ "normalized": true,
906
+ "rstrip": false,
907
+ "single_word": false,
908
+ "special": false
909
+ },
910
+ "32110": {
911
+ "content": "<y8>",
912
+ "lstrip": false,
913
+ "normalized": true,
914
+ "rstrip": false,
915
+ "single_word": false,
916
+ "special": false
917
+ },
918
+ "32111": {
919
+ "content": "<y9>",
920
+ "lstrip": false,
921
+ "normalized": true,
922
+ "rstrip": false,
923
+ "single_word": false,
924
+ "special": false
925
+ },
926
+ "32112": {
927
+ "content": "<y10>",
928
+ "lstrip": false,
929
+ "normalized": true,
930
+ "rstrip": false,
931
+ "single_word": false,
932
+ "special": false
933
+ },
934
+ "32113": {
935
+ "content": "<y11>",
936
+ "lstrip": false,
937
+ "normalized": true,
938
+ "rstrip": false,
939
+ "single_word": false,
940
+ "special": false
941
+ },
942
+ "32114": {
943
+ "content": "<y12>",
944
+ "lstrip": false,
945
+ "normalized": true,
946
+ "rstrip": false,
947
+ "single_word": false,
948
+ "special": false
949
+ },
950
+ "32115": {
951
+ "content": "<y13>",
952
+ "lstrip": false,
953
+ "normalized": true,
954
+ "rstrip": false,
955
+ "single_word": false,
956
+ "special": false
957
+ },
958
+ "32116": {
959
+ "content": "<y14>",
960
+ "lstrip": false,
961
+ "normalized": true,
962
+ "rstrip": false,
963
+ "single_word": false,
964
+ "special": false
965
+ },
966
+ "32117": {
967
+ "content": "<y15>",
968
+ "lstrip": false,
969
+ "normalized": true,
970
+ "rstrip": false,
971
+ "single_word": false,
972
+ "special": false
973
+ },
974
+ "32118": {
975
+ "content": "<y16>",
976
+ "lstrip": false,
977
+ "normalized": true,
978
+ "rstrip": false,
979
+ "single_word": false,
980
+ "special": false
981
+ },
982
+ "32119": {
983
+ "content": "<y17>",
984
+ "lstrip": false,
985
+ "normalized": true,
986
+ "rstrip": false,
987
+ "single_word": false,
988
+ "special": false
989
+ },
990
+ "32120": {
991
+ "content": "<y18>",
992
+ "lstrip": false,
993
+ "normalized": true,
994
+ "rstrip": false,
995
+ "single_word": false,
996
+ "special": false
997
+ },
998
+ "32121": {
999
+ "content": "<y19>",
1000
+ "lstrip": false,
1001
+ "normalized": true,
1002
+ "rstrip": false,
1003
+ "single_word": false,
1004
+ "special": false
1005
+ },
1006
+ "32122": {
1007
+ "content": "<y20>",
1008
+ "lstrip": false,
1009
+ "normalized": true,
1010
+ "rstrip": false,
1011
+ "single_word": false,
1012
+ "special": false
1013
+ },
1014
+ "32123": {
1015
+ "content": "<y21>",
1016
+ "lstrip": false,
1017
+ "normalized": true,
1018
+ "rstrip": false,
1019
+ "single_word": false,
1020
+ "special": false
1021
+ },
1022
+ "32124": {
1023
+ "content": "<y22>",
1024
+ "lstrip": false,
1025
+ "normalized": true,
1026
+ "rstrip": false,
1027
+ "single_word": false,
1028
+ "special": false
1029
+ },
1030
+ "32125": {
1031
+ "content": "<y23>",
1032
+ "lstrip": false,
1033
+ "normalized": true,
1034
+ "rstrip": false,
1035
+ "single_word": false,
1036
+ "special": false
1037
+ },
1038
+ "32126": {
1039
+ "content": "<y24>",
1040
+ "lstrip": false,
1041
+ "normalized": true,
1042
+ "rstrip": false,
1043
+ "single_word": false,
1044
+ "special": false
1045
+ },
1046
+ "32127": {
1047
+ "content": "<y25>",
1048
+ "lstrip": false,
1049
+ "normalized": true,
1050
+ "rstrip": false,
1051
+ "single_word": false,
1052
+ "special": false
1053
+ },
1054
+ "32128": {
1055
+ "content": "<y26>",
1056
+ "lstrip": false,
1057
+ "normalized": true,
1058
+ "rstrip": false,
1059
+ "single_word": false,
1060
+ "special": false
1061
+ },
1062
+ "32129": {
1063
+ "content": "<y27>",
1064
+ "lstrip": false,
1065
+ "normalized": true,
1066
+ "rstrip": false,
1067
+ "single_word": false,
1068
+ "special": false
1069
+ },
1070
+ "32130": {
1071
+ "content": "<y28>",
1072
+ "lstrip": false,
1073
+ "normalized": true,
1074
+ "rstrip": false,
1075
+ "single_word": false,
1076
+ "special": false
1077
+ },
1078
+ "32131": {
1079
+ "content": "<y29>",
1080
+ "lstrip": false,
1081
+ "normalized": true,
1082
+ "rstrip": false,
1083
+ "single_word": false,
1084
+ "special": false
1085
+ },
1086
+ "32132": {
1087
+ "content": "<y30>",
1088
+ "lstrip": false,
1089
+ "normalized": true,
1090
+ "rstrip": false,
1091
+ "single_word": false,
1092
+ "special": false
1093
+ },
1094
+ "32133": {
1095
+ "content": "<y31>",
1096
+ "lstrip": false,
1097
+ "normalized": true,
1098
+ "rstrip": false,
1099
+ "single_word": false,
1100
+ "special": false
1101
+ },
1102
+ "32134": {
1103
+ "content": "<y32>",
1104
+ "lstrip": false,
1105
+ "normalized": true,
1106
+ "rstrip": false,
1107
+ "single_word": false,
1108
+ "special": false
1109
+ },
1110
+ "32135": {
1111
+ "content": "<y33>",
1112
+ "lstrip": false,
1113
+ "normalized": true,
1114
+ "rstrip": false,
1115
+ "single_word": false,
1116
+ "special": false
1117
+ },
1118
+ "32136": {
1119
+ "content": "<y34>",
1120
+ "lstrip": false,
1121
+ "normalized": true,
1122
+ "rstrip": false,
1123
+ "single_word": false,
1124
+ "special": false
1125
+ },
1126
+ "32137": {
1127
+ "content": "<y35>",
1128
+ "lstrip": false,
1129
+ "normalized": true,
1130
+ "rstrip": false,
1131
+ "single_word": false,
1132
+ "special": false
1133
+ },
1134
+ "32138": {
1135
+ "content": "<y36>",
1136
+ "lstrip": false,
1137
+ "normalized": true,
1138
+ "rstrip": false,
1139
+ "single_word": false,
1140
+ "special": false
1141
+ },
1142
+ "32139": {
1143
+ "content": "<y37>",
1144
+ "lstrip": false,
1145
+ "normalized": true,
1146
+ "rstrip": false,
1147
+ "single_word": false,
1148
+ "special": false
1149
+ },
1150
+ "32140": {
1151
+ "content": "<y38>",
1152
+ "lstrip": false,
1153
+ "normalized": true,
1154
+ "rstrip": false,
1155
+ "single_word": false,
1156
+ "special": false
1157
+ },
1158
+ "32141": {
1159
+ "content": "<y39>",
1160
+ "lstrip": false,
1161
+ "normalized": true,
1162
+ "rstrip": false,
1163
+ "single_word": false,
1164
+ "special": false
1165
+ },
1166
+ "32142": {
1167
+ "content": "<y40>",
1168
+ "lstrip": false,
1169
+ "normalized": true,
1170
+ "rstrip": false,
1171
+ "single_word": false,
1172
+ "special": false
1173
+ },
1174
+ "32143": {
1175
+ "content": "<y41>",
1176
+ "lstrip": false,
1177
+ "normalized": true,
1178
+ "rstrip": false,
1179
+ "single_word": false,
1180
+ "special": false
1181
+ },
1182
+ "32144": {
1183
+ "content": "<y42>",
1184
+ "lstrip": false,
1185
+ "normalized": true,
1186
+ "rstrip": false,
1187
+ "single_word": false,
1188
+ "special": false
1189
+ },
1190
+ "32145": {
1191
+ "content": "<y43>",
1192
+ "lstrip": false,
1193
+ "normalized": true,
1194
+ "rstrip": false,
1195
+ "single_word": false,
1196
+ "special": false
1197
+ },
1198
+ "32146": {
1199
+ "content": "<y44>",
1200
+ "lstrip": false,
1201
+ "normalized": true,
1202
+ "rstrip": false,
1203
+ "single_word": false,
1204
+ "special": false
1205
+ },
1206
+ "32147": {
1207
+ "content": "<y45>",
1208
+ "lstrip": false,
1209
+ "normalized": true,
1210
+ "rstrip": false,
1211
+ "single_word": false,
1212
+ "special": false
1213
+ },
1214
+ "32148": {
1215
+ "content": "<y46>",
1216
+ "lstrip": false,
1217
+ "normalized": true,
1218
+ "rstrip": false,
1219
+ "single_word": false,
1220
+ "special": false
1221
+ },
1222
+ "32149": {
1223
+ "content": "<y47>",
1224
+ "lstrip": false,
1225
+ "normalized": true,
1226
+ "rstrip": false,
1227
+ "single_word": false,
1228
+ "special": false
1229
+ },
1230
+ "32150": {
1231
+ "content": "<y48>",
1232
+ "lstrip": false,
1233
+ "normalized": true,
1234
+ "rstrip": false,
1235
+ "single_word": false,
1236
+ "special": false
1237
+ },
1238
+ "32151": {
1239
+ "content": "<y49>",
1240
+ "lstrip": false,
1241
+ "normalized": true,
1242
+ "rstrip": false,
1243
+ "single_word": false,
1244
+ "special": false
1245
+ },
1246
+ "32152": {
1247
+ "content": "<y50>",
1248
+ "lstrip": false,
1249
+ "normalized": true,
1250
+ "rstrip": false,
1251
+ "single_word": false,
1252
+ "special": false
1253
+ },
1254
+ "32153": {
1255
+ "content": "<y51>",
1256
+ "lstrip": false,
1257
+ "normalized": true,
1258
+ "rstrip": false,
1259
+ "single_word": false,
1260
+ "special": false
1261
+ },
1262
+ "32154": {
1263
+ "content": "<y52>",
1264
+ "lstrip": false,
1265
+ "normalized": true,
1266
+ "rstrip": false,
1267
+ "single_word": false,
1268
+ "special": false
1269
+ },
1270
+ "32155": {
1271
+ "content": "<y53>",
1272
+ "lstrip": false,
1273
+ "normalized": true,
1274
+ "rstrip": false,
1275
+ "single_word": false,
1276
+ "special": false
1277
+ },
1278
+ "32156": {
1279
+ "content": "<y54>",
1280
+ "lstrip": false,
1281
+ "normalized": true,
1282
+ "rstrip": false,
1283
+ "single_word": false,
1284
+ "special": false
1285
+ },
1286
+ "32157": {
1287
+ "content": "<y55>",
1288
+ "lstrip": false,
1289
+ "normalized": true,
1290
+ "rstrip": false,
1291
+ "single_word": false,
1292
+ "special": false
1293
+ },
1294
+ "32158": {
1295
+ "content": "<y56>",
1296
+ "lstrip": false,
1297
+ "normalized": true,
1298
+ "rstrip": false,
1299
+ "single_word": false,
1300
+ "special": false
1301
+ },
1302
+ "32159": {
1303
+ "content": "<y57>",
1304
+ "lstrip": false,
1305
+ "normalized": true,
1306
+ "rstrip": false,
1307
+ "single_word": false,
1308
+ "special": false
1309
+ },
1310
+ "32160": {
1311
+ "content": "<y58>",
1312
+ "lstrip": false,
1313
+ "normalized": true,
1314
+ "rstrip": false,
1315
+ "single_word": false,
1316
+ "special": false
1317
+ },
1318
+ "32161": {
1319
+ "content": "<y59>",
1320
+ "lstrip": false,
1321
+ "normalized": true,
1322
+ "rstrip": false,
1323
+ "single_word": false,
1324
+ "special": false
1325
+ },
1326
+ "32162": {
1327
+ "content": "<y60>",
1328
+ "lstrip": false,
1329
+ "normalized": true,
1330
+ "rstrip": false,
1331
+ "single_word": false,
1332
+ "special": false
1333
+ },
1334
+ "32163": {
1335
+ "content": "<y61>",
1336
+ "lstrip": false,
1337
+ "normalized": true,
1338
+ "rstrip": false,
1339
+ "single_word": false,
1340
+ "special": false
1341
+ },
1342
+ "32164": {
1343
+ "content": "<y62>",
1344
+ "lstrip": false,
1345
+ "normalized": true,
1346
+ "rstrip": false,
1347
+ "single_word": false,
1348
+ "special": false
1349
+ },
1350
+ "32165": {
1351
+ "content": "<y63>",
1352
+ "lstrip": false,
1353
+ "normalized": true,
1354
+ "rstrip": false,
1355
+ "single_word": false,
1356
+ "special": false
1357
+ },
1358
+ "32166": {
1359
+ "content": "<y64>",
1360
+ "lstrip": false,
1361
+ "normalized": true,
1362
+ "rstrip": false,
1363
+ "single_word": false,
1364
+ "special": false
1365
+ },
1366
+ "32167": {
1367
+ "content": "<y65>",
1368
+ "lstrip": false,
1369
+ "normalized": true,
1370
+ "rstrip": false,
1371
+ "single_word": false,
1372
+ "special": false
1373
+ },
1374
+ "32168": {
1375
+ "content": "<y66>",
1376
+ "lstrip": false,
1377
+ "normalized": true,
1378
+ "rstrip": false,
1379
+ "single_word": false,
1380
+ "special": false
1381
+ },
1382
+ "32169": {
1383
+ "content": "<y67>",
1384
+ "lstrip": false,
1385
+ "normalized": true,
1386
+ "rstrip": false,
1387
+ "single_word": false,
1388
+ "special": false
1389
+ },
1390
+ "32170": {
1391
+ "content": "<y68>",
1392
+ "lstrip": false,
1393
+ "normalized": true,
1394
+ "rstrip": false,
1395
+ "single_word": false,
1396
+ "special": false
1397
+ },
1398
+ "32171": {
1399
+ "content": "<y69>",
1400
+ "lstrip": false,
1401
+ "normalized": true,
1402
+ "rstrip": false,
1403
+ "single_word": false,
1404
+ "special": false
1405
+ },
1406
+ "32172": {
1407
+ "content": "<y70>",
1408
+ "lstrip": false,
1409
+ "normalized": true,
1410
+ "rstrip": false,
1411
+ "single_word": false,
1412
+ "special": false
1413
+ },
1414
+ "32173": {
1415
+ "content": "<y71>",
1416
+ "lstrip": false,
1417
+ "normalized": true,
1418
+ "rstrip": false,
1419
+ "single_word": false,
1420
+ "special": false
1421
+ },
1422
+ "32174": {
1423
+ "content": "<y72>",
1424
+ "lstrip": false,
1425
+ "normalized": true,
1426
+ "rstrip": false,
1427
+ "single_word": false,
1428
+ "special": false
1429
+ },
1430
+ "32175": {
1431
+ "content": "<y73>",
1432
+ "lstrip": false,
1433
+ "normalized": true,
1434
+ "rstrip": false,
1435
+ "single_word": false,
1436
+ "special": false
1437
+ },
1438
+ "32176": {
1439
+ "content": "<y74>",
1440
+ "lstrip": false,
1441
+ "normalized": true,
1442
+ "rstrip": false,
1443
+ "single_word": false,
1444
+ "special": false
1445
+ },
1446
+ "32177": {
1447
+ "content": "<y75>",
1448
+ "lstrip": false,
1449
+ "normalized": true,
1450
+ "rstrip": false,
1451
+ "single_word": false,
1452
+ "special": false
1453
+ },
1454
+ "32178": {
1455
+ "content": "<y76>",
1456
+ "lstrip": false,
1457
+ "normalized": true,
1458
+ "rstrip": false,
1459
+ "single_word": false,
1460
+ "special": false
1461
+ },
1462
+ "32179": {
1463
+ "content": "<y77>",
1464
+ "lstrip": false,
1465
+ "normalized": true,
1466
+ "rstrip": false,
1467
+ "single_word": false,
1468
+ "special": false
1469
+ },
1470
+ "32180": {
1471
+ "content": "<y78>",
1472
+ "lstrip": false,
1473
+ "normalized": true,
1474
+ "rstrip": false,
1475
+ "single_word": false,
1476
+ "special": false
1477
+ },
1478
+ "32181": {
1479
+ "content": "<y79>",
1480
+ "lstrip": false,
1481
+ "normalized": true,
1482
+ "rstrip": false,
1483
+ "single_word": false,
1484
+ "special": false
1485
+ },
1486
+ "32182": {
1487
+ "content": "<y80>",
1488
+ "lstrip": false,
1489
+ "normalized": true,
1490
+ "rstrip": false,
1491
+ "single_word": false,
1492
+ "special": false
1493
+ },
1494
+ "32183": {
1495
+ "content": "<y81>",
1496
+ "lstrip": false,
1497
+ "normalized": true,
1498
+ "rstrip": false,
1499
+ "single_word": false,
1500
+ "special": false
1501
+ },
1502
+ "32184": {
1503
+ "content": "<y82>",
1504
+ "lstrip": false,
1505
+ "normalized": true,
1506
+ "rstrip": false,
1507
+ "single_word": false,
1508
+ "special": false
1509
+ },
1510
+ "32185": {
1511
+ "content": "<y83>",
1512
+ "lstrip": false,
1513
+ "normalized": true,
1514
+ "rstrip": false,
1515
+ "single_word": false,
1516
+ "special": false
1517
+ },
1518
+ "32186": {
1519
+ "content": "<y84>",
1520
+ "lstrip": false,
1521
+ "normalized": true,
1522
+ "rstrip": false,
1523
+ "single_word": false,
1524
+ "special": false
1525
+ },
1526
+ "32187": {
1527
+ "content": "<y85>",
1528
+ "lstrip": false,
1529
+ "normalized": true,
1530
+ "rstrip": false,
1531
+ "single_word": false,
1532
+ "special": false
1533
+ },
1534
+ "32188": {
1535
+ "content": "<y86>",
1536
+ "lstrip": false,
1537
+ "normalized": true,
1538
+ "rstrip": false,
1539
+ "single_word": false,
1540
+ "special": false
1541
+ },
1542
+ "32189": {
1543
+ "content": "<y87>",
1544
+ "lstrip": false,
1545
+ "normalized": true,
1546
+ "rstrip": false,
1547
+ "single_word": false,
1548
+ "special": false
1549
+ },
1550
+ "32190": {
1551
+ "content": "<y88>",
1552
+ "lstrip": false,
1553
+ "normalized": true,
1554
+ "rstrip": false,
1555
+ "single_word": false,
1556
+ "special": false
1557
+ },
1558
+ "32191": {
1559
+ "content": "<y89>",
1560
+ "lstrip": false,
1561
+ "normalized": true,
1562
+ "rstrip": false,
1563
+ "single_word": false,
1564
+ "special": false
1565
+ },
1566
+ "32192": {
1567
+ "content": "<y90>",
1568
+ "lstrip": false,
1569
+ "normalized": true,
1570
+ "rstrip": false,
1571
+ "single_word": false,
1572
+ "special": false
1573
+ },
1574
+ "32193": {
1575
+ "content": "<y91>",
1576
+ "lstrip": false,
1577
+ "normalized": true,
1578
+ "rstrip": false,
1579
+ "single_word": false,
1580
+ "special": false
1581
+ },
1582
+ "32194": {
1583
+ "content": "<y92>",
1584
+ "lstrip": false,
1585
+ "normalized": true,
1586
+ "rstrip": false,
1587
+ "single_word": false,
1588
+ "special": false
1589
+ },
1590
+ "32195": {
1591
+ "content": "<y93>",
1592
+ "lstrip": false,
1593
+ "normalized": true,
1594
+ "rstrip": false,
1595
+ "single_word": false,
1596
+ "special": false
1597
+ },
1598
+ "32196": {
1599
+ "content": "<y94>",
1600
+ "lstrip": false,
1601
+ "normalized": true,
1602
+ "rstrip": false,
1603
+ "single_word": false,
1604
+ "special": false
1605
+ },
1606
+ "32197": {
1607
+ "content": "<y95>",
1608
+ "lstrip": false,
1609
+ "normalized": true,
1610
+ "rstrip": false,
1611
+ "single_word": false,
1612
+ "special": false
1613
+ },
1614
+ "32198": {
1615
+ "content": "<y96>",
1616
+ "lstrip": false,
1617
+ "normalized": true,
1618
+ "rstrip": false,
1619
+ "single_word": false,
1620
+ "special": false
1621
+ },
1622
+ "32199": {
1623
+ "content": "<y97>",
1624
+ "lstrip": false,
1625
+ "normalized": true,
1626
+ "rstrip": false,
1627
+ "single_word": false,
1628
+ "special": false
1629
+ },
1630
+ "32200": {
1631
+ "content": "<y98>",
1632
+ "lstrip": false,
1633
+ "normalized": true,
1634
+ "rstrip": false,
1635
+ "single_word": false,
1636
+ "special": false
1637
+ },
1638
+ "32201": {
1639
+ "content": "<y99>",
1640
+ "lstrip": false,
1641
+ "normalized": true,
1642
+ "rstrip": false,
1643
+ "single_word": false,
1644
+ "special": false
1645
+ },
1646
+ "32202": {
1647
+ "content": "<box>",
1648
+ "lstrip": false,
1649
+ "normalized": true,
1650
+ "rstrip": false,
1651
+ "single_word": false,
1652
+ "special": false
1653
+ },
1654
+ "32203": {
1655
+ "content": "</box>",
1656
+ "lstrip": false,
1657
+ "normalized": true,
1658
+ "rstrip": false,
1659
+ "single_word": false,
1660
+ "special": false
1661
+ },
1662
+ "32204": {
1663
+ "content": "<image>",
1664
+ "lstrip": false,
1665
+ "normalized": true,
1666
+ "rstrip": false,
1667
+ "single_word": false,
1668
+ "special": false
1669
+ },
1670
+ "32205": {
1671
+ "content": "<prev_im>",
1672
+ "lstrip": false,
1673
+ "normalized": true,
1674
+ "rstrip": false,
1675
+ "single_word": false,
1676
+ "special": false
1677
+ },
1678
+ "32206": {
1679
+ "content": "<lat_image>",
1680
+ "lstrip": false,
1681
+ "normalized": true,
1682
+ "rstrip": false,
1683
+ "single_word": false,
1684
+ "special": false
1685
+ }
+ },
+ "bos_token": "<s>",
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}You are an expert radiology assistant tasked with interpreting a chest X-ray study. {% for message in messages %}{% if message[\"role\"] == \"user\" %}USER: {% else %}ASSISTANT: {% endif %}{% for item in message[\"content\"] %}{% if item[\"type\"] == \"text\" %}{{ item[\"text\"] }}{% elif item[\"type\"] == \"image\" %}<image>{% endif %}{% endfor %}{% if message[\"role\"] == \"user\" %} {% else %}{{eos_token}}{% endif %}{% endfor %}{% if add_generation_prompt %}ASSISTANT: {% endif %}",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "</s>",
+ "legacy": false,
+ "model_max_length": 4096,
+ "pad_token": "<unk>",
+ "padding_side": "left",
+ "processor_class": "Maira2Processor",
+ "sp_model_kwargs": {},
+ "spaces_between_special_tokens": false,
+ "tokenizer_class": "LlamaTokenizer",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": false
+ }