gnedivad committed · verified
Commit d4394af · 1 Parent(s): 6051590

Upload folder using huggingface_hub
LICENSE ADDED
@@ -0,0 +1,36 @@
+ MICROSOFT RESEARCH LICENSE TERMS
+ IF YOU LIVE IN THE UNITED STATES, PLEASE READ THE “BINDING ARBITRATION AND CLASS ACTION WAIVER” SECTION BELOW. IT AFFECTS HOW DISPUTES ARE RESOLVED.
+ These license terms are an agreement between you and Microsoft Corporation (or one of its affiliates). They apply to the source code, object code, machine learning models, or data (collectively “Materials”) that accompany this license. IF YOU COMPLY WITH THESE LICENSE TERMS, YOU HAVE THE RIGHTS BELOW. BY USING THE MATERIALS, YOU ACCEPT THESE TERMS.
+ 1) INSTALLATION AND USE RIGHTS TO THE MATERIALS.
+ Subject to the terms of this agreement, you have the below rights, if applicable, to use the Materials solely for non-commercial, non-revenue generating, research purposes:
+ a) Source Code. If source code is included, you may use and modify the source code, but you may not distribute the source code.
+ b) Object Code. If object code is included, you may use the object code, but you may not distribute the object code.
+ c) Models. If machine learning model(s) are included, you may use the model(s), but you may not distribute the models.
+ d) Data. If data is included, you may use and modify the data, but your use and modification must be consistent with the consent under which the data was provided and/or gathered and you may not distribute the data or your modifications to the data.
+ 2) SCOPE OF LICENSE. The Materials are licensed, not sold. Microsoft reserves all other rights. Unless applicable law gives you more rights despite this limitation, you will not (and have no right to):
+ a) work around any technical limitations in the Materials that only allow you to use it in certain ways;
+ b) reverse engineer, decompile or disassemble the Materials;
+ c) remove, minimize, block, or modify any notices of Microsoft or its suppliers in the Materials;
+ d) use the Materials in any way that is against the law or to create or propagate malware; or
+ e) share, publish, distribute or lend the Materials, provide the Materials as a stand-alone hosted solution for others to use, or transfer the Materials or this agreement to any third party.
+ 3) PERSONAL DATA. If the data (set forth in Section 1(c) above) includes or is found to include any data that enables any ability to identify an individual (“Personal Data”), you will not use such Personal Data for any purpose other than was authorized and consented to by the data subject/research participant. You will not use Personal Data to contact any person. You will keep Personal Data in strict confidence. You will not share any Personal Data that is collected or in your possession with any third party for any reason and as required under the original consent agreement. Further, you will destroy the Personal Data and any backup or copies, immediately upon the completion of your research.
+ 4) LICENSE TO MICROSOFT. Notwithstanding the limitations in Section 1, you may distribute your modifications back to Microsoft, and if you do provide Microsoft with modifications of the Materials, you hereby grant Microsoft, without any restrictions or limitations, a non-exclusive, perpetual, irrevocable, royalty-free, assignable and sub-licensable license, to reproduce, publicly perform or display, install, use, modify, post, distribute, make and have made, sell and transfer such modifications and derivatives for any purpose.
+ 5) PUBLICATION. You may publish (or present papers or articles) on your results from using the Materials provided that no material or substantial portion of the Materials is included in any such publication or presentation.
+ 6) FEEDBACK. Any feedback about the Materials provided by you to us is voluntarily given, and Microsoft shall be free to use the feedback as it sees fit without obligation or restriction of any kind, even if the feedback is designated by you as confidential. Such feedback shall be considered a contribution and licensed to Microsoft under the terms of Section 4 above.
+ 7) COMPLIANCE WITH TRADE LAWS. You acknowledge that the Materials may be subject to applicable trade laws in one or more countries. You will comply with all relevant laws and regulations applicable to the import or export of the Materials, including but not limited to, trade laws such as the U.S. Export Administration Regulations or other end-user, end use, and destination restrictions by the U.S. and other governments, as well as sanctions regulations administered by the U.S. Office of Foreign Assets Control. Microsoft may suspend or terminate the agreement immediately to the extent that Microsoft reasonably concludes that continued performance would violate trade laws or put it at risk of becoming subject to sanctions or penalties under trade laws. For additional information, see www.microsoft.com/exporting.
+ 8) SUPPORT SERVICES. Microsoft is not obligated under this agreement to provide any support services for the Materials. Any support provided is “as is”, “with all faults”, and without warranty of any kind.
+ 9) BINDING ARBITRATION AND CLASS ACTION WAIVER. This Section applies if you live in (or, if a business, your principal place of business is in) the United States. If you and Microsoft have a dispute, you and Microsoft agree to try for 60 days to resolve it informally. If you and Microsoft can’t, you and Microsoft agree to binding individual arbitration before the American Arbitration Association under the Federal Arbitration Act (“FAA”), and not to sue in court in front of a judge or jury. Instead, a neutral arbitrator will decide. Class action lawsuits, class-wide arbitrations, private attorney-general actions, and any other proceeding where someone acts in a representative capacity are not allowed; nor is combining individual proceedings without the consent of all parties. The complete Arbitration Agreement contains more terms and is at aka.ms/arb-agreement-1. You and Microsoft agree to these terms.
+ 10) ENTIRE AGREEMENT. This agreement, and any other terms Microsoft may provide for supplements, updates, or third-party applications, is the entire agreement for the Materials.
+ 11) APPLICABLE LAW AND PLACE TO RESOLVE DISPUTES. If you acquired the Materials in the United States or Canada, the laws of the state or province where you live (or, if a business, where your principal place of business is located) govern the interpretation of this agreement, claims for its breach, and all other claims (including consumer protection, unfair competition, and tort claims), regardless of conflict of laws principles, except that the FAA governs everything related to arbitration. If you acquired the Materials in any other country, its laws apply, except that the FAA governs everything related to arbitration. If U.S. federal jurisdiction exists, you and Microsoft consent to exclusive jurisdiction and venue in the federal court in King County, Washington for all disputes heard in court (excluding arbitration). If not, you and Microsoft consent to exclusive jurisdiction and venue in the Superior Court of King County, Washington for all disputes heard in court (excluding arbitration).
+ 12) CONSUMER RIGHTS; REGIONAL VARIATIONS. This agreement describes certain legal rights. You may have other rights, including consumer rights, under the laws of your state, province, or country. Separate and apart from your relationship with Microsoft, you may also have rights with respect to the party from which you acquired the Materials. This agreement does not change those other rights if the laws of your state, province, or country do not permit it to do so. For example, if you acquired the Materials in one of the below regions, or mandatory country law applies, then the following provisions apply to you:
+ a) Australia. You have statutory guarantees under the Australian Consumer Law and nothing in this agreement is intended to affect those rights.
+ b) Canada. If you acquired this software in Canada, you may stop receiving updates by turning off the automatic update feature, disconnecting your device from the Internet (if and when you re-connect to the Internet, however, the Materials will resume checking for and installing updates), or uninstalling the Materials. The product documentation, if any, may also specify how to turn off updates for your specific device or software.
+ c) Germany and Austria.
+ i. Warranty. The properly licensed software will perform substantially as described in any Microsoft materials that accompany the Materials. However, Microsoft gives no contractual guarantee in relation to the licensed software.
+ ii. Limitation of Liability. In case of intentional conduct, gross negligence, claims based on the Product Liability Act, as well as, in case of death or personal or physical injury, Microsoft is liable according to the statutory law.
+ Subject to the foregoing clause (ii), Microsoft will only be liable for slight negligence if Microsoft is in breach of such material contractual obligations, the fulfillment of which facilitate the due performance of this agreement, the breach of which would endanger the purpose of this agreement and the compliance with which a party may constantly trust in (so-called "cardinal obligations"). In other cases of slight negligence, Microsoft will not be liable for slight negligence.
+ 13) DISCLAIMER OF WARRANTY. THE MATERIALS ARE LICENSED “AS IS.” YOU BEAR THE RISK OF USING THEM. MICROSOFT GIVES NO EXPRESS WARRANTIES, GUARANTEES, OR CONDITIONS. TO THE EXTENT PERMITTED UNDER APPLICABLE LAWS, MICROSOFT EXCLUDES ALL IMPLIED WARRANTIES, INCLUDING MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT.
+
+ 14) LIMITATION ON AND EXCLUSION OF DAMAGES. IF YOU HAVE ANY BASIS FOR RECOVERING DAMAGES DESPITE THE PRECEDING DISCLAIMER OF WARRANTY, YOU CAN RECOVER FROM MICROSOFT AND ITS SUPPLIERS ONLY DIRECT DAMAGES UP TO U.S. $5.00. YOU CANNOT RECOVER ANY OTHER DAMAGES, INCLUDING CONSEQUENTIAL, LOST PROFITS, SPECIAL, INDIRECT OR INCIDENTAL DAMAGES.
+ This limitation applies to (a) anything related to the Materials, services, content (including code) on third party Internet sites, or third party applications; and (b) claims for breach of contract, warranty, guarantee, or condition; strict liability, negligence, or other tort; or any other claim; in each case to the extent permitted by applicable law.
+ It also applies even if Microsoft knew or should have known about the possibility of the damages. The above limitation or exclusion may not apply to you because your state, province, or country may not allow the exclusion or limitation of incidental, consequential, or other damages.
README.md ADDED
@@ -0,0 +1,273 @@
+ ---
+ license: other
+ license_name: msrla
+ license_link: https://huggingface.co/microsoft/maira-2/blob/main/LICENSE
+ library_name: transformers
+ extra_gated_prompt: >-
+   Please confirm that you have read and agree to the following disclaimer.
+
+   The model(s) and/or software described in this repository are provided for research and development use only. The model(s) and/or software are not intended for use in clinical decision-making or for any other clinical use, and performance for clinical use has not been established. You bear sole responsibility for any use of these model(s) and/or software, including incorporation into any product intended for clinical use.
+ extra_gated_fields:
+   I have read and agree to the disclaimer: checkbox
+ ---
+
+ # Model Card for MAIRA-2
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+ MAIRA-2 is a multimodal transformer designed for the generation of grounded or non-grounded radiology reports from chest X-rays. It is described in more detail in [MAIRA-2: Grounded Radiology Report Generation (S. Bannur, K. Bouzid et al., 2024)](https://arxiv.org/abs/2406.04449). MAIRA-2 has been built for research purposes only and is being shared to facilitate comparison and further research.
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+ MAIRA-2 is composed of the image encoder [RAD-DINO-MAIRA-2](https://huggingface.co/microsoft/rad-dino-maira-2) (used frozen), a projection layer (trained from scratch), and the language model [vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5) (fully fine-tuned).
+
+ - **Developed by:** Microsoft Research Health Futures
+ - **Model type:** Multimodal transformer
+ - **Language(s) (NLP):** English
+ - **License:** [MSRLA](./LICENSE)
+ - **Finetuned from model:** [vicuna-7b-v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5), [RAD-DINO-MAIRA-2](https://huggingface.co/microsoft/rad-dino-maira-2)
+
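For orientation, the three components can be inspected on the loaded model. A minimal sketch, assuming the LLaVA-style submodule names (`vision_tower`, `multi_modal_projector`, `language_model`) that a `LlavaConfig`-derived model typically exposes; verify against the actual module tree of the checkpoint:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/maira-2", trust_remote_code=True)

# Submodule names below are assumptions based on the LLaVA-style layout.
print(type(model.vision_tower).__name__)           # DINOv2-based image encoder (used frozen)
print(type(model.multi_modal_projector).__name__)  # projection layer trained from scratch
print(type(model.language_model).__name__)         # Vicuna/Llama language model (fully fine-tuned)
```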
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+ MAIRA-2 is shared for research purposes only. It is **not meant to be used for clinical practice.** MAIRA-2 was not extensively tested for its capabilities and properties, including its accuracy and reliability in application settings, fairness across different demographics and uses, and security and privacy.
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ As inputs, MAIRA-2 takes a frontal chest X-ray, and any of the following:
+ - A lateral view from the current study
+ - A frontal view from the *prior* study, with accompanying prior report
+ - The indication for the current study
+ - The technique and comparison sections for the current study
+
+ MAIRA-2 can generate the _findings_ section of the current study, in one of two forms:
+ - Narrative text, without any image annotations (this is the typical report generation scenario).
+ - As a grounded report, wherein all described findings are accompanied by zero or more bounding boxes indicating their location on the current frontal image.
+
+ MAIRA-2 can also perform phrase grounding. In this case, it must also be provided with an input phrase. It will then repeat the phrase and generate a bounding box localising the finding described in the phrase.
+
+ These use-cases are illustrated with [sample code below](README.md#use-case-1-and-2-findings-generation-with-or-without-grounding).
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ MAIRA-2 was trained on chest X-rays from adults with English language reports only, and is not expected to work on any other imaging modality or anatomy. Variations in the input prompt (e.g. changing the instruction) are likely to degrade performance, as this model was *not* optimised for arbitrary user inputs.
+
+ As above, this is a research model which should not be used in any real clinical or production scenario.
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ ### Data biases
+ MAIRA-2 was trained on chest X-ray report datasets from Spain (translated from the original Spanish to English) and the USA, listed below. Reporting styles, patient demographics and disease prevalence, and image acquisition protocols can vary across health systems and regions. These factors will impact the generalisability of the model.
+
+ ### Model errors (fabrication, omission)
+
+ This model does not perform perfectly on its tasks, as outlined in more detail in the [MAIRA-2 report](https://arxiv.org/abs/2406.04449). Hence, errors can be present in the generated (grounded) reports.
+
+ ## How to Get Started with the Model
+
+ We demonstrate below how to run inference with MAIRA-2 for its three capabilities: findings generation with and without grounding, and phrase grounding.
+
+ ### Setup
+
+ To run this sample code, you will need the following packages:
+ ```
+ pillow
+ protobuf
+ sentencepiece
+ torch
+ transformers>=4.48.0,<4.52
+ ```
+
+ Note: MAIRA-2 was last tested with transformers v4.51.3.
91
+
92
+ First, initialise the model and put it in eval mode.
93
+ ```python
94
+ from transformers import AutoModelForCausalLM, AutoProcessor
95
+ from pathlib import Path
96
+ import torch
97
+
98
+ model = AutoModelForCausalLM.from_pretrained("microsoft/maira-2", trust_remote_code=True)
99
+ processor = AutoProcessor.from_pretrained("microsoft/maira-2", trust_remote_code=True)
100
+
101
+ device = torch.device("cuda")
102
+ model = model.eval()
103
+ model = model.to(device)
104
+ ```
105
+
106
+ We need to get some data to demonstrate the forward pass.
107
+ For this example, we'll collect an example from the IU X-ray dataset, which has a permissive license.
108
+
109
+ ```python
110
+ import requests
111
+ from PIL import Image
112
+
113
+ def get_sample_data() -> dict[str, Image.Image | str]:
114
+ """
115
+ Download chest X-rays from IU-Xray, which we didn't train MAIRA-2 on. License is CC.
116
+ We modified this function from the Rad-DINO repository on Huggingface.
117
+ """
118
+ frontal_image_url = "https://openi.nlm.nih.gov/imgs/512/145/145/CXR145_IM-0290-1001.png"
119
+ lateral_image_url = "https://openi.nlm.nih.gov/imgs/512/145/145/CXR145_IM-0290-2001.png"
120
+
121
+ def download_and_open(url: str) -> Image.Image:
122
+ response = requests.get(url, headers={"User-Agent": "MAIRA-2"}, stream=True)
123
+ return Image.open(response.raw)
124
+
125
+ frontal_image = download_and_open(frontal_image_url)
126
+ lateral_image = download_and_open(lateral_image_url)
127
+
128
+ sample_data = {
129
+ "frontal": frontal_image,
130
+ "lateral": lateral_image,
131
+ "indication": "Dyspnea.",
132
+ "comparison": "None.",
133
+ "technique": "PA and lateral views of the chest.",
134
+ "phrase": "Pleural effusion." # For the phrase grounding example. This patient has pleural effusion.
135
+ }
136
+ return sample_data
137
+
138
+ sample_data = get_sample_data()
139
+ ```
+
+ ### Use-case 1 and 2: Findings generation with or without grounding
+
+ We can toggle whether MAIRA-2 generates a grounded report based on how we preprocess the inputs, as it uses a different prompt. Let's start without grounding (`get_grounding=False`). While generating, for non-grounded reporting use `max_new_tokens=300`, and for grounded reporting use `max_new_tokens=450` to accommodate additional box and object tokens.
+ ```python
+ processed_inputs = processor.format_and_preprocess_reporting_input(
+     current_frontal=sample_data["frontal"],
+     current_lateral=sample_data["lateral"],
+     prior_frontal=None,  # Our example has no prior
+     indication=sample_data["indication"],
+     technique=sample_data["technique"],
+     comparison=sample_data["comparison"],
+     prior_report=None,  # Our example has no prior
+     return_tensors="pt",
+     get_grounding=False,  # For this example we generate a non-grounded report
+ )
+
+ processed_inputs = processed_inputs.to(device)
+ with torch.no_grad():
+     output_decoding = model.generate(
+         **processed_inputs,
+         max_new_tokens=300,  # Set to 450 for grounded reporting
+         use_cache=True,
+     )
+ prompt_length = processed_inputs["input_ids"].shape[-1]
+ decoded_text = processor.decode(output_decoding[0][prompt_length:], skip_special_tokens=True)
+ decoded_text = decoded_text.lstrip()  # Findings generation completions have a single leading space
+ prediction = processor.convert_output_to_plaintext_or_grounded_sequence(decoded_text)
+ print("Parsed prediction:", prediction)
+ ```
+
+ We get something that looks like this:
+ > There is a large right pleural effusion with associated right basilar atelectasis. The left lung is clear. No pneumothorax is identified. The cardiomediastinal silhouette and hilar contours are normal. There is no free air under the diaphragm. Surgical clips are noted in the right upper quadrant of the abdomen.
+
+ If we had set `get_grounding=True`, MAIRA-2 would generate a grounded report. For this example, that looks like this:
+
+ ```python
+ ('There is a large right pleural effusion.', [(0.055, 0.275, 0.445, 0.665)]),
+ ('The left lung is clear.', None),
+ ('No pneumothorax is identified.', None),
+ ('The cardiomediastinal silhouette is within normal limits.', None),
+ ('The visualized osseous structures are unremarkable.', None)
+ ```
+
+ The generated bounding box coordinates are the `(x, y)` coordinates of the top left and bottom right corners of the box, i.e. `(x_topleft, y_topleft, x_bottomright, y_bottomright)`. These are relative to the _cropped_ image (that is, the image that MAIRA-2 ultimately got as input), so be careful while visualising. The processor provides a method `adjust_box_for_original_image_size` to get boxes relative to the original image shape.
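For instance, to map one of these relative boxes onto pixels for plotting, scale each coordinate by the size of the image the model saw. A minimal, hypothetical helper (the 512×512 crop size below is assumed purely for illustration; `adjust_box_for_original_image_size` remains the supported route back to the original image frame):

```python
def box_to_pixels(box: tuple[float, float, float, float], width: int, height: int) -> tuple[int, int, int, int]:
    """Scale a relative (x_min, y_min, x_max, y_max) box to integer pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (round(x_min * width), round(y_min * height), round(x_max * width), round(y_max * height))

# The pleural effusion box from the grounded report above, on an assumed 512x512 crop:
print(box_to_pixels((0.055, 0.275, 0.445, 0.665), 512, 512))  # (28, 141, 228, 340)
```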
+
+ Note that MAIRA-2 generates slightly different reports for grounded and non-grounded reporting scenarios, a side-effect of its grounded reporting training data coming from a different data distribution.
+
+ ### Use-case 3: Phrase Grounding
+
+ Here the input is different, as we provide the model with a phrase to ground in the image. Recall from `get_sample_data` that our phrase here is just "Pleural effusion.", which we already know is present in this image.
191
+
192
+ ```python
193
+ processed_inputs = processor.format_and_preprocess_phrase_grounding_input(
194
+ frontal_image=sample_data["frontal"],
195
+ phrase=sample_data["phrase"],
196
+ return_tensors="pt",
197
+ )
198
+
199
+ processed_inputs = processed_inputs.to(device)
200
+ with torch.no_grad():
201
+ output_decoding = model.generate(
202
+ **processed_inputs,
203
+ max_new_tokens=150,
204
+ use_cache=True,
205
+ )
206
+ prompt_length = processed_inputs["input_ids"].shape[-1]
207
+ decoded_text = processor.decode(output_decoding[0][prompt_length:], skip_special_tokens=True)
208
+ prediction = processor.convert_output_to_plaintext_or_grounded_sequence(decoded_text)
209
+
210
+ print("Parsed prediction:", prediction)
211
+ ```
212
+
213
+ This gives us something like this:
214
+
215
+ ```python
216
+ ('Pleural effusion.', [(0.025, 0.345, 0.425, 0.575)])
217
+ ```
218
+
219
+ Again, as for grounded reporting we must remember the bbox coordinates are relative to the cropped image seen by MAIRA-2, use `processor.adjust_box_for_original_image_size` to get boxes adjusted for the original image shape.
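To sanity-check a grounding visually, the box can be drawn on the image with Pillow (already a dependency). A minimal sketch, assuming `box_pixels` has already been converted to the pixel frame of the image being drawn on (e.g. via `adjust_box_for_original_image_size` followed by scaling):

```python
from PIL import ImageDraw

image = sample_data["frontal"].convert("RGB")
draw = ImageDraw.Draw(image)
box_pixels = (28, 141, 228, 340)  # hypothetical pixel-space box, for illustration only
draw.rectangle(box_pixels, outline="red", width=3)
image.save("phrase_grounding_overlay.png")
```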
+
+ ## Training details
+
+ We did not originally train MAIRA-2 using the exact model class provided here; however, we have checked that its behaviour is the same. We provide this class to facilitate research re-use and inference.
+
+ ### Training data
+
+ MAIRA-2 was trained on a mix of public and private chest X-ray datasets. Each example comprises one or more CXR images and associated report text, with or without grounding (spatial annotations). The model is trained to generate the _findings_ section of the report, with or without grounding.
+
+ | Dataset | Country | # examples (ungrounded) | # examples (grounded) |
+ | ----- | ------ | ------- | ----- |
+ | [MIMIC-CXR](https://www.nature.com/articles/s41597-019-0322-0) | USA | 55 218 | 595* |
+ | [PadChest](https://www.sciencedirect.com/science/article/abs/pii/S1361841520301614) | Spain | 52 828 | 3 122 |
+ | USMix (Private) | USA | 118 031 | 53 613 |
+
+ *We use the [MS-CXR](https://physionet.org/content/ms-cxr/) phrase grounding dataset to provide "grounding" examples from MIMIC-CXR.
236
+
237
+ ## Environmental Impact
238
+
239
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
240
+
241
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
242
+
243
+ - **Hardware Type:** NVIDIA A100 GPUs
244
+ - **Hours used:** 1432
245
+ - **Cloud Provider:** Azure
246
+ - **Compute Region:** West US 2
247
+ - **Carbon Emitted:** 107.4 CO₂ eq _(ostensibly offset by this provider)_
248
+
249
+ ## Citation
250
+
251
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
252
+
253
+ **BibTeX:**
254
+
255
+ ```
256
+ @article{Bannur2024MAIRA2GR,
257
+ title={MAIRA-2: Grounded Radiology Report Generation},
258
+ author={Shruthi Bannur and Kenza Bouzid and Daniel C. Castro and Anton Schwaighofer and Anja Thieme and Sam Bond-Taylor and Maximilian Ilse and Fernando P\'{e}rez-Garc\'{i}a and Valentina Salvatelli and Harshita Sharma and Felix Meissen and Mercy Prasanna Ranjit and Shaury Srivastav and Julia Gong and Noel C. F. Codella and Fabian Falck and Ozan Oktay and Matthew P. Lungren and Maria T. A. Wetscherek and Javier Alvarez-Valle and Stephanie L. Hyland},
259
+ journal={arXiv},
260
+ year={2024},
261
+ volume={abs/2406.04449},
262
+ url={https://arxiv.org/abs/2406.04449}
263
+ }
264
+ ```
265
+
266
+ **APA:**
267
+
268
+ > Bannur*, S., Bouzid*, K., Castro, D. C., Schwaighofer, A., Thieme, A., Bond-Taylor, S., Ilse, M., Pérez-García, F., Salvatelli, V., Sharma, H., Meissen, F., Ranjit, M.P., Srivastav, S., Gong, J., Codella, N.C.F., Falck, F., Oktay, O., Lungren, M.P., Wetscherek, M.T., Alvarez-Valle, J., & Hyland, S. L. (2024). *MAIRA-2: Grounded Radiology Report Generation*. arXiv preprint abs/2406.04449.
269
+
270
+ ## Model Card Contact
271
+
272
+ - Stephanie Hyland ([`stephanie.hyland@microsoft.com`](mailto:stephanie.hyland@microsoft.com))
273
+ - Shruthi Bannur ([`shruthi.bannur@microsoft.com`](mailto:shruthi.bannur@microsoft.com))
added_tokens.json ADDED
@@ -0,0 +1,209 @@
+ {
+   "</box>": 32203,
+   "</obj>": 32001,
+   "<box>": 32202,
+   "<image>": 32204,
+   "<lat_image>": 32206,
+   "<obj>": 32000,
+   "<prev_im>": 32205,
+   "<x0>": 32002,
+   "<x10>": 32012,
+   "<x11>": 32013,
+   "<x12>": 32014,
+   "<x13>": 32015,
+   "<x14>": 32016,
+   "<x15>": 32017,
+   "<x16>": 32018,
+   "<x17>": 32019,
+   "<x18>": 32020,
+   "<x19>": 32021,
+   "<x1>": 32003,
+   "<x20>": 32022,
+   "<x21>": 32023,
+   "<x22>": 32024,
+   "<x23>": 32025,
+   "<x24>": 32026,
+   "<x25>": 32027,
+   "<x26>": 32028,
+   "<x27>": 32029,
+   "<x28>": 32030,
+   "<x29>": 32031,
+   "<x2>": 32004,
+   "<x30>": 32032,
+   "<x31>": 32033,
+   "<x32>": 32034,
+   "<x33>": 32035,
+   "<x34>": 32036,
+   "<x35>": 32037,
+   "<x36>": 32038,
+   "<x37>": 32039,
+   "<x38>": 32040,
+   "<x39>": 32041,
+   "<x3>": 32005,
+   "<x40>": 32042,
+   "<x41>": 32043,
+   "<x42>": 32044,
+   "<x43>": 32045,
+   "<x44>": 32046,
+   "<x45>": 32047,
+   "<x46>": 32048,
+   "<x47>": 32049,
+   "<x48>": 32050,
+   "<x49>": 32051,
+   "<x4>": 32006,
+   "<x50>": 32052,
+   "<x51>": 32053,
+   "<x52>": 32054,
+   "<x53>": 32055,
+   "<x54>": 32056,
+   "<x55>": 32057,
+   "<x56>": 32058,
+   "<x57>": 32059,
+   "<x58>": 32060,
+   "<x59>": 32061,
+   "<x5>": 32007,
+   "<x60>": 32062,
+   "<x61>": 32063,
+   "<x62>": 32064,
+   "<x63>": 32065,
+   "<x64>": 32066,
+   "<x65>": 32067,
+   "<x66>": 32068,
+   "<x67>": 32069,
+   "<x68>": 32070,
+   "<x69>": 32071,
+   "<x6>": 32008,
+   "<x70>": 32072,
+   "<x71>": 32073,
+   "<x72>": 32074,
+   "<x73>": 32075,
+   "<x74>": 32076,
+   "<x75>": 32077,
+   "<x76>": 32078,
+   "<x77>": 32079,
+   "<x78>": 32080,
+   "<x79>": 32081,
+   "<x7>": 32009,
+   "<x80>": 32082,
+   "<x81>": 32083,
+   "<x82>": 32084,
+   "<x83>": 32085,
+   "<x84>": 32086,
+   "<x85>": 32087,
+   "<x86>": 32088,
+   "<x87>": 32089,
+   "<x88>": 32090,
+   "<x89>": 32091,
+   "<x8>": 32010,
+   "<x90>": 32092,
+   "<x91>": 32093,
+   "<x92>": 32094,
+   "<x93>": 32095,
+   "<x94>": 32096,
+   "<x95>": 32097,
+   "<x96>": 32098,
+   "<x97>": 32099,
+   "<x98>": 32100,
+   "<x99>": 32101,
+   "<x9>": 32011,
+   "<y0>": 32102,
+   "<y10>": 32112,
+   "<y11>": 32113,
+   "<y12>": 32114,
+   "<y13>": 32115,
+   "<y14>": 32116,
+   "<y15>": 32117,
+   "<y16>": 32118,
+   "<y17>": 32119,
+   "<y18>": 32120,
+   "<y19>": 32121,
+   "<y1>": 32103,
+   "<y20>": 32122,
+   "<y21>": 32123,
+   "<y22>": 32124,
+   "<y23>": 32125,
+   "<y24>": 32126,
+   "<y25>": 32127,
+   "<y26>": 32128,
+   "<y27>": 32129,
+   "<y28>": 32130,
+   "<y29>": 32131,
+   "<y2>": 32104,
+   "<y30>": 32132,
+   "<y31>": 32133,
+   "<y32>": 32134,
+   "<y33>": 32135,
+   "<y34>": 32136,
+   "<y35>": 32137,
+   "<y36>": 32138,
+   "<y37>": 32139,
+   "<y38>": 32140,
+   "<y39>": 32141,
+   "<y3>": 32105,
+   "<y40>": 32142,
+   "<y41>": 32143,
+   "<y42>": 32144,
+   "<y43>": 32145,
+   "<y44>": 32146,
+   "<y45>": 32147,
+   "<y46>": 32148,
+   "<y47>": 32149,
+   "<y48>": 32150,
+   "<y49>": 32151,
+   "<y4>": 32106,
+   "<y50>": 32152,
+   "<y51>": 32153,
+   "<y52>": 32154,
+   "<y53>": 32155,
+   "<y54>": 32156,
+   "<y55>": 32157,
+   "<y56>": 32158,
+   "<y57>": 32159,
+   "<y58>": 32160,
+   "<y59>": 32161,
+   "<y5>": 32107,
+   "<y60>": 32162,
+   "<y61>": 32163,
+   "<y62>": 32164,
+   "<y63>": 32165,
+   "<y64>": 32166,
+   "<y65>": 32167,
+   "<y66>": 32168,
+   "<y67>": 32169,
+   "<y68>": 32170,
+   "<y69>": 32171,
+   "<y6>": 32108,
+   "<y70>": 32172,
+   "<y71>": 32173,
+   "<y72>": 32174,
+   "<y73>": 32175,
+   "<y74>": 32176,
+   "<y75>": 32177,
+   "<y76>": 32178,
+   "<y77>": 32179,
+   "<y78>": 32180,
+   "<y79>": 32181,
+   "<y7>": 32109,
+   "<y80>": 32182,
+   "<y81>": 32183,
+   "<y82>": 32184,
+   "<y83>": 32185,
+   "<y84>": 32186,
+   "<y85>": 32187,
+   "<y86>": 32188,
+   "<y87>": 32189,
+   "<y88>": 32190,
+   "<y89>": 32191,
+   "<y8>": 32110,
+   "<y90>": 32192,
+   "<y91>": 32193,
+   "<y92>": 32194,
+   "<y93>": 32195,
+   "<y94>": 32196,
+   "<y95>": 32197,
+   "<y96>": 32198,
+   "<y97>": 32199,
+   "<y98>": 32200,
+   "<y99>": 32201,
+   "<y9>": 32111
+ }
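The `<x0>`…`<x99>` and `<y0>`…`<y99>` entries quantise each box coordinate into 100 bins. A bin-centre decoding is an assumption here, but it is consistent with the sample outputs in the README above, whose coordinates (e.g. 0.055, 0.275) are exactly (bin + 0.5) / 100. A minimal sketch:

```python
def decode_coordinate_token(token: str) -> float:
    """Map a coordinate token such as '<x5>' or '<y27>' to an assumed bin-centre value in [0, 1]."""
    bin_index = int(token.strip("<>")[1:])  # drop '<', '>' and the leading 'x'/'y'
    return (bin_index + 0.5) / 100

# '<x5><y27><x44><y66>' would decode to the effusion box shown in the README example:
print([decode_coordinate_token(t) for t in ("<x5>", "<y27>", "<x44>", "<y66>")])
# [0.055, 0.275, 0.445, 0.665]
```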
chat_template.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}You are an expert radiology assistant tasked with interpreting a chest X-ray study. {% for message in messages %}{% if message[\"role\"] == \"user\" %}USER: {% else %}ASSISTANT: {% endif %}{% for item in message[\"content\"] %}{% if item[\"type\"] == \"text\" %}{{ item[\"text\"] }}{% elif item[\"type\"] == \"image\" %}<image>{% endif %}{% endfor %}{% if message[\"role\"] == \"user\" %} {% else %}{{eos_token}}{% endif %}{% endfor %}{% if add_generation_prompt %}ASSISTANT: {% endif %}"
+ }
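To see what this template renders, it can be applied through the tokenizer; messages follow the `type`/`text`/`image` item structure the template iterates over. A minimal sketch for inspection only (the processor's `format_and_preprocess_*` helpers construct the prompts MAIRA-2 actually expects):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("microsoft/maira-2", trust_remote_code=True)
messages = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe the image."}]},
]
prompt = processor.tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(prompt)
# "You are an expert radiology assistant tasked with interpreting a chest X-ray study. USER: <image>Describe the image. ASSISTANT: "
```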
config.json ADDED
@@ -0,0 +1,99 @@
+ {
+   "architectures": [
+     "Maira2ForConditionalGeneration"
+   ],
+   "auto_map": {
+     "AutoConfig": "configuration_maira2.Maira2Config",
+     "AutoModelForCausalLM": "modeling_maira2.Maira2ForConditionalGeneration",
+     "AutoModelForVision2Seq": "modeling_maira2.Maira2ForConditionalGeneration"
+   },
+   "hidden_size": 4096,
+   "image_seq_length": 576,
+   "image_token_index": 32204,
+   "model_type": "maira2",
+   "multimodal_projector_bias": true,
+   "pad_token_id": 0,
+   "projector_hidden_act": "gelu",
+   "projector_n_layers": 4,
+   "text_config": {
+     "_name_or_path": "lmsys/vicuna-7b-v1.5",
+     "architectures": [
+       "LlamaForCausalLM"
+     ],
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "head_dim": 128,
+     "hidden_act": "silu",
+     "hidden_size": 4096,
+     "initializer_range": 0.02,
+     "intermediate_size": 11008,
+     "max_position_embeddings": 4096,
+     "mlp_bias": false,
+     "model_type": "llama",
+     "num_attention_heads": 32,
+     "num_hidden_layers": 32,
+     "num_key_value_heads": 32,
+     "pad_token_id": 0,
+     "pretraining_tp": 1,
+     "rms_norm_eps": 1e-05,
+     "rope_scaling": {
+       "factor": 1.5,
+       "rope_type": "linear"
+     },
+     "rope_theta": 10000.0,
+     "torch_dtype": "bfloat16",
+     "use_cache": true,
+     "vocab_size": 32207
+   },
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "vision_config": {
+     "apply_layernorm": true,
+     "architectures": [
+       "Dinov2Model"
+     ],
+     "attention_probs_dropout_prob": 0.0,
+     "drop_path_rate": 0.0,
+     "hidden_act": "gelu",
+     "hidden_dropout_prob": 0.0,
+     "hidden_size": 768,
+     "image_size": 518,
+     "initializer_range": 0.02,
+     "layer_norm_eps": 1e-06,
+     "layerscale_value": 1.0,
+     "mlp_ratio": 4,
+     "model_type": "dinov2",
+     "num_attention_heads": 12,
+     "num_channels": 3,
+     "num_hidden_layers": 12,
+     "out_features": [
+       "stage12"
+     ],
+     "out_indices": [
+       12
+     ],
+     "patch_size": 14,
+     "qkv_bias": true,
+     "reshape_hidden_states": false,
+     "stage_names": [
+       "stem",
+       "stage1",
+       "stage2",
+       "stage3",
+       "stage4",
+       "stage5",
+       "stage6",
+       "stage7",
+       "stage8",
+       "stage9",
+       "stage10",
+       "stage11",
+       "stage12"
+     ],
+     "torch_dtype": "float32",
+     "use_mask_token": true,
+     "use_swiglu_ffn": false
+   },
+   "vision_feature_layer": -1,
+   "vision_feature_select_strategy": "default"
+ }
configuration_maira2.py ADDED
@@ -0,0 +1,32 @@
+ # Copyright 2024 Microsoft. All rights reserved.
+ # Licensed under the MSRLA License. See LICENSE in the repo root for license information.
+
+
+ from typing import Any
+
+ from transformers import LlavaConfig
+
+
+ class Maira2Config(LlavaConfig):
+     """
+     This is the configuration class to store the configuration of a `Maira2ForConditionalGeneration` model. It is
+     used to instantiate a MAIRA-2 model according to the specified arguments, defining the model architecture.
+
+     It inherits from `LlavaConfig`. In addition to the inherited attributes, it adds the
+     ability to customize the multimodal projector through the following attributes:
+
+     Args:
+         projector_n_layers (`int`, *optional*, defaults to 4):
+             Number of layers in the multimodal projector.
+     """
+
+     model_type = "maira2"
+
+     def __init__(
+         self,
+         projector_n_layers: int = 4,
+         **kwargs: Any,
+     ) -> None:
+         super().__init__(**kwargs)
+         self.hidden_size = self.text_config.hidden_size
+         self.projector_n_layers = projector_n_layers
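Since `config.json` maps `AutoConfig` to this class, the configuration can be loaded and inspected directly. A minimal sketch (printed values taken from the `config.json` above):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/maira-2", trust_remote_code=True)
print(config.model_type)          # "maira2"
print(config.projector_n_layers)  # 4
print(config.hidden_size)         # 4096, mirrored from text_config.hidden_size
```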
generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "max_length": 4096,
+   "max_new_tokens": 450,
+   "pad_token_id": 0,
+   "transformers_version": "4.51.3"
+ }
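These values act as defaults for `model.generate`; the README's explicit `max_new_tokens=300`/`450` arguments override them per call. A minimal sketch for inspecting the defaults:

```python
from transformers import GenerationConfig

generation_config = GenerationConfig.from_pretrained("microsoft/maira-2")
print(generation_config.max_new_tokens)  # 450
print(generation_config.eos_token_id)    # 2
```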
model-00001-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0582f1d522390f92f3ebaaa5fa01d2e1a6b7f090f6aad33e32476f109128d767
+ size 135
model-00002-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e23f185c5830f812c171bf510c79a6b51a42f85e72da90d84b62d9433c2fbb1d
+ size 135
model-00003-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a2b6ae86f3a49f69e7f68915dd1a4edab79117dc5d8df7348e01175ac184f02d
+ size 135
model-00004-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f70554d96f021d607d4f443b92d987c3b9434a058350cdc9cde02fb2918c89b
+ size 135
model-00005-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9bf5d731c86abe7f7c5aef9be877cf15e6baa59c2f83b4b450a0e337a38f26b1
+ size 135
model-00006-of-00006.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ca650a64438cbdc0a6e7f346fd9042446f2d625a85ecc7da96298888956b2a13
+ size 135
model.safetensors.index.json ADDED
@@ -0,0 +1,529 @@
+ {
+   "metadata": {
+     "total_size": 27520742400
+   },
+   "weight_map": {
+     "language_model.lm_head.weight": "model-00006-of-00006.safetensors",
+     "language_model.model.embed_tokens.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.0.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.0.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.0.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.0.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.0.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.0.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.0.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.0.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.0.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.1.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.1.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.1.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.1.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.1.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.1.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.1.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.1.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.1.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.10.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.10.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.10.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
+     "language_model.model.layers.10.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
+     "language_model.model.layers.10.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.10.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
+     "language_model.model.layers.10.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
+     "language_model.model.layers.10.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
+     "language_model.model.layers.10.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
+     "language_model.model.layers.11.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.11.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.11.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.11.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.11.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.11.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.11.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.11.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.11.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.12.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.12.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.12.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.12.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.12.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.12.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.12.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.12.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.12.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.13.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.13.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.13.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.13.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.13.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.13.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.13.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.13.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.13.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.14.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.14.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.14.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.14.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.14.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.14.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.14.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.14.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.14.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.15.input_layernorm.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.15.mlp.down_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.15.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.15.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.15.post_attention_layernorm.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.15.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.15.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.15.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.15.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.16.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.16.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.16.mlp.gate_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.16.mlp.up_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.16.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.16.self_attn.k_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.16.self_attn.o_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.16.self_attn.q_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.16.self_attn.v_proj.weight": "model-00003-of-00006.safetensors",
+     "language_model.model.layers.17.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.17.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.17.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.17.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.17.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.17.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.17.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.17.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.17.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.18.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.18.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.18.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.18.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.18.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.18.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.18.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.18.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.18.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.19.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.19.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.19.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.19.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.19.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.19.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.19.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.19.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.19.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.2.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.2.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.2.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.2.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.2.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.2.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.2.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.2.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.2.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.20.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.20.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.20.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.20.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.20.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.20.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.20.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.20.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.20.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.21.input_layernorm.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.21.mlp.down_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.21.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.21.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.21.post_attention_layernorm.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.21.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.21.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.21.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.21.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.22.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.22.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.22.mlp.gate_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.22.mlp.up_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.22.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.22.self_attn.k_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.22.self_attn.o_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.22.self_attn.q_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.22.self_attn.v_proj.weight": "model-00004-of-00006.safetensors",
+     "language_model.model.layers.23.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.23.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.23.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.23.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.23.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.23.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.23.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.23.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.23.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.24.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.24.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.24.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.24.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.24.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.24.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.24.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.24.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.24.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.25.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.25.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.25.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.25.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.25.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.25.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.25.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.25.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.25.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.26.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.26.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.26.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.26.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.26.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.26.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.26.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.26.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.26.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.27.input_layernorm.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.27.mlp.down_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.27.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.27.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.27.post_attention_layernorm.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.27.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.27.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.27.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.27.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.28.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "language_model.model.layers.28.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "language_model.model.layers.28.mlp.gate_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.28.mlp.up_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.28.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "language_model.model.layers.28.self_attn.k_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.28.self_attn.o_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.28.self_attn.q_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.28.self_attn.v_proj.weight": "model-00005-of-00006.safetensors",
+     "language_model.model.layers.29.input_layernorm.weight": "model-00006-of-00006.safetensors",
+     "language_model.model.layers.29.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
+     "language_model.model.layers.29.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
+     "language_model.model.layers.29.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
+     "language_model.model.layers.29.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
+     "language_model.model.layers.29.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
+     "language_model.model.layers.29.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
+     "language_model.model.layers.29.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
+     "language_model.model.layers.29.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
+     "language_model.model.layers.3.input_layernorm.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.3.mlp.down_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.3.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.3.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.3.post_attention_layernorm.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.3.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.3.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
+     "language_model.model.layers.3.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
223
+ "language_model.model.layers.3.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
224
+ "language_model.model.layers.30.input_layernorm.weight": "model-00006-of-00006.safetensors",
225
+ "language_model.model.layers.30.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
226
+ "language_model.model.layers.30.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
227
+ "language_model.model.layers.30.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
228
+ "language_model.model.layers.30.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
229
+ "language_model.model.layers.30.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
230
+ "language_model.model.layers.30.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
231
+ "language_model.model.layers.30.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
232
+ "language_model.model.layers.30.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
233
+ "language_model.model.layers.31.input_layernorm.weight": "model-00006-of-00006.safetensors",
234
+ "language_model.model.layers.31.mlp.down_proj.weight": "model-00006-of-00006.safetensors",
235
+ "language_model.model.layers.31.mlp.gate_proj.weight": "model-00006-of-00006.safetensors",
236
+ "language_model.model.layers.31.mlp.up_proj.weight": "model-00006-of-00006.safetensors",
237
+ "language_model.model.layers.31.post_attention_layernorm.weight": "model-00006-of-00006.safetensors",
238
+ "language_model.model.layers.31.self_attn.k_proj.weight": "model-00006-of-00006.safetensors",
239
+ "language_model.model.layers.31.self_attn.o_proj.weight": "model-00006-of-00006.safetensors",
240
+ "language_model.model.layers.31.self_attn.q_proj.weight": "model-00006-of-00006.safetensors",
241
+ "language_model.model.layers.31.self_attn.v_proj.weight": "model-00006-of-00006.safetensors",
242
+ "language_model.model.layers.4.input_layernorm.weight": "model-00002-of-00006.safetensors",
243
+ "language_model.model.layers.4.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
244
+ "language_model.model.layers.4.mlp.gate_proj.weight": "model-00001-of-00006.safetensors",
245
+ "language_model.model.layers.4.mlp.up_proj.weight": "model-00001-of-00006.safetensors",
246
+ "language_model.model.layers.4.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
247
+ "language_model.model.layers.4.self_attn.k_proj.weight": "model-00001-of-00006.safetensors",
248
+ "language_model.model.layers.4.self_attn.o_proj.weight": "model-00001-of-00006.safetensors",
249
+ "language_model.model.layers.4.self_attn.q_proj.weight": "model-00001-of-00006.safetensors",
250
+ "language_model.model.layers.4.self_attn.v_proj.weight": "model-00001-of-00006.safetensors",
251
+ "language_model.model.layers.5.input_layernorm.weight": "model-00002-of-00006.safetensors",
252
+ "language_model.model.layers.5.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
253
+ "language_model.model.layers.5.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
254
+ "language_model.model.layers.5.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
255
+ "language_model.model.layers.5.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
256
+ "language_model.model.layers.5.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
257
+ "language_model.model.layers.5.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
258
+ "language_model.model.layers.5.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
259
+ "language_model.model.layers.5.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
260
+ "language_model.model.layers.6.input_layernorm.weight": "model-00002-of-00006.safetensors",
261
+ "language_model.model.layers.6.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
262
+ "language_model.model.layers.6.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
263
+ "language_model.model.layers.6.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
264
+ "language_model.model.layers.6.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
265
+ "language_model.model.layers.6.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
266
+ "language_model.model.layers.6.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
267
+ "language_model.model.layers.6.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
268
+ "language_model.model.layers.6.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
269
+ "language_model.model.layers.7.input_layernorm.weight": "model-00002-of-00006.safetensors",
270
+ "language_model.model.layers.7.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
271
+ "language_model.model.layers.7.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
272
+ "language_model.model.layers.7.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
273
+ "language_model.model.layers.7.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
274
+ "language_model.model.layers.7.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
275
+ "language_model.model.layers.7.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
276
+ "language_model.model.layers.7.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
277
+ "language_model.model.layers.7.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
278
+ "language_model.model.layers.8.input_layernorm.weight": "model-00002-of-00006.safetensors",
279
+ "language_model.model.layers.8.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
280
+ "language_model.model.layers.8.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
281
+ "language_model.model.layers.8.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
282
+ "language_model.model.layers.8.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
283
+ "language_model.model.layers.8.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
284
+ "language_model.model.layers.8.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
285
+ "language_model.model.layers.8.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
286
+ "language_model.model.layers.8.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
287
+ "language_model.model.layers.9.input_layernorm.weight": "model-00002-of-00006.safetensors",
288
+ "language_model.model.layers.9.mlp.down_proj.weight": "model-00002-of-00006.safetensors",
289
+ "language_model.model.layers.9.mlp.gate_proj.weight": "model-00002-of-00006.safetensors",
290
+ "language_model.model.layers.9.mlp.up_proj.weight": "model-00002-of-00006.safetensors",
291
+ "language_model.model.layers.9.post_attention_layernorm.weight": "model-00002-of-00006.safetensors",
292
+ "language_model.model.layers.9.self_attn.k_proj.weight": "model-00002-of-00006.safetensors",
293
+ "language_model.model.layers.9.self_attn.o_proj.weight": "model-00002-of-00006.safetensors",
294
+ "language_model.model.layers.9.self_attn.q_proj.weight": "model-00002-of-00006.safetensors",
295
+ "language_model.model.layers.9.self_attn.v_proj.weight": "model-00002-of-00006.safetensors",
296
+ "language_model.model.norm.weight": "model-00006-of-00006.safetensors",
297
+ "multi_modal_projector.layers.0.bias": "model-00001-of-00006.safetensors",
298
+ "multi_modal_projector.layers.0.weight": "model-00001-of-00006.safetensors",
299
+ "multi_modal_projector.layers.2.bias": "model-00001-of-00006.safetensors",
300
+ "multi_modal_projector.layers.2.weight": "model-00001-of-00006.safetensors",
301
+ "multi_modal_projector.layers.4.bias": "model-00001-of-00006.safetensors",
302
+ "multi_modal_projector.layers.4.weight": "model-00001-of-00006.safetensors",
303
+ "multi_modal_projector.layers.6.bias": "model-00001-of-00006.safetensors",
304
+ "multi_modal_projector.layers.6.weight": "model-00001-of-00006.safetensors",
305
+ "vision_tower.embeddings.cls_token": "model-00001-of-00006.safetensors",
306
+ "vision_tower.embeddings.mask_token": "model-00001-of-00006.safetensors",
307
+ "vision_tower.embeddings.patch_embeddings.projection.bias": "model-00001-of-00006.safetensors",
308
+ "vision_tower.embeddings.patch_embeddings.projection.weight": "model-00001-of-00006.safetensors",
309
+ "vision_tower.embeddings.position_embeddings": "model-00001-of-00006.safetensors",
310
+ "vision_tower.encoder.layer.0.attention.attention.key.bias": "model-00001-of-00006.safetensors",
311
+ "vision_tower.encoder.layer.0.attention.attention.key.weight": "model-00001-of-00006.safetensors",
312
+ "vision_tower.encoder.layer.0.attention.attention.query.bias": "model-00001-of-00006.safetensors",
313
+ "vision_tower.encoder.layer.0.attention.attention.query.weight": "model-00001-of-00006.safetensors",
314
+ "vision_tower.encoder.layer.0.attention.attention.value.bias": "model-00001-of-00006.safetensors",
315
+ "vision_tower.encoder.layer.0.attention.attention.value.weight": "model-00001-of-00006.safetensors",
316
+ "vision_tower.encoder.layer.0.attention.output.dense.bias": "model-00001-of-00006.safetensors",
317
+ "vision_tower.encoder.layer.0.attention.output.dense.weight": "model-00001-of-00006.safetensors",
318
+ "vision_tower.encoder.layer.0.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
319
+ "vision_tower.encoder.layer.0.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
320
+ "vision_tower.encoder.layer.0.mlp.fc1.bias": "model-00001-of-00006.safetensors",
321
+ "vision_tower.encoder.layer.0.mlp.fc1.weight": "model-00001-of-00006.safetensors",
322
+ "vision_tower.encoder.layer.0.mlp.fc2.bias": "model-00001-of-00006.safetensors",
323
+ "vision_tower.encoder.layer.0.mlp.fc2.weight": "model-00001-of-00006.safetensors",
324
+ "vision_tower.encoder.layer.0.norm1.bias": "model-00001-of-00006.safetensors",
325
+ "vision_tower.encoder.layer.0.norm1.weight": "model-00001-of-00006.safetensors",
326
+ "vision_tower.encoder.layer.0.norm2.bias": "model-00001-of-00006.safetensors",
327
+ "vision_tower.encoder.layer.0.norm2.weight": "model-00001-of-00006.safetensors",
328
+ "vision_tower.encoder.layer.1.attention.attention.key.bias": "model-00001-of-00006.safetensors",
329
+ "vision_tower.encoder.layer.1.attention.attention.key.weight": "model-00001-of-00006.safetensors",
330
+ "vision_tower.encoder.layer.1.attention.attention.query.bias": "model-00001-of-00006.safetensors",
331
+ "vision_tower.encoder.layer.1.attention.attention.query.weight": "model-00001-of-00006.safetensors",
332
+ "vision_tower.encoder.layer.1.attention.attention.value.bias": "model-00001-of-00006.safetensors",
333
+ "vision_tower.encoder.layer.1.attention.attention.value.weight": "model-00001-of-00006.safetensors",
334
+ "vision_tower.encoder.layer.1.attention.output.dense.bias": "model-00001-of-00006.safetensors",
335
+ "vision_tower.encoder.layer.1.attention.output.dense.weight": "model-00001-of-00006.safetensors",
336
+ "vision_tower.encoder.layer.1.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
337
+ "vision_tower.encoder.layer.1.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
338
+ "vision_tower.encoder.layer.1.mlp.fc1.bias": "model-00001-of-00006.safetensors",
339
+ "vision_tower.encoder.layer.1.mlp.fc1.weight": "model-00001-of-00006.safetensors",
340
+ "vision_tower.encoder.layer.1.mlp.fc2.bias": "model-00001-of-00006.safetensors",
341
+ "vision_tower.encoder.layer.1.mlp.fc2.weight": "model-00001-of-00006.safetensors",
342
+ "vision_tower.encoder.layer.1.norm1.bias": "model-00001-of-00006.safetensors",
343
+ "vision_tower.encoder.layer.1.norm1.weight": "model-00001-of-00006.safetensors",
344
+ "vision_tower.encoder.layer.1.norm2.bias": "model-00001-of-00006.safetensors",
345
+ "vision_tower.encoder.layer.1.norm2.weight": "model-00001-of-00006.safetensors",
346
+ "vision_tower.encoder.layer.10.attention.attention.key.bias": "model-00001-of-00006.safetensors",
347
+ "vision_tower.encoder.layer.10.attention.attention.key.weight": "model-00001-of-00006.safetensors",
348
+ "vision_tower.encoder.layer.10.attention.attention.query.bias": "model-00001-of-00006.safetensors",
349
+ "vision_tower.encoder.layer.10.attention.attention.query.weight": "model-00001-of-00006.safetensors",
350
+ "vision_tower.encoder.layer.10.attention.attention.value.bias": "model-00001-of-00006.safetensors",
351
+ "vision_tower.encoder.layer.10.attention.attention.value.weight": "model-00001-of-00006.safetensors",
352
+ "vision_tower.encoder.layer.10.attention.output.dense.bias": "model-00001-of-00006.safetensors",
353
+ "vision_tower.encoder.layer.10.attention.output.dense.weight": "model-00001-of-00006.safetensors",
354
+ "vision_tower.encoder.layer.10.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
355
+ "vision_tower.encoder.layer.10.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
356
+ "vision_tower.encoder.layer.10.mlp.fc1.bias": "model-00001-of-00006.safetensors",
357
+ "vision_tower.encoder.layer.10.mlp.fc1.weight": "model-00001-of-00006.safetensors",
358
+ "vision_tower.encoder.layer.10.mlp.fc2.bias": "model-00001-of-00006.safetensors",
359
+ "vision_tower.encoder.layer.10.mlp.fc2.weight": "model-00001-of-00006.safetensors",
360
+ "vision_tower.encoder.layer.10.norm1.bias": "model-00001-of-00006.safetensors",
361
+ "vision_tower.encoder.layer.10.norm1.weight": "model-00001-of-00006.safetensors",
362
+ "vision_tower.encoder.layer.10.norm2.bias": "model-00001-of-00006.safetensors",
363
+ "vision_tower.encoder.layer.10.norm2.weight": "model-00001-of-00006.safetensors",
364
+ "vision_tower.encoder.layer.11.attention.attention.key.bias": "model-00001-of-00006.safetensors",
365
+ "vision_tower.encoder.layer.11.attention.attention.key.weight": "model-00001-of-00006.safetensors",
366
+ "vision_tower.encoder.layer.11.attention.attention.query.bias": "model-00001-of-00006.safetensors",
367
+ "vision_tower.encoder.layer.11.attention.attention.query.weight": "model-00001-of-00006.safetensors",
368
+ "vision_tower.encoder.layer.11.attention.attention.value.bias": "model-00001-of-00006.safetensors",
369
+ "vision_tower.encoder.layer.11.attention.attention.value.weight": "model-00001-of-00006.safetensors",
370
+ "vision_tower.encoder.layer.11.attention.output.dense.bias": "model-00001-of-00006.safetensors",
371
+ "vision_tower.encoder.layer.11.attention.output.dense.weight": "model-00001-of-00006.safetensors",
372
+ "vision_tower.encoder.layer.11.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
373
+ "vision_tower.encoder.layer.11.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
374
+ "vision_tower.encoder.layer.11.mlp.fc1.bias": "model-00001-of-00006.safetensors",
375
+ "vision_tower.encoder.layer.11.mlp.fc1.weight": "model-00001-of-00006.safetensors",
376
+ "vision_tower.encoder.layer.11.mlp.fc2.bias": "model-00001-of-00006.safetensors",
377
+ "vision_tower.encoder.layer.11.mlp.fc2.weight": "model-00001-of-00006.safetensors",
378
+ "vision_tower.encoder.layer.11.norm1.bias": "model-00001-of-00006.safetensors",
379
+ "vision_tower.encoder.layer.11.norm1.weight": "model-00001-of-00006.safetensors",
380
+ "vision_tower.encoder.layer.11.norm2.bias": "model-00001-of-00006.safetensors",
381
+ "vision_tower.encoder.layer.11.norm2.weight": "model-00001-of-00006.safetensors",
382
+ "vision_tower.encoder.layer.2.attention.attention.key.bias": "model-00001-of-00006.safetensors",
383
+ "vision_tower.encoder.layer.2.attention.attention.key.weight": "model-00001-of-00006.safetensors",
384
+ "vision_tower.encoder.layer.2.attention.attention.query.bias": "model-00001-of-00006.safetensors",
385
+ "vision_tower.encoder.layer.2.attention.attention.query.weight": "model-00001-of-00006.safetensors",
386
+ "vision_tower.encoder.layer.2.attention.attention.value.bias": "model-00001-of-00006.safetensors",
387
+ "vision_tower.encoder.layer.2.attention.attention.value.weight": "model-00001-of-00006.safetensors",
388
+ "vision_tower.encoder.layer.2.attention.output.dense.bias": "model-00001-of-00006.safetensors",
389
+ "vision_tower.encoder.layer.2.attention.output.dense.weight": "model-00001-of-00006.safetensors",
390
+ "vision_tower.encoder.layer.2.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
391
+ "vision_tower.encoder.layer.2.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
392
+ "vision_tower.encoder.layer.2.mlp.fc1.bias": "model-00001-of-00006.safetensors",
393
+ "vision_tower.encoder.layer.2.mlp.fc1.weight": "model-00001-of-00006.safetensors",
394
+ "vision_tower.encoder.layer.2.mlp.fc2.bias": "model-00001-of-00006.safetensors",
395
+ "vision_tower.encoder.layer.2.mlp.fc2.weight": "model-00001-of-00006.safetensors",
396
+ "vision_tower.encoder.layer.2.norm1.bias": "model-00001-of-00006.safetensors",
397
+ "vision_tower.encoder.layer.2.norm1.weight": "model-00001-of-00006.safetensors",
398
+ "vision_tower.encoder.layer.2.norm2.bias": "model-00001-of-00006.safetensors",
399
+ "vision_tower.encoder.layer.2.norm2.weight": "model-00001-of-00006.safetensors",
400
+ "vision_tower.encoder.layer.3.attention.attention.key.bias": "model-00001-of-00006.safetensors",
401
+ "vision_tower.encoder.layer.3.attention.attention.key.weight": "model-00001-of-00006.safetensors",
402
+ "vision_tower.encoder.layer.3.attention.attention.query.bias": "model-00001-of-00006.safetensors",
403
+ "vision_tower.encoder.layer.3.attention.attention.query.weight": "model-00001-of-00006.safetensors",
404
+ "vision_tower.encoder.layer.3.attention.attention.value.bias": "model-00001-of-00006.safetensors",
405
+ "vision_tower.encoder.layer.3.attention.attention.value.weight": "model-00001-of-00006.safetensors",
406
+ "vision_tower.encoder.layer.3.attention.output.dense.bias": "model-00001-of-00006.safetensors",
407
+ "vision_tower.encoder.layer.3.attention.output.dense.weight": "model-00001-of-00006.safetensors",
408
+ "vision_tower.encoder.layer.3.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
409
+ "vision_tower.encoder.layer.3.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
410
+ "vision_tower.encoder.layer.3.mlp.fc1.bias": "model-00001-of-00006.safetensors",
411
+ "vision_tower.encoder.layer.3.mlp.fc1.weight": "model-00001-of-00006.safetensors",
412
+ "vision_tower.encoder.layer.3.mlp.fc2.bias": "model-00001-of-00006.safetensors",
413
+ "vision_tower.encoder.layer.3.mlp.fc2.weight": "model-00001-of-00006.safetensors",
414
+ "vision_tower.encoder.layer.3.norm1.bias": "model-00001-of-00006.safetensors",
415
+ "vision_tower.encoder.layer.3.norm1.weight": "model-00001-of-00006.safetensors",
416
+ "vision_tower.encoder.layer.3.norm2.bias": "model-00001-of-00006.safetensors",
417
+ "vision_tower.encoder.layer.3.norm2.weight": "model-00001-of-00006.safetensors",
418
+ "vision_tower.encoder.layer.4.attention.attention.key.bias": "model-00001-of-00006.safetensors",
419
+ "vision_tower.encoder.layer.4.attention.attention.key.weight": "model-00001-of-00006.safetensors",
420
+ "vision_tower.encoder.layer.4.attention.attention.query.bias": "model-00001-of-00006.safetensors",
421
+ "vision_tower.encoder.layer.4.attention.attention.query.weight": "model-00001-of-00006.safetensors",
422
+ "vision_tower.encoder.layer.4.attention.attention.value.bias": "model-00001-of-00006.safetensors",
423
+ "vision_tower.encoder.layer.4.attention.attention.value.weight": "model-00001-of-00006.safetensors",
424
+ "vision_tower.encoder.layer.4.attention.output.dense.bias": "model-00001-of-00006.safetensors",
425
+ "vision_tower.encoder.layer.4.attention.output.dense.weight": "model-00001-of-00006.safetensors",
426
+ "vision_tower.encoder.layer.4.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
427
+ "vision_tower.encoder.layer.4.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
428
+ "vision_tower.encoder.layer.4.mlp.fc1.bias": "model-00001-of-00006.safetensors",
429
+ "vision_tower.encoder.layer.4.mlp.fc1.weight": "model-00001-of-00006.safetensors",
430
+ "vision_tower.encoder.layer.4.mlp.fc2.bias": "model-00001-of-00006.safetensors",
431
+ "vision_tower.encoder.layer.4.mlp.fc2.weight": "model-00001-of-00006.safetensors",
432
+ "vision_tower.encoder.layer.4.norm1.bias": "model-00001-of-00006.safetensors",
433
+ "vision_tower.encoder.layer.4.norm1.weight": "model-00001-of-00006.safetensors",
434
+ "vision_tower.encoder.layer.4.norm2.bias": "model-00001-of-00006.safetensors",
435
+ "vision_tower.encoder.layer.4.norm2.weight": "model-00001-of-00006.safetensors",
436
+ "vision_tower.encoder.layer.5.attention.attention.key.bias": "model-00001-of-00006.safetensors",
437
+ "vision_tower.encoder.layer.5.attention.attention.key.weight": "model-00001-of-00006.safetensors",
438
+ "vision_tower.encoder.layer.5.attention.attention.query.bias": "model-00001-of-00006.safetensors",
439
+ "vision_tower.encoder.layer.5.attention.attention.query.weight": "model-00001-of-00006.safetensors",
440
+ "vision_tower.encoder.layer.5.attention.attention.value.bias": "model-00001-of-00006.safetensors",
441
+ "vision_tower.encoder.layer.5.attention.attention.value.weight": "model-00001-of-00006.safetensors",
442
+ "vision_tower.encoder.layer.5.attention.output.dense.bias": "model-00001-of-00006.safetensors",
443
+ "vision_tower.encoder.layer.5.attention.output.dense.weight": "model-00001-of-00006.safetensors",
444
+ "vision_tower.encoder.layer.5.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
445
+ "vision_tower.encoder.layer.5.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
446
+ "vision_tower.encoder.layer.5.mlp.fc1.bias": "model-00001-of-00006.safetensors",
447
+ "vision_tower.encoder.layer.5.mlp.fc1.weight": "model-00001-of-00006.safetensors",
448
+ "vision_tower.encoder.layer.5.mlp.fc2.bias": "model-00001-of-00006.safetensors",
449
+ "vision_tower.encoder.layer.5.mlp.fc2.weight": "model-00001-of-00006.safetensors",
450
+ "vision_tower.encoder.layer.5.norm1.bias": "model-00001-of-00006.safetensors",
451
+ "vision_tower.encoder.layer.5.norm1.weight": "model-00001-of-00006.safetensors",
452
+ "vision_tower.encoder.layer.5.norm2.bias": "model-00001-of-00006.safetensors",
453
+ "vision_tower.encoder.layer.5.norm2.weight": "model-00001-of-00006.safetensors",
454
+ "vision_tower.encoder.layer.6.attention.attention.key.bias": "model-00001-of-00006.safetensors",
455
+ "vision_tower.encoder.layer.6.attention.attention.key.weight": "model-00001-of-00006.safetensors",
456
+ "vision_tower.encoder.layer.6.attention.attention.query.bias": "model-00001-of-00006.safetensors",
457
+ "vision_tower.encoder.layer.6.attention.attention.query.weight": "model-00001-of-00006.safetensors",
458
+ "vision_tower.encoder.layer.6.attention.attention.value.bias": "model-00001-of-00006.safetensors",
459
+ "vision_tower.encoder.layer.6.attention.attention.value.weight": "model-00001-of-00006.safetensors",
460
+ "vision_tower.encoder.layer.6.attention.output.dense.bias": "model-00001-of-00006.safetensors",
461
+ "vision_tower.encoder.layer.6.attention.output.dense.weight": "model-00001-of-00006.safetensors",
462
+ "vision_tower.encoder.layer.6.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
463
+ "vision_tower.encoder.layer.6.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
464
+ "vision_tower.encoder.layer.6.mlp.fc1.bias": "model-00001-of-00006.safetensors",
465
+ "vision_tower.encoder.layer.6.mlp.fc1.weight": "model-00001-of-00006.safetensors",
466
+ "vision_tower.encoder.layer.6.mlp.fc2.bias": "model-00001-of-00006.safetensors",
467
+ "vision_tower.encoder.layer.6.mlp.fc2.weight": "model-00001-of-00006.safetensors",
468
+ "vision_tower.encoder.layer.6.norm1.bias": "model-00001-of-00006.safetensors",
469
+ "vision_tower.encoder.layer.6.norm1.weight": "model-00001-of-00006.safetensors",
470
+ "vision_tower.encoder.layer.6.norm2.bias": "model-00001-of-00006.safetensors",
471
+ "vision_tower.encoder.layer.6.norm2.weight": "model-00001-of-00006.safetensors",
472
+ "vision_tower.encoder.layer.7.attention.attention.key.bias": "model-00001-of-00006.safetensors",
473
+ "vision_tower.encoder.layer.7.attention.attention.key.weight": "model-00001-of-00006.safetensors",
474
+ "vision_tower.encoder.layer.7.attention.attention.query.bias": "model-00001-of-00006.safetensors",
475
+ "vision_tower.encoder.layer.7.attention.attention.query.weight": "model-00001-of-00006.safetensors",
476
+ "vision_tower.encoder.layer.7.attention.attention.value.bias": "model-00001-of-00006.safetensors",
477
+ "vision_tower.encoder.layer.7.attention.attention.value.weight": "model-00001-of-00006.safetensors",
478
+ "vision_tower.encoder.layer.7.attention.output.dense.bias": "model-00001-of-00006.safetensors",
479
+ "vision_tower.encoder.layer.7.attention.output.dense.weight": "model-00001-of-00006.safetensors",
480
+ "vision_tower.encoder.layer.7.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
481
+ "vision_tower.encoder.layer.7.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
482
+ "vision_tower.encoder.layer.7.mlp.fc1.bias": "model-00001-of-00006.safetensors",
483
+ "vision_tower.encoder.layer.7.mlp.fc1.weight": "model-00001-of-00006.safetensors",
484
+ "vision_tower.encoder.layer.7.mlp.fc2.bias": "model-00001-of-00006.safetensors",
485
+ "vision_tower.encoder.layer.7.mlp.fc2.weight": "model-00001-of-00006.safetensors",
486
+ "vision_tower.encoder.layer.7.norm1.bias": "model-00001-of-00006.safetensors",
487
+ "vision_tower.encoder.layer.7.norm1.weight": "model-00001-of-00006.safetensors",
488
+ "vision_tower.encoder.layer.7.norm2.bias": "model-00001-of-00006.safetensors",
489
+ "vision_tower.encoder.layer.7.norm2.weight": "model-00001-of-00006.safetensors",
490
+ "vision_tower.encoder.layer.8.attention.attention.key.bias": "model-00001-of-00006.safetensors",
491
+ "vision_tower.encoder.layer.8.attention.attention.key.weight": "model-00001-of-00006.safetensors",
492
+ "vision_tower.encoder.layer.8.attention.attention.query.bias": "model-00001-of-00006.safetensors",
493
+ "vision_tower.encoder.layer.8.attention.attention.query.weight": "model-00001-of-00006.safetensors",
494
+ "vision_tower.encoder.layer.8.attention.attention.value.bias": "model-00001-of-00006.safetensors",
495
+ "vision_tower.encoder.layer.8.attention.attention.value.weight": "model-00001-of-00006.safetensors",
496
+ "vision_tower.encoder.layer.8.attention.output.dense.bias": "model-00001-of-00006.safetensors",
497
+ "vision_tower.encoder.layer.8.attention.output.dense.weight": "model-00001-of-00006.safetensors",
498
+ "vision_tower.encoder.layer.8.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
499
+ "vision_tower.encoder.layer.8.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
500
+ "vision_tower.encoder.layer.8.mlp.fc1.bias": "model-00001-of-00006.safetensors",
501
+ "vision_tower.encoder.layer.8.mlp.fc1.weight": "model-00001-of-00006.safetensors",
502
+ "vision_tower.encoder.layer.8.mlp.fc2.bias": "model-00001-of-00006.safetensors",
503
+ "vision_tower.encoder.layer.8.mlp.fc2.weight": "model-00001-of-00006.safetensors",
504
+ "vision_tower.encoder.layer.8.norm1.bias": "model-00001-of-00006.safetensors",
505
+ "vision_tower.encoder.layer.8.norm1.weight": "model-00001-of-00006.safetensors",
506
+ "vision_tower.encoder.layer.8.norm2.bias": "model-00001-of-00006.safetensors",
507
+ "vision_tower.encoder.layer.8.norm2.weight": "model-00001-of-00006.safetensors",
508
+ "vision_tower.encoder.layer.9.attention.attention.key.bias": "model-00001-of-00006.safetensors",
509
+ "vision_tower.encoder.layer.9.attention.attention.key.weight": "model-00001-of-00006.safetensors",
510
+ "vision_tower.encoder.layer.9.attention.attention.query.bias": "model-00001-of-00006.safetensors",
511
+ "vision_tower.encoder.layer.9.attention.attention.query.weight": "model-00001-of-00006.safetensors",
512
+ "vision_tower.encoder.layer.9.attention.attention.value.bias": "model-00001-of-00006.safetensors",
513
+ "vision_tower.encoder.layer.9.attention.attention.value.weight": "model-00001-of-00006.safetensors",
514
+ "vision_tower.encoder.layer.9.attention.output.dense.bias": "model-00001-of-00006.safetensors",
515
+ "vision_tower.encoder.layer.9.attention.output.dense.weight": "model-00001-of-00006.safetensors",
516
+ "vision_tower.encoder.layer.9.layer_scale1.lambda1": "model-00001-of-00006.safetensors",
517
+ "vision_tower.encoder.layer.9.layer_scale2.lambda1": "model-00001-of-00006.safetensors",
518
+ "vision_tower.encoder.layer.9.mlp.fc1.bias": "model-00001-of-00006.safetensors",
519
+ "vision_tower.encoder.layer.9.mlp.fc1.weight": "model-00001-of-00006.safetensors",
520
+ "vision_tower.encoder.layer.9.mlp.fc2.bias": "model-00001-of-00006.safetensors",
521
+ "vision_tower.encoder.layer.9.mlp.fc2.weight": "model-00001-of-00006.safetensors",
522
+ "vision_tower.encoder.layer.9.norm1.bias": "model-00001-of-00006.safetensors",
523
+ "vision_tower.encoder.layer.9.norm1.weight": "model-00001-of-00006.safetensors",
524
+ "vision_tower.encoder.layer.9.norm2.bias": "model-00001-of-00006.safetensors",
525
+ "vision_tower.encoder.layer.9.norm2.weight": "model-00001-of-00006.safetensors",
526
+ "vision_tower.layernorm.bias": "model-00001-of-00006.safetensors",
527
+ "vision_tower.layernorm.weight": "model-00001-of-00006.safetensors"
528
+ }
529
+ }
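The weight_map above tells loaders which of the six safetensors shards holds each tensor. A minimal sketch of resolving one tensor by hand, assuming the shards and the index file have been downloaded to a local directory (the directory name and tensor choice below are illustrative, not part of this commit):

import json
from pathlib import Path

from safetensors import safe_open

checkpoint_dir = Path("maira-2-checkpoint")  # hypothetical local download location
index = json.loads((checkpoint_dir / "model.safetensors.index.json").read_text())

# Look up which shard holds the final language-model norm, then read just that tensor.
tensor_name = "language_model.model.norm.weight"
shard_file = index["weight_map"][tensor_name]  # "model-00006-of-00006.safetensors"
with safe_open(str(checkpoint_dir / shard_file), framework="pt") as f:
    tensor = f.get_tensor(tensor_name)
print(tensor.shape)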
modeling_maira2.py ADDED
@@ -0,0 +1,112 @@
1
+ # Copyright 2024 Microsoft. All rights reserved.
2
+ # Licensed under the MSRLA License. See LICENSE in the repo root for license information.
3
+
4
+
5
+ from typing import Any
6
+
7
+ import torch
8
+ from torch.nn import Linear, Module, Sequential
9
+ from transformers import (
10
+ AutoBackbone,
11
+ AutoModelForCausalLM,
12
+ LlavaForConditionalGeneration,
13
+ LlavaPreTrainedModel,
14
+ )
15
+ from transformers.activations import ACT2FN
16
+ from transformers.utils import check_min_version
17
+
18
+ from .configuration_maira2 import Maira2Config
19
+
20
+
21
+ class Maira2MultiModalProjector(Module):
22
+ """
23
+ This class implements the multimodal projector for the MAIRA-2 model. It projects the image features to the text
24
+ hidden size via a series of linear layers (4 layers in MAIRA-2).
25
+ """
26
+
27
+ def __init__(self, config: Maira2Config):
28
+ super().__init__()
29
+
30
+ n_layers = config.projector_n_layers
31
+ if n_layers < 1:
32
+ raise ValueError(f"Number of layers should be at least 1, got {n_layers=}")
33
+ text_hidden_size = config.text_config.hidden_size
34
+ vision_hidden_size = config.vision_config.hidden_size
35
+ _layers = [Linear(vision_hidden_size, text_hidden_size, bias=True)]
36
+ for _ in range(n_layers - 1):
37
+ _layers.append(ACT2FN[config.projector_hidden_act])
38
+ _layers.append(Linear(text_hidden_size, text_hidden_size, bias=True))
39
+
40
+ self.layers = Sequential(*_layers)
41
+
42
+ def forward(self, image_features: torch.Tensor) -> torch.FloatTensor:
43
+ hidden_states = self.layers(image_features)
44
+ return hidden_states # type: ignore[no-any-return]
45
+
46
+
47
+ class Maira2ForConditionalGeneration(LlavaForConditionalGeneration):
48
+ """
49
+ This model implements the multimodal model MAIRA-2. It consists of a vision backbone, a multimodal projector, and a
50
+ language model. The model can be used for grounded and ungrounded report generation tasks as well as phrase grounding.
51
+ This class inherits from `LlavaForConditionalGeneration`, defining a custom multimodal projector and changing image
52
+ feature selection.
53
+ """
54
+
55
+ config_class = Maira2Config
56
+
57
+ def __init__(self, config: Maira2Config) -> None:
58
+ # Check transformers version is at least 4.46.0.dev0, otherwise the model fails
59
+ # silently since get_image_features is not called in the forward pass
60
+ check_min_version("4.46.0.dev0")
61
+
62
+ super(LlavaPreTrainedModel, self).__init__(config)
63
+ self.vision_tower = AutoBackbone.from_config(config.vision_config)
64
+
65
+ self.multi_modal_projector = Maira2MultiModalProjector(config)
66
+ self.vocab_size = config.text_config.vocab_size
67
+ self.language_model = AutoModelForCausalLM.from_config(
68
+ config.text_config,
69
+ attn_implementation=config._attn_implementation,
70
+ )
71
+ self.pad_token_id = (
72
+ self.config.pad_token_id if self.config.pad_token_id is not None else -1
73
+ )
74
+ self.post_init()
75
+
76
+ def get_image_features(
77
+ self,
78
+ pixel_values: torch.FloatTensor,
79
+ vision_feature_layer: int | list[int],
80
+ vision_feature_select_strategy: str,
81
+ **kwargs: Any,
82
+ ) -> torch.Tensor:
83
+ """
84
+ This method extracts the image features from the vision backbone using the specified feature layer and
85
+ selection strategy. This is custom to the MAIRA-2 model, since we want to use the `feature_maps` from the Dinov2Backbone
86
+ class instead of the `hidden_states`, which are used in the default implementation of `get_image_features` in LlavaForConditionalGeneration.
87
+ The feature_maps returned by Dinov2Backbone are the hidden_states with a layernorm applied to them.
88
+ """
89
+ if isinstance(vision_feature_layer, list):
90
+ raise ValueError(
91
+ "MAIRA-2 does not support list values for vision_feature_layer."
92
+ )
93
+
94
+ if vision_feature_select_strategy not in ["default", "full"]:
95
+ raise ValueError(
96
+ f"Unexpected select feature strategy: {self.config.vision_feature_select_strategy}"
97
+ )
98
+
99
+ extra_kwargs = {k: v for k, v in kwargs.items() if v is not None}
100
+ if extra_kwargs:
101
+ raise ValueError(
102
+ f"MAIRA-2 does not support passing extra kwargs to the vision tower, received: {extra_kwargs}"
103
+ )
104
+ image_outputs = self.vision_tower(pixel_values, output_hidden_states=True)
105
+
106
+ selected_image_feature = image_outputs.feature_maps[vision_feature_layer]
107
+
108
+ if vision_feature_select_strategy == "default":
109
+ selected_image_feature = selected_image_feature[:, 1:]
110
+
111
+ image_features = self.multi_modal_projector(selected_image_feature)
112
+ return image_features # type: ignore[no-any-return]
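Because the modeling code ships with the checkpoint, the model is loaded through the Transformers auto classes with remote code enabled. A minimal loading sketch; the repo id "microsoft/maira-2" is assumed here, and transformers >= 4.46 is required per the check_min_version call above:

from transformers import AutoModelForCausalLM, AutoProcessor

# trust_remote_code pulls in modeling_maira2.py / processing_maira2.py from the repo.
model = AutoModelForCausalLM.from_pretrained("microsoft/maira-2", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("microsoft/maira-2", trust_remote_code=True)
model.eval()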
preprocessor_config.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "auto_map": {
3
+ "AutoProcessor": "processing_maira2.Maira2Processor"
4
+ },
5
+ "crop_size": {
6
+ "height": 518,
7
+ "width": 518
8
+ },
9
+ "do_center_crop": true,
10
+ "do_convert_rgb": true,
11
+ "do_normalize": true,
12
+ "do_rescale": true,
13
+ "do_resize": true,
14
+ "image_mean": [
15
+ 0.5307,
16
+ 0.5307,
17
+ 0.5307
18
+ ],
19
+ "image_processor_type": "BitImageProcessor",
20
+ "image_std": [
21
+ 0.2583,
22
+ 0.2583,
23
+ 0.2583
24
+ ],
25
+ "processor_class": "Maira2Processor",
26
+ "resample": 3,
27
+ "rescale_factor": 0.00392156862745098,
28
+ "size": {
29
+ "shortest_edge": 518
30
+ }
31
+ }
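In this config, resample=3 is PIL bicubic and rescale_factor is 1/255, so BitImageProcessor resizes the shortest edge to 518, center-crops to 518x518, scales pixels to [0, 1], and normalizes every channel with the same mean/std. A sketch of the equivalent arithmetic on an already-cropped array (illustrative, not part of the commit):

import numpy as np

x = np.random.randint(0, 256, (518, 518, 3)).astype(np.float32)  # cropped RGB image
x = x * 0.00392156862745098   # rescale_factor: maps [0, 255] to [0, 1]
x = (x - 0.5307) / 0.2583     # image_mean / image_std from the config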
processing_maira2.py ADDED
@@ -0,0 +1,649 @@
1
+ # Copyright 2024 Microsoft. All rights reserved.
2
+ # Licensed under the MSRLA License. See LICENSE in the repo root for license information.
3
+
4
+
5
+ import re
6
+ from typing import Any, TypeAlias
7
+
8
+ import numpy as np
9
+ from PIL import Image
10
+ from transformers import BaseImageProcessor, LlavaProcessor, PreTrainedTokenizer
11
+ from transformers.feature_extraction_utils import BatchFeature
12
+
13
+ SingleChatMessageType: TypeAlias = dict[str, str | int | None]
14
+ ChatMessageListType: TypeAlias = list[dict[str, str | list[SingleChatMessageType]]]
15
+ BoxType: TypeAlias = tuple[float, float, float, float]
16
+
17
+
18
+ class Maira2Processor(LlavaProcessor):
19
+ """
20
+ Constructs a Maira2 processor similar to LlavaProcessor but with additional arguments and functions to support
21
+ multi-image grounded and non-grounded radiology report generation.
22
+
23
+ In addition to the arguments of LlavaProcessor, Maira2Processor has the following extra arguments:
24
+
25
+ Args:
26
+ phrase_start_token (`str`, *optional*, defaults to `"<obj>"`):
27
+ Special token used to denote the start of a grounded phrase (with or without box).
28
+ phrase_end_token (`str`, *optional*, defaults to `"</obj>"`):
29
+ Special token used to denote the end of a grounded phrase.
30
+ box_start_token (`str`, *optional*, defaults to `"<box>"`):
31
+ Special token used to denote the start of a bounding box.
32
+ box_end_token (`str`, *optional*, defaults to `"</box>"`):
33
+ Special token used to denote the end of a bounding box.
34
+ num_box_coord_bins (`int`, *optional*, defaults to `100`):
35
+ Number of bins used to represent the bounding box coordinates.
36
+ """
37
+
38
+ valid_kwargs = [
39
+ "chat_template",
40
+ "patch_size",
41
+ "vision_feature_select_strategy",
42
+ "image_token",
43
+ "num_additional_image_tokens",
44
+ "phrase_start_token",
45
+ "phrase_end_token",
46
+ "box_start_token",
47
+ "box_end_token",
48
+ "num_box_coord_bins",
49
+ ]
50
+
51
+ def __init__(
52
+ self,
53
+ image_processor: BaseImageProcessor | None = None,
54
+ tokenizer: PreTrainedTokenizer | None = None,
55
+ patch_size: int | None = None,
56
+ vision_feature_select_strategy: str | None = None,
57
+ chat_template: str | None = None,
58
+ image_token: str = "<image>",
59
+ num_additional_image_tokens: int = 1,
60
+ phrase_start_token: str = "<obj>",
61
+ phrase_end_token: str = "</obj>",
62
+ box_start_token: str = "<box>",
63
+ box_end_token: str = "</box>",
64
+ num_box_coord_bins: int = 100,
65
+ **kwargs: Any,
66
+ ) -> None:
67
+ super().__init__(
68
+ image_processor=image_processor,
69
+ tokenizer=tokenizer,
70
+ patch_size=patch_size,
71
+ vision_feature_select_strategy=vision_feature_select_strategy,
72
+ chat_template=chat_template,
73
+ image_token=image_token,
74
+ num_additional_image_tokens=num_additional_image_tokens,
75
+ **kwargs,
76
+ )
77
+
78
+ self.phrase_start_token = phrase_start_token
79
+ self.phrase_end_token = phrase_end_token
80
+ self.box_start_token = box_start_token
81
+ self.box_end_token = box_end_token
82
+ self.num_box_coord_bins = num_box_coord_bins
83
+
84
+ @staticmethod
85
+ def _normalize_image(image: Image.Image) -> Image.Image:
86
+ """
87
+ This function normalizes the input image to have pixel values in the range [0, 255].
88
+
89
+ Args:
90
+ image (Image.Image):
91
+ The input image to be normalized.
92
+
93
+ Returns:
94
+ Image.Image: The normalized image in grayscale.
95
+ """
96
+ image_np = np.array(image.convert("L"))
97
+ image_np = image_np.astype(float)
98
+ image_np -= image_np.min()
99
+ image_np /= image_np.max()  # assumes a non-constant image; a constant image would divide by zero here
100
+ image_np *= 255
101
+ image_np = image_np.astype(np.uint8)
102
+
103
+ return Image.fromarray(image_np).convert("L")
104
+
105
+ def _normalize_and_stack_images(
106
+ self,
107
+ current_frontal: Image.Image,
108
+ current_lateral: Image.Image | None,
109
+ prior_frontal: Image.Image | None,
110
+ ) -> list[Image.Image]:
111
+ """
112
+ This function normalizes the input images and stacks them together. The images are stacked in the order of
113
+ current_frontal, current_lateral, and prior_frontal. The order of images is important, since it must match the
114
+ order of the images in the prompt, which is frontal, then lateral, then prior.
115
+
116
+ Args:
117
+ current_frontal (Image.Image):
118
+ The current frontal image.
119
+ current_lateral (Image.Image | None):
120
+ The current lateral image.
121
+ prior_frontal (Image.Image | None):
122
+ The prior frontal image.
123
+
124
+ Returns:
125
+ list[Image.Image]: The normalized images stacked together.
126
+ """
127
+ images = [self._normalize_image(current_frontal)]
128
+ if current_lateral is not None:
129
+ images.append(self._normalize_image(current_lateral))
130
+ if prior_frontal is not None:
131
+ images.append(self._normalize_image(prior_frontal))
132
+ return images
133
+
134
+ @staticmethod
135
+ def _get_section_text_or_missing_text(section: str | None) -> str:
136
+ """
137
+ This function returns the input section text if it is not None and not empty, otherwise it returns a missing
138
+ section text "N/A".
139
+
140
+ Args:
141
+ section (str | None):
142
+ The input section text.
143
+
144
+ Returns:
145
+ str: The section text if it is not None and not empty, otherwise "N/A".
146
+ """
147
+ missing_section_text = "N/A"
148
+ if not isinstance(section, str) or len(section) == 0:
149
+ return missing_section_text
150
+ return section
151
+
152
+ @staticmethod
153
+ def _construct_image_chat_messages_for_reporting(has_prior: bool, has_lateral: bool) -> list[SingleChatMessageType]:
154
+ """
155
+ This function constructs user chat messages based on the presence of the prior and lateral images.
156
+
157
+ Args:
158
+ has_prior (bool):
159
+ A boolean indicating whether the prior image is present.
160
+ has_lateral (bool):
161
+ A boolean indicating whether the lateral image is present.
162
+
163
+ Returns:
164
+ list[SingleChatMessageType]: The image prompt messages in the form of a list of dictionaries.
165
+
166
+ Example:
167
+
168
+ ```python
169
+ >>> _construct_image_chat_messages_for_reporting(has_prior=True, has_lateral=True)
170
+ >>> # [
171
+ >>> # {"index": None, "text": "Given the current frontal image", "type": "text"},
172
+ >>> # {"index": 0, "text": None, "type": "image"},
173
+ >>> # {"index": None, "text": " the current lateral image", "type": "text"},
174
+ >>> # {"index": 1, "text": None, "type": "image"},
175
+ >>> # {"index": None, "text": " and the prior frontal image", "type": "text"},
176
+ >>> # {"index": 2, "text": None, "type": "image"},
177
+ >>> # ]
178
+ ```
179
+ """
180
+
181
+ def _add_single_image_to_chat_messages(prompt_text: str, image_index: int) -> None:
182
+ image_prompt.extend(
183
+ [
184
+ {"index": None, "text": prompt_text, "type": "text"},
185
+ {"index": image_index, "text": None, "type": "image"},
186
+ ]
187
+ )
188
+
189
+ image_prompt: list[SingleChatMessageType] = []
190
+ image_index = 0
191
+ if not has_prior and not has_lateral:
192
+ _add_single_image_to_chat_messages("Given the current frontal image only", image_index)
193
+ else:
194
+ _add_single_image_to_chat_messages("Given the current frontal image", image_index)
195
+ image_index += 1
196
+ if has_prior:
197
+ if has_lateral:
198
+ _add_single_image_to_chat_messages(" the current lateral image", image_index)
199
+ image_index += 1
200
+ _add_single_image_to_chat_messages(" and the prior frontal image", image_index)
201
+ else:
202
+ if has_lateral:
203
+ _add_single_image_to_chat_messages(" and the current lateral image", image_index)
204
+ return image_prompt
205
+
206
+ def _construct_chat_messages_reporting(
207
+ self,
208
+ has_prior: bool,
209
+ has_lateral: bool,
210
+ indication: str | None,
211
+ technique: str | None,
212
+ comparison: str | None,
213
+ prior_report: str | None,
214
+ get_grounding: bool = False,
215
+ assistant_text: str | None = None,
216
+ ) -> ChatMessageListType:
217
+ """
218
+ This function constructs the chat messages for reporting used in the grounded and non-grounded reporting tasks.
219
+
220
+ Args:
221
+ has_prior (bool):
222
+ A boolean indicating whether the prior image is present.
223
+ has_lateral (bool):
224
+ A boolean indicating whether the lateral image is present.
225
+ indication (str | None):
226
+ The indication section text.
227
+ technique (str | None):
228
+ The technique section text.
229
+ comparison (str | None):
230
+ The comparison section text.
231
+ prior_report (str | None):
232
+ The prior report section text.
233
+ get_grounding (bool):
234
+ A boolean indicating whether to get the grounding information.
235
+ assistant_text (str | None):
236
+ The assistant text (can be set to None for ordinary inference).
237
+
238
+ Returns:
239
+ ChatMessageListType: The chat messages for reporting in the form of a list of dictionaries.
240
+
241
+ Example:
242
+
243
+ ```python
244
+ >>> _construct_chat_messages_reporting(
245
+ >>> has_prior=True,
246
+ >>> has_lateral=True,
247
+ >>> indication="indication text from report goes here",
248
+ >>> technique="technique text from report goes here",
249
+ >>> comparison="comparison text from report goes here",
250
+ >>> prior_report="prior reporting text goes here",
251
+ >>> get_grounding=False,
252
+ >>> assistant_text=None,
253
+ >>> )
254
+ >>> # [{"role": "user", "content": [
255
+ >>> # {"index": None, "text": "Given the current frontal image", "type": "text"},
256
+ >>> # {"index": 0, "text": None, "type": "image"},
257
+ >>> # {"index": None, "text": " the current lateral image", "type": "text"},
258
+ >>> # {"index": 1, "text": None, "type": "image"},
259
+ >>> # {"index": None, "text": " and the prior frontal image", "type": "text"},
260
+ >>> # {"index": 2, "text": None, "type": "image"},
261
+ >>> # {"index": None, "text": " PRIOR_REPORT: prior reporting text goes here", "type": "text"},
262
+ >>> # {"index": None, "text": " Provide a description of the findings in the radiology study in comparison to the "
263
+ >>> # "prior frontal image. INDICATION: indication text from report goes here TECHNIQUE: technique text from report "
264
+ >>> # "goes here COMPARISON: comparison text from report goes here", "type": "text"},
265
+ >>> # ]}]
266
+ ```
267
+ """
268
+ indication = self._get_section_text_or_missing_text(indication)
269
+ technique = self._get_section_text_or_missing_text(technique)
270
+ comparison = self._get_section_text_or_missing_text(comparison)
271
+ prior_report = self._get_section_text_or_missing_text(prior_report)
272
+
273
+ prompt = self._construct_image_chat_messages_for_reporting(has_prior=has_prior, has_lateral=has_lateral)
274
+
275
+ if has_prior:
276
+ prompt.append({"index": None, "text": f" PRIOR_REPORT: {prior_report}", "type": "text"})
277
+
278
+ if get_grounding:
279
+ prompt.append(
280
+ {
281
+ "index": None,
282
+ "text": " Provide a description of the findings in the radiology study in comparison to the "
283
+ "prior frontal image. Each finding should be described as a self-contained plain-text sentence."
284
+ " If the finding is groundable, locate the finding in the current frontal chest X-ray image, "
285
+ "with bounding boxes indicating all locations where it can be seen in the current frontal "
286
+ "image. Otherwise, generate just the ungrounded finding without bounding boxes. INDICATION: "
287
+ f"{indication} TECHNIQUE: {technique} COMPARISON: {comparison}",
288
+ "type": "text",
289
+ }
290
+ )
291
+ else:
292
+ prompt.append(
293
+ {
294
+ "index": None,
295
+ "text": " Provide a description of the findings in the radiology study in comparison to the "
296
+ f"prior frontal image. INDICATION: {indication} TECHNIQUE: {technique} COMPARISON: "
297
+ f"{comparison}",
298
+ "type": "text",
299
+ }
300
+ )
301
+ messages: ChatMessageListType = [{"content": prompt, "role": "user"}]
302
+ if assistant_text is not None:
303
+ messages.append({"content": [{"index": None, "text": assistant_text, "type": "text"}], "role": "assistant"})
304
+ return messages
305
+
306
+ def _construct_chat_messages_phrase_grounding(
307
+ self, phrase: str, assistant_text: str | None = None
308
+ ) -> ChatMessageListType:
309
+ """
310
+ This function constructs the chat messages for phrase grounding used in the phrase grounding task.
311
+
312
+ Args:
313
+ phrase (str):
314
+ The phrase to be grounded.
315
+ assistant_text (str | None):
316
+ The assistant text (can be set to None for ordinary inference).
317
+
318
+ Returns:
319
+ ChatMessageListType: The chat messages for phrase grounding in the form of a list of dictionaries.
320
+ """
321
+ prompt: list[SingleChatMessageType] = [
322
+ {"index": None, "text": "Given the current frontal image", "type": "text"},
323
+ {"index": 0, "text": None, "type": "image"},
324
+ {
325
+ "index": None,
326
+ "text": f" Repeat the following finding as a grounded phrase with bounding boxes indicating all "
327
+ f"locations where it can be seen in the given chest X-ray image. Finding: {phrase}",
328
+ "type": "text",
329
+ },
330
+ ]
331
+ messages: ChatMessageListType = [{"content": prompt, "role": "user"}]
332
+ if assistant_text is not None:
333
+ messages.append({"content": [{"index": None, "text": assistant_text, "type": "text"}], "role": "assistant"})
334
+ return messages
335
+
336
+ def format_reporting_input(
337
+ self,
338
+ current_frontal: Image.Image,
339
+ current_lateral: Image.Image | None,
340
+ prior_frontal: Image.Image | None,
341
+ indication: str | None,
342
+ technique: str | None,
343
+ comparison: str | None,
344
+ prior_report: str | None,
345
+ get_grounding: bool = False,
346
+ assistant_text: str | None = None,
347
+ ) -> tuple[str, list[Image.Image]]:
348
+ """
349
+ This function formats the reporting prompt for the grounded and non-grounded reporting tasks from the given
350
+ input images and text sections. The images are normalized and stacked together in the right order.
351
+
352
+ Args:
353
+ current_frontal (Image.Image):
354
+ The current frontal image.
355
+ current_lateral (Image.Image | None):
356
+ The current lateral image.
357
+ prior_frontal (Image.Image | None):
358
+ The prior frontal image.
359
+ indication (str | None):
360
+ The indication section text.
361
+ technique (str | None):
362
+ The technique section text.
363
+ comparison (str | None):
364
+ The comparison section text.
365
+ prior_report (str | None):
366
+ The prior report section text.
367
+ get_grounding (bool):
368
+ A boolean indicating whether to construct the prompt for grounded or non-grounded reporting.
369
+ assistant_text (str | None): The assistant text (can be set to None for ordinary inference).
370
+
371
+ Returns:
372
+ tuple[str, list[Image.Image]]: The formatted prompt text and the normalized images stacked in the right order.
373
+ """
374
+ images = self._normalize_and_stack_images(
375
+ current_frontal=current_frontal,
376
+ current_lateral=current_lateral,
377
+ prior_frontal=prior_frontal,
378
+ )
379
+ messages = self._construct_chat_messages_reporting(
380
+ has_prior=prior_frontal is not None,
381
+ has_lateral=current_lateral is not None,
382
+ indication=indication,
383
+ technique=technique,
384
+ comparison=comparison,
385
+ prior_report=prior_report,
386
+ get_grounding=get_grounding,
387
+ assistant_text=assistant_text,
388
+ )
389
+ add_generation_prompt = assistant_text is None
390
+ text = self.tokenizer.apply_chat_template(messages, add_generation_prompt=add_generation_prompt, tokenize=False)
391
+ return text, images
392
+
393
+ def format_phrase_grounding_input(
394
+ self,
395
+ frontal_image: Image.Image,
396
+ phrase: str,
397
+ assistant_text: str | None = None,
398
+ ) -> tuple[str, list[Image.Image]]:
399
+ """
400
+ This function formats the phrase grounding prompt for the phrase grounding task from the given input
401
+ image and phrase.
402
+
403
+ Args:
404
+ frontal_image (Image.Image):
405
+ The frontal image.
406
+ phrase (str):
407
+ The phrase to be grounded.
408
+ assistant_text (str | None):
409
+ The assistant text (can be set to None for ordinary inference).
410
+
411
+ Returns:
412
+ tuple[str, list[Image.Image]]: The formatted phrase grounding prompt text and the normalized image.
413
+ """
414
+ images = self._normalize_and_stack_images(
415
+ current_frontal=frontal_image,
416
+ current_lateral=None,
417
+ prior_frontal=None,
418
+ )
419
+ messages = self._construct_chat_messages_phrase_grounding(phrase, assistant_text=assistant_text)
420
+ add_generation_prompt = assistant_text is None
421
+ text = self.tokenizer.apply_chat_template(messages, add_generation_prompt=add_generation_prompt, tokenize=False)
422
+ return text, images
423
+
424
+ def format_and_preprocess_reporting_input(
425
+ self,
426
+ current_frontal: Image.Image,
427
+ current_lateral: Image.Image | None,
428
+ prior_frontal: Image.Image | None,
429
+ indication: str | None,
430
+ technique: str | None,
431
+ comparison: str | None,
432
+ prior_report: str | None,
433
+ get_grounding: bool = False,
434
+ assistant_text: str | None = None,
435
+ **kwargs: Any,
436
+ ) -> BatchFeature:
437
+ """
438
+ This function formats and then preprocesses the input for either the grounded or non-grounded reporting task from
439
+ the given input images and text sections and returns the batch feature for the model. It calls format_reporting_input
440
+ internally to format the input prompt and stack the images together in the order expected by the model.
441
+
442
+ Args:
443
+ current_frontal (Image.Image):
444
+ The current frontal image.
445
+ current_lateral (Image.Image | None):
446
+ The current lateral image.
447
+ prior_frontal (Image.Image | None):
448
+ The prior frontal image.
449
+ indication (str | None):
450
+ The indication section text.
451
+ technique (str | None):
452
+ The technique section text.
453
+ comparison (str | None):
454
+ The comparison section text.
455
+ prior_report (str | None):
456
+ The prior report section text.
457
+ get_grounding (bool):
458
+ A boolean indicating whether to preprocess the input for grounded or non-grounded reporting.
459
+ assistant_text (str | None):
460
+ The assistant text (can be set to None for ordinary inference).
461
+
462
+ Returns:
463
+ BatchFeature: The preprocessed batch, ready to be passed to the model.
464
+
465
+ """
466
+ text, images = self.format_reporting_input(
467
+ current_frontal=current_frontal,
468
+ current_lateral=current_lateral,
469
+ prior_frontal=prior_frontal,
470
+ indication=indication,
471
+ technique=technique,
472
+ comparison=comparison,
473
+ prior_report=prior_report,
474
+ get_grounding=get_grounding,
475
+ assistant_text=assistant_text,
476
+ )
477
+ return self(text=text, images=images, **kwargs)
478
+
479
+ def format_and_preprocess_phrase_grounding_input(
480
+ self,
481
+ frontal_image: Image.Image,
482
+ phrase: str,
483
+ assistant_text: str | None = None,
484
+ **kwargs: Any,
485
+ ) -> BatchFeature:
486
+ """
487
+ This function formats and then preprocesses the input for the phrase grounding task from the given input image and
488
+ phrase and returns the batch feature for the model. It calls format_phrase_grounding_input internally to format
489
+ the input prompt and normalize the image.
490
+
491
+ Args:
492
+ frontal_image (Image.Image):
493
+ The frontal image.
494
+ phrase (str):
495
+ The phrase to be grounded.
496
+ assistant_text (str | None):
497
+ The assistant text (can be set to None for ordinary inference).
498
+
499
+ Returns:
500
+ BatchFeature: The preprocessed batch, ready to be passed to the model.
501
+ """
502
+ text, images = self.format_phrase_grounding_input(
503
+ frontal_image=frontal_image,
504
+ phrase=phrase,
505
+ assistant_text=assistant_text,
506
+ )
507
+ return self(text=text, images=images, **kwargs)
508
+
509
+ def _get_text_between_delimiters(self, text: str, begin_token: str, end_token: str) -> list[str]:
510
+ """
511
+ This function splits the input text into a list of substrings based on the given begin and end tokens.
512
+
513
+ Args:
514
+ text (str):
515
+ The input text to be split.
516
+ begin_token (str):
517
+ The begin token.
518
+ end_token (str):
519
+ The end token.
520
+
521
+ Returns:
522
+ list[str]: The list of substrings between the given begin and end tokens.
523
+
524
+ Example:
525
+
526
+ ```python
527
+ >>> _get_text_between_delimiters("<obj>This is a grounded phrase</obj><obj>This is another grounded phrase</obj>", "<obj>", "</obj>")
528
+ >>> # ["This is a grounded phrase", "This is another grounded phrase"]
529
+
530
+ >>> _get_text_between_delimiters("<box><x10><y20><x30><y40></box><box><x50><y60><x70><y80></box>", "<box>", "</box>")
531
+ >>> # ["<x10><y20><x30><y40>", "<x50><y60><x70><y80>"]
532
+ ```
533
+ """
534
+ split_text = []
535
+ while begin_token in text:
536
+ assert text.startswith(begin_token)
537
+ end_index = text.find(end_token)
538
+ assert end_index != -1
539
+ split_text.append(text[len(begin_token) : end_index])
540
+ text = text[end_index + len(end_token) :]
541
+ assert len(text) == 0
542
+ return split_text
543
+
544
+ def convert_output_to_plaintext_or_grounded_sequence(
545
+ self, text: str
546
+ ) -> str | list[tuple[str, list[BoxType] | None]]:
547
+ """
548
+ This function converts the input text to a grounded sequence by extracting the grounded phrases and bounding
549
+ boxes from the text. If the text is plaintext without any grounded phrases, it returns the text as is.
550
+
551
+ Args:
552
+ text (str):
553
+ The input text to be converted.
554
+
555
+ Returns:
556
+ str | list[tuple[str, list[BoxType] | None]]: The grounded sequence.
557
+
558
+ Example:
559
+
560
+ ```python
561
+ >>> convert_output_to_plaintext_or_grounded_sequence("<obj>grounded phrase <box><x55><y45><x70><y56></box></obj><obj>ungrounded phrase</obj>")
562
+ >>> # [
563
+ >>> # ("grounded phrase", [(0.55, 0.45, 0.70, 0.56)]),
564
+ >>> # ("ungrounded phrase", None),
565
+ >>> # ]
566
+
567
+ >>> convert_output_to_plaintext_or_grounded_sequence("plain text")
568
+ >>> # "plain text"
569
+ ```
570
+ """
571
+ text = text.strip()
572
+
573
+ # Plain text
574
+ if not any(
575
+ [
576
+ self.phrase_start_token in text,
577
+ self.phrase_end_token in text,
578
+ self.box_start_token in text,
579
+ self.box_end_token in text,
580
+ ]
581
+ ):
582
+ return text
583
+
584
+ # One or more grounded phrases
585
+ grounded_phrase_texts = self._get_text_between_delimiters(text, self.phrase_start_token, self.phrase_end_token)
586
+ grounded_phrases: list[tuple[str, list[BoxType] | None]] = []
587
+ for grounded_phrase_text in grounded_phrase_texts:
588
+ if self.box_start_token in grounded_phrase_text or self.box_end_token in grounded_phrase_text:
589
+ first_box_start_index = grounded_phrase_text.find(self.box_start_token)
590
+ phrase_text = grounded_phrase_text[:first_box_start_index].strip()
591
+ boxes_text = grounded_phrase_text[first_box_start_index:]
592
+ boxes_text_list = self._get_text_between_delimiters(
593
+ boxes_text, self.box_start_token, self.box_end_token
594
+ )
595
+ boxes: list[BoxType] = []
596
+ for box_text in boxes_text_list:
597
+ # extract from <x_><y_><x_><y_>
598
+ regex = r"<x(\d+?)><y(\d+?)><x(\d+?)><y(\d+?)>"
599
+ match = re.search(regex, box_text)
600
+ if match:
601
+ x_min, y_min, x_max, y_max = match.groups()
602
+ box: BoxType = tuple( # type: ignore[assignment]
603
+ (int(coord) + 0.5) / self.num_box_coord_bins for coord in (x_min, y_min, x_max, y_max)
604
+ )
605
+ assert all(0 <= coord <= 1 for coord in box), f"Invalid box coordinates: {box}"
606
+ boxes.append(box)
607
+ else:
608
+ raise ValueError(f"Invalid box coordinates: {box_text} not matching regex {regex}")
609
+ grounded_phrases.append((phrase_text, boxes))
610
+ else:
611
+ grounded_phrases.append((grounded_phrase_text.lstrip(), None))
612
+ return grounded_phrases
613
+
614
+ @staticmethod
615
+ def adjust_box_for_original_image_size(box: BoxType, width: int, height: int) -> BoxType:
616
+ """
617
+ This function adjusts the bounding boxes from the MAIRA-2 model output to account for the image processor
618
+ cropping the image to be square prior to the model forward pass. The box coordinates are adjusted to be
619
+ relative to the original shape of the image assuming the image processor cropped the image based on the length
620
+ of the shortest side.
621
+
622
+ Args:
623
+ box (BoxType):
624
+ The box to be adjusted, normalised to (0, 1).
625
+ width (int):
626
+ Original width of the image, in pixels.
627
+ height (int):
628
+ Original height of the image, in pixels.
629
+
630
+ Returns:
631
+ BoxType: The box normalised relative to the original size of the image.
632
+ """
633
+ crop_width = crop_height = min(width, height)
634
+ x_offset = (width - crop_width) // 2
635
+ y_offset = (height - crop_height) // 2
636
+
637
+ norm_x_min, norm_y_min, norm_x_max, norm_y_max = box
638
+
639
+ abs_x_min = int(norm_x_min * crop_width + x_offset)
640
+ abs_x_max = int(norm_x_max * crop_width + x_offset)
641
+ abs_y_min = int(norm_y_min * crop_height + y_offset)
642
+ abs_y_max = int(norm_y_max * crop_height + y_offset)
643
+
644
+ adjusted_norm_x_min = abs_x_min / width
645
+ adjusted_norm_x_max = abs_x_max / width
646
+ adjusted_norm_y_min = abs_y_min / height
647
+ adjusted_norm_y_max = abs_y_max / height
648
+
649
+ return (adjusted_norm_x_min, adjusted_norm_y_min, adjusted_norm_x_max, adjusted_norm_y_max)
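A minimal end-to-end sketch of how this processor is intended to be driven (not part of the diff). The repo id, the `AutoModelForCausalLM` entry point, and the input path are assumptions; only `format_and_preprocess_reporting_input`, `convert_output_to_plaintext_or_grounded_sequence`, and `adjust_box_for_original_image_size` come from the file above.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Assumed repo id; point this at wherever the checkpoint and this processor live.
processor = AutoProcessor.from_pretrained("microsoft/maira-2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/maira-2", trust_remote_code=True)

frontal = Image.open("frontal.png")  # hypothetical input image

# Build a model-ready batch for non-grounded reporting with a frontal view only.
inputs = processor.format_and_preprocess_reporting_input(
    current_frontal=frontal,
    current_lateral=None,
    prior_frontal=None,
    indication="Shortness of breath.",
    technique="PA view.",
    comparison=None,
    prior_report=None,
    get_grounding=False,
    return_tensors="pt",
)

outputs = model.generate(**inputs, max_new_tokens=300)
prompt_length = inputs["input_ids"].shape[-1]
decoded = processor.tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

# Plain text for non-grounded reporting; (phrase, boxes) tuples when get_grounding=True.
prediction = processor.convert_output_to_plaintext_or_grounded_sequence(decoded)

# Grounded boxes are relative to the centre square crop; map them back like so:
# adjusted = processor.adjust_box_for_original_image_size(box, *frontal.size)
```

Note that `<obj>` and `<box>` are registered as non-special added tokens (see `tokenizer_config.json` below), so `skip_special_tokens=True` strips only `<s>`/`</s>`/`<unk>` and leaves the grounding markup intact for parsing.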
processor_config.json ADDED
@@ -0,0 +1,15 @@
1
+ {
2
+ "auto_map": {
3
+ "AutoProcessor": "processing_maira2.Maira2Processor"
4
+ },
5
+ "box_end_token": "</box>",
6
+ "box_start_token": "<box>",
7
+ "image_token": "<image>",
8
+ "num_additional_image_tokens": 1,
9
+ "num_box_coord_bins": 100,
10
+ "patch_size": 14,
11
+ "phrase_end_token": "</obj>",
12
+ "phrase_start_token": "<obj>",
13
+ "processor_class": "Maira2Processor",
14
+ "vision_feature_select_strategy": "default"
15
+ }
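The `num_box_coord_bins` value above is what drives the coordinate decoding in `convert_output_to_plaintext_or_grounded_sequence`: a token such as `<x55>` denotes the centre of bin 55 out of 100, i.e. 0.555. Below is a self-contained sketch of that mapping plus the inverse crop adjustment; the function names are local illustrations, not part of the repo, and the crop logic mirrors `adjust_box_for_original_image_size`, including its integer truncation to pixel coordinates.

```python
import re

NUM_BOX_COORD_BINS = 100  # matches "num_box_coord_bins" in the config above

def decode_box(box_text: str) -> tuple[float, ...]:
    """Map '<x55><y45><x70><y56>' to bin-centre coordinates normalised to (0, 1)."""
    match = re.fullmatch(r"<x(\d+)><y(\d+)><x(\d+)><y(\d+)>", box_text)
    assert match is not None, f"unparseable box: {box_text}"
    return tuple((int(c) + 0.5) / NUM_BOX_COORD_BINS for c in match.groups())

def adjust_for_original_size(box, width, height):
    """Undo the centre square crop: re-normalise crop-relative coords to the full image."""
    crop = min(width, height)
    x_off, y_off = (width - crop) // 2, (height - crop) // 2
    x0, y0, x1, y1 = box
    return (
        int(x0 * crop + x_off) / width,
        int(y0 * crop + y_off) / height,
        int(x1 * crop + x_off) / width,
        int(y1 * crop + y_off) / height,
    )

box = decode_box("<x55><y45><x70><y56>")        # (0.555, 0.455, 0.705, 0.565)
print(adjust_for_original_size(box, 600, 400))  # x values shift by the 100px crop offset
```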
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "</s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": {
17
+ "content": "<unk>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "unk_token": {
24
+ "content": "<unk>",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ }
30
+ }
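For completeness, a quick way to sanity-check this map after cloning the repo (a sketch; the local path is an assumption):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(".")  # assumed: run from a local clone of this repo
print(tok.bos_token, tok.eos_token, tok.unk_token)  # <s> </s> <unk>
print(tok.pad_token == tok.unk_token)  # True: <unk> doubles as the padding token
```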
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1a8f238a200be6c23fbba0f9a999ab4fe3c09ca303b29805e68cf6659bfb7d89
3
+ size 131
tokenizer_config.json ADDED
@@ -0,0 +1,1702 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "add_prefix_space": true,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<unk>",
8
+ "lstrip": false,
9
+ "normalized": false,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<s>",
16
+ "lstrip": false,
17
+ "normalized": false,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "</s>",
24
+ "lstrip": false,
25
+ "normalized": false,
26
+ "rstrip": false,
27
+ "single_word": false,
28
+ "special": true
29
+ },
30
+ "32000": {
31
+ "content": "<obj>",
32
+ "lstrip": false,
33
+ "normalized": true,
34
+ "rstrip": false,
35
+ "single_word": false,
36
+ "special": false
37
+ },
38
+ "32001": {
39
+ "content": "</obj>",
40
+ "lstrip": false,
41
+ "normalized": true,
42
+ "rstrip": false,
43
+ "single_word": false,
44
+ "special": false
45
+ },
46
+ "32002": {
47
+ "content": "<x0>",
48
+ "lstrip": false,
49
+ "normalized": true,
50
+ "rstrip": false,
51
+ "single_word": false,
52
+ "special": false
53
+ },
54
+ "32003": {
55
+ "content": "<x1>",
56
+ "lstrip": false,
57
+ "normalized": true,
58
+ "rstrip": false,
59
+ "single_word": false,
60
+ "special": false
61
+ },
62
+ "32004": {
63
+ "content": "<x2>",
64
+ "lstrip": false,
65
+ "normalized": true,
66
+ "rstrip": false,
67
+ "single_word": false,
68
+ "special": false
69
+ },
70
+ "32005": {
71
+ "content": "<x3>",
72
+ "lstrip": false,
73
+ "normalized": true,
74
+ "rstrip": false,
75
+ "single_word": false,
76
+ "special": false
77
+ },
78
+ "32006": {
79
+ "content": "<x4>",
80
+ "lstrip": false,
81
+ "normalized": true,
82
+ "rstrip": false,
83
+ "single_word": false,
84
+ "special": false
85
+ },
86
+ "32007": {
87
+ "content": "<x5>",
88
+ "lstrip": false,
89
+ "normalized": true,
90
+ "rstrip": false,
91
+ "single_word": false,
92
+ "special": false
93
+ },
94
+ "32008": {
95
+ "content": "<x6>",
96
+ "lstrip": false,
97
+ "normalized": true,
98
+ "rstrip": false,
99
+ "single_word": false,
100
+ "special": false
101
+ },
102
+ "32009": {
103
+ "content": "<x7>",
104
+ "lstrip": false,
105
+ "normalized": true,
106
+ "rstrip": false,
107
+ "single_word": false,
108
+ "special": false
109
+ },
110
+ "32010": {
111
+ "content": "<x8>",
112
+ "lstrip": false,
113
+ "normalized": true,
114
+ "rstrip": false,
115
+ "single_word": false,
116
+ "special": false
117
+ },
118
+ "32011": {
119
+ "content": "<x9>",
120
+ "lstrip": false,
121
+ "normalized": true,
122
+ "rstrip": false,
123
+ "single_word": false,
124
+ "special": false
125
+ },
126
+ "32012": {
127
+ "content": "<x10>",
128
+ "lstrip": false,
129
+ "normalized": true,
130
+ "rstrip": false,
131
+ "single_word": false,
132
+ "special": false
133
+ },
134
+ "32013": {
135
+ "content": "<x11>",
136
+ "lstrip": false,
137
+ "normalized": true,
138
+ "rstrip": false,
139
+ "single_word": false,
140
+ "special": false
141
+ },
142
+ "32014": {
143
+ "content": "<x12>",
144
+ "lstrip": false,
145
+ "normalized": true,
146
+ "rstrip": false,
147
+ "single_word": false,
148
+ "special": false
149
+ },
150
+ "32015": {
151
+ "content": "<x13>",
152
+ "lstrip": false,
153
+ "normalized": true,
154
+ "rstrip": false,
155
+ "single_word": false,
156
+ "special": false
157
+ },
158
+ "32016": {
159
+ "content": "<x14>",
160
+ "lstrip": false,
161
+ "normalized": true,
162
+ "rstrip": false,
163
+ "single_word": false,
164
+ "special": false
165
+ },
166
+ "32017": {
167
+ "content": "<x15>",
168
+ "lstrip": false,
169
+ "normalized": true,
170
+ "rstrip": false,
171
+ "single_word": false,
172
+ "special": false
173
+ },
174
+ "32018": {
175
+ "content": "<x16>",
176
+ "lstrip": false,
177
+ "normalized": true,
178
+ "rstrip": false,
179
+ "single_word": false,
180
+ "special": false
181
+ },
182
+ "32019": {
183
+ "content": "<x17>",
184
+ "lstrip": false,
185
+ "normalized": true,
186
+ "rstrip": false,
187
+ "single_word": false,
188
+ "special": false
189
+ },
190
+ "32020": {
191
+ "content": "<x18>",
192
+ "lstrip": false,
193
+ "normalized": true,
194
+ "rstrip": false,
195
+ "single_word": false,
196
+ "special": false
197
+ },
198
+ "32021": {
199
+ "content": "<x19>",
200
+ "lstrip": false,
201
+ "normalized": true,
202
+ "rstrip": false,
203
+ "single_word": false,
204
+ "special": false
205
+ },
206
+ "32022": {
207
+ "content": "<x20>",
208
+ "lstrip": false,
209
+ "normalized": true,
210
+ "rstrip": false,
211
+ "single_word": false,
212
+ "special": false
213
+ },
214
+ "32023": {
215
+ "content": "<x21>",
216
+ "lstrip": false,
217
+ "normalized": true,
218
+ "rstrip": false,
219
+ "single_word": false,
220
+ "special": false
221
+ },
222
+ "32024": {
223
+ "content": "<x22>",
224
+ "lstrip": false,
225
+ "normalized": true,
226
+ "rstrip": false,
227
+ "single_word": false,
228
+ "special": false
229
+ },
230
+ "32025": {
231
+ "content": "<x23>",
232
+ "lstrip": false,
233
+ "normalized": true,
234
+ "rstrip": false,
235
+ "single_word": false,
236
+ "special": false
237
+ },
238
+ "32026": {
239
+ "content": "<x24>",
240
+ "lstrip": false,
241
+ "normalized": true,
242
+ "rstrip": false,
243
+ "single_word": false,
244
+ "special": false
245
+ },
246
+ "32027": {
247
+ "content": "<x25>",
248
+ "lstrip": false,
249
+ "normalized": true,
250
+ "rstrip": false,
251
+ "single_word": false,
252
+ "special": false
253
+ },
254
+ "32028": {
255
+ "content": "<x26>",
256
+ "lstrip": false,
257
+ "normalized": true,
258
+ "rstrip": false,
259
+ "single_word": false,
260
+ "special": false
261
+ },
262
+ "32029": {
263
+ "content": "<x27>",
264
+ "lstrip": false,
265
+ "normalized": true,
266
+ "rstrip": false,
267
+ "single_word": false,
268
+ "special": false
269
+ },
270
+ "32030": {
271
+ "content": "<x28>",
272
+ "lstrip": false,
273
+ "normalized": true,
274
+ "rstrip": false,
275
+ "single_word": false,
276
+ "special": false
277
+ },
278
+ "32031": {
279
+ "content": "<x29>",
280
+ "lstrip": false,
281
+ "normalized": true,
282
+ "rstrip": false,
283
+ "single_word": false,
284
+ "special": false
285
+ },
286
+ "32032": {
287
+ "content": "<x30>",
288
+ "lstrip": false,
289
+ "normalized": true,
290
+ "rstrip": false,
291
+ "single_word": false,
292
+ "special": false
293
+ },
294
+ "32033": {
295
+ "content": "<x31>",
296
+ "lstrip": false,
297
+ "normalized": true,
298
+ "rstrip": false,
299
+ "single_word": false,
300
+ "special": false
301
+ },
302
+ "32034": {
303
+ "content": "<x32>",
304
+ "lstrip": false,
305
+ "normalized": true,
306
+ "rstrip": false,
307
+ "single_word": false,
308
+ "special": false
309
+ },
310
+ "32035": {
311
+ "content": "<x33>",
312
+ "lstrip": false,
313
+ "normalized": true,
314
+ "rstrip": false,
315
+ "single_word": false,
316
+ "special": false
317
+ },
318
+ "32036": {
319
+ "content": "<x34>",
320
+ "lstrip": false,
321
+ "normalized": true,
322
+ "rstrip": false,
323
+ "single_word": false,
324
+ "special": false
325
+ },
326
+ "32037": {
327
+ "content": "<x35>",
328
+ "lstrip": false,
329
+ "normalized": true,
330
+ "rstrip": false,
331
+ "single_word": false,
332
+ "special": false
333
+ },
334
+ "32038": {
335
+ "content": "<x36>",
336
+ "lstrip": false,
337
+ "normalized": true,
338
+ "rstrip": false,
339
+ "single_word": false,
340
+ "special": false
341
+ },
342
+ "32039": {
343
+ "content": "<x37>",
344
+ "lstrip": false,
345
+ "normalized": true,
346
+ "rstrip": false,
347
+ "single_word": false,
348
+ "special": false
349
+ },
350
+ "32040": {
351
+ "content": "<x38>",
352
+ "lstrip": false,
353
+ "normalized": true,
354
+ "rstrip": false,
355
+ "single_word": false,
356
+ "special": false
357
+ },
358
+ "32041": {
359
+ "content": "<x39>",
360
+ "lstrip": false,
361
+ "normalized": true,
362
+ "rstrip": false,
363
+ "single_word": false,
364
+ "special": false
365
+ },
366
+ "32042": {
367
+ "content": "<x40>",
368
+ "lstrip": false,
369
+ "normalized": true,
370
+ "rstrip": false,
371
+ "single_word": false,
372
+ "special": false
373
+ },
374
+ "32043": {
375
+ "content": "<x41>",
376
+ "lstrip": false,
377
+ "normalized": true,
378
+ "rstrip": false,
379
+ "single_word": false,
380
+ "special": false
381
+ },
382
+ "32044": {
383
+ "content": "<x42>",
384
+ "lstrip": false,
385
+ "normalized": true,
386
+ "rstrip": false,
387
+ "single_word": false,
388
+ "special": false
389
+ },
390
+ "32045": {
391
+ "content": "<x43>",
392
+ "lstrip": false,
393
+ "normalized": true,
394
+ "rstrip": false,
395
+ "single_word": false,
396
+ "special": false
397
+ },
398
+ "32046": {
399
+ "content": "<x44>",
400
+ "lstrip": false,
401
+ "normalized": true,
402
+ "rstrip": false,
403
+ "single_word": false,
404
+ "special": false
405
+ },
406
+ "32047": {
407
+ "content": "<x45>",
408
+ "lstrip": false,
409
+ "normalized": true,
410
+ "rstrip": false,
411
+ "single_word": false,
412
+ "special": false
413
+ },
414
+ "32048": {
415
+ "content": "<x46>",
416
+ "lstrip": false,
417
+ "normalized": true,
418
+ "rstrip": false,
419
+ "single_word": false,
420
+ "special": false
421
+ },
422
+ "32049": {
423
+ "content": "<x47>",
424
+ "lstrip": false,
425
+ "normalized": true,
426
+ "rstrip": false,
427
+ "single_word": false,
428
+ "special": false
429
+ },
430
+ "32050": {
431
+ "content": "<x48>",
432
+ "lstrip": false,
433
+ "normalized": true,
434
+ "rstrip": false,
435
+ "single_word": false,
436
+ "special": false
437
+ },
438
+ "32051": {
439
+ "content": "<x49>",
440
+ "lstrip": false,
441
+ "normalized": true,
442
+ "rstrip": false,
443
+ "single_word": false,
444
+ "special": false
445
+ },
446
+ "32052": {
447
+ "content": "<x50>",
448
+ "lstrip": false,
449
+ "normalized": true,
450
+ "rstrip": false,
451
+ "single_word": false,
452
+ "special": false
453
+ },
454
+ "32053": {
455
+ "content": "<x51>",
456
+ "lstrip": false,
457
+ "normalized": true,
458
+ "rstrip": false,
459
+ "single_word": false,
460
+ "special": false
461
+ },
462
+ "32054": {
463
+ "content": "<x52>",
464
+ "lstrip": false,
465
+ "normalized": true,
466
+ "rstrip": false,
467
+ "single_word": false,
468
+ "special": false
469
+ },
470
+ "32055": {
471
+ "content": "<x53>",
472
+ "lstrip": false,
473
+ "normalized": true,
474
+ "rstrip": false,
475
+ "single_word": false,
476
+ "special": false
477
+ },
478
+ "32056": {
479
+ "content": "<x54>",
480
+ "lstrip": false,
481
+ "normalized": true,
482
+ "rstrip": false,
483
+ "single_word": false,
484
+ "special": false
485
+ },
486
+ "32057": {
487
+ "content": "<x55>",
488
+ "lstrip": false,
489
+ "normalized": true,
490
+ "rstrip": false,
491
+ "single_word": false,
492
+ "special": false
493
+ },
494
+ "32058": {
495
+ "content": "<x56>",
496
+ "lstrip": false,
497
+ "normalized": true,
498
+ "rstrip": false,
499
+ "single_word": false,
500
+ "special": false
501
+ },
502
+ "32059": {
503
+ "content": "<x57>",
504
+ "lstrip": false,
505
+ "normalized": true,
506
+ "rstrip": false,
507
+ "single_word": false,
508
+ "special": false
509
+ },
510
+ "32060": {
511
+ "content": "<x58>",
512
+ "lstrip": false,
513
+ "normalized": true,
514
+ "rstrip": false,
515
+ "single_word": false,
516
+ "special": false
517
+ },
518
+ "32061": {
519
+ "content": "<x59>",
520
+ "lstrip": false,
521
+ "normalized": true,
522
+ "rstrip": false,
523
+ "single_word": false,
524
+ "special": false
525
+ },
526
+ "32062": {
527
+ "content": "<x60>",
528
+ "lstrip": false,
529
+ "normalized": true,
530
+ "rstrip": false,
531
+ "single_word": false,
532
+ "special": false
533
+ },
534
+ "32063": {
535
+ "content": "<x61>",
536
+ "lstrip": false,
537
+ "normalized": true,
538
+ "rstrip": false,
539
+ "single_word": false,
540
+ "special": false
541
+ },
542
+ "32064": {
543
+ "content": "<x62>",
544
+ "lstrip": false,
545
+ "normalized": true,
546
+ "rstrip": false,
547
+ "single_word": false,
548
+ "special": false
549
+ },
550
+ "32065": {
551
+ "content": "<x63>",
552
+ "lstrip": false,
553
+ "normalized": true,
554
+ "rstrip": false,
555
+ "single_word": false,
556
+ "special": false
557
+ },
558
+ "32066": {
559
+ "content": "<x64>",
560
+ "lstrip": false,
561
+ "normalized": true,
562
+ "rstrip": false,
563
+ "single_word": false,
564
+ "special": false
565
+ },
566
+ "32067": {
567
+ "content": "<x65>",
568
+ "lstrip": false,
569
+ "normalized": true,
570
+ "rstrip": false,
571
+ "single_word": false,
572
+ "special": false
573
+ },
574
+ "32068": {
575
+ "content": "<x66>",
576
+ "lstrip": false,
577
+ "normalized": true,
578
+ "rstrip": false,
579
+ "single_word": false,
580
+ "special": false
581
+ },
582
+ "32069": {
583
+ "content": "<x67>",
584
+ "lstrip": false,
585
+ "normalized": true,
586
+ "rstrip": false,
587
+ "single_word": false,
588
+ "special": false
589
+ },
590
+ "32070": {
591
+ "content": "<x68>",
592
+ "lstrip": false,
593
+ "normalized": true,
594
+ "rstrip": false,
595
+ "single_word": false,
596
+ "special": false
597
+ },
598
+ "32071": {
599
+ "content": "<x69>",
600
+ "lstrip": false,
601
+ "normalized": true,
602
+ "rstrip": false,
603
+ "single_word": false,
604
+ "special": false
605
+ },
606
+ "32072": {
607
+ "content": "<x70>",
608
+ "lstrip": false,
609
+ "normalized": true,
610
+ "rstrip": false,
611
+ "single_word": false,
612
+ "special": false
613
+ },
614
+ "32073": {
615
+ "content": "<x71>",
616
+ "lstrip": false,
617
+ "normalized": true,
618
+ "rstrip": false,
619
+ "single_word": false,
620
+ "special": false
621
+ },
622
+ "32074": {
623
+ "content": "<x72>",
624
+ "lstrip": false,
625
+ "normalized": true,
626
+ "rstrip": false,
627
+ "single_word": false,
628
+ "special": false
629
+ },
630
+ "32075": {
631
+ "content": "<x73>",
632
+ "lstrip": false,
633
+ "normalized": true,
634
+ "rstrip": false,
635
+ "single_word": false,
636
+ "special": false
637
+ },
638
+ "32076": {
639
+ "content": "<x74>",
640
+ "lstrip": false,
641
+ "normalized": true,
642
+ "rstrip": false,
643
+ "single_word": false,
644
+ "special": false
645
+ },
646
+ "32077": {
647
+ "content": "<x75>",
648
+ "lstrip": false,
649
+ "normalized": true,
650
+ "rstrip": false,
651
+ "single_word": false,
652
+ "special": false
653
+ },
654
+ "32078": {
655
+ "content": "<x76>",
656
+ "lstrip": false,
657
+ "normalized": true,
658
+ "rstrip": false,
659
+ "single_word": false,
660
+ "special": false
661
+ },
662
+ "32079": {
663
+ "content": "<x77>",
664
+ "lstrip": false,
665
+ "normalized": true,
666
+ "rstrip": false,
667
+ "single_word": false,
668
+ "special": false
669
+ },
670
+ "32080": {
671
+ "content": "<x78>",
672
+ "lstrip": false,
673
+ "normalized": true,
674
+ "rstrip": false,
675
+ "single_word": false,
676
+ "special": false
677
+ },
678
+ "32081": {
679
+ "content": "<x79>",
680
+ "lstrip": false,
681
+ "normalized": true,
682
+ "rstrip": false,
683
+ "single_word": false,
684
+ "special": false
685
+ },
686
+ "32082": {
687
+ "content": "<x80>",
688
+ "lstrip": false,
689
+ "normalized": true,
690
+ "rstrip": false,
691
+ "single_word": false,
692
+ "special": false
693
+ },
694
+ "32083": {
695
+ "content": "<x81>",
696
+ "lstrip": false,
697
+ "normalized": true,
698
+ "rstrip": false,
699
+ "single_word": false,
700
+ "special": false
701
+ },
702
+ "32084": {
703
+ "content": "<x82>",
704
+ "lstrip": false,
705
+ "normalized": true,
706
+ "rstrip": false,
707
+ "single_word": false,
708
+ "special": false
709
+ },
710
+ "32085": {
711
+ "content": "<x83>",
712
+ "lstrip": false,
713
+ "normalized": true,
714
+ "rstrip": false,
715
+ "single_word": false,
716
+ "special": false
717
+ },
718
+ "32086": {
719
+ "content": "<x84>",
720
+ "lstrip": false,
721
+ "normalized": true,
722
+ "rstrip": false,
723
+ "single_word": false,
724
+ "special": false
725
+ },
726
+ "32087": {
727
+ "content": "<x85>",
728
+ "lstrip": false,
729
+ "normalized": true,
730
+ "rstrip": false,
731
+ "single_word": false,
732
+ "special": false
733
+ },
734
+ "32088": {
735
+ "content": "<x86>",
736
+ "lstrip": false,
737
+ "normalized": true,
738
+ "rstrip": false,
739
+ "single_word": false,
740
+ "special": false
741
+ },
742
+ "32089": {
743
+ "content": "<x87>",
744
+ "lstrip": false,
745
+ "normalized": true,
746
+ "rstrip": false,
747
+ "single_word": false,
748
+ "special": false
749
+ },
750
+ "32090": {
751
+ "content": "<x88>",
752
+ "lstrip": false,
753
+ "normalized": true,
754
+ "rstrip": false,
755
+ "single_word": false,
756
+ "special": false
757
+ },
758
+ "32091": {
759
+ "content": "<x89>",
760
+ "lstrip": false,
761
+ "normalized": true,
762
+ "rstrip": false,
763
+ "single_word": false,
764
+ "special": false
765
+ },
766
+ "32092": {
767
+ "content": "<x90>",
768
+ "lstrip": false,
769
+ "normalized": true,
770
+ "rstrip": false,
771
+ "single_word": false,
772
+ "special": false
773
+ },
774
+ "32093": {
775
+ "content": "<x91>",
776
+ "lstrip": false,
777
+ "normalized": true,
778
+ "rstrip": false,
779
+ "single_word": false,
780
+ "special": false
781
+ },
782
+ "32094": {
783
+ "content": "<x92>",
784
+ "lstrip": false,
785
+ "normalized": true,
786
+ "rstrip": false,
787
+ "single_word": false,
788
+ "special": false
789
+ },
790
+ "32095": {
791
+ "content": "<x93>",
792
+ "lstrip": false,
793
+ "normalized": true,
794
+ "rstrip": false,
795
+ "single_word": false,
796
+ "special": false
797
+ },
798
+ "32096": {
799
+ "content": "<x94>",
800
+ "lstrip": false,
801
+ "normalized": true,
802
+ "rstrip": false,
803
+ "single_word": false,
804
+ "special": false
805
+ },
806
+ "32097": {
807
+ "content": "<x95>",
808
+ "lstrip": false,
809
+ "normalized": true,
810
+ "rstrip": false,
811
+ "single_word": false,
812
+ "special": false
813
+ },
814
+ "32098": {
815
+ "content": "<x96>",
816
+ "lstrip": false,
817
+ "normalized": true,
818
+ "rstrip": false,
819
+ "single_word": false,
820
+ "special": false
821
+ },
822
+ "32099": {
823
+ "content": "<x97>",
824
+ "lstrip": false,
825
+ "normalized": true,
826
+ "rstrip": false,
827
+ "single_word": false,
828
+ "special": false
829
+ },
830
+ "32100": {
831
+ "content": "<x98>",
832
+ "lstrip": false,
833
+ "normalized": true,
834
+ "rstrip": false,
835
+ "single_word": false,
836
+ "special": false
837
+ },
838
+ "32101": {
839
+ "content": "<x99>",
840
+ "lstrip": false,
841
+ "normalized": true,
842
+ "rstrip": false,
843
+ "single_word": false,
844
+ "special": false
845
+ },
846
+ "32102": {
847
+ "content": "<y0>",
848
+ "lstrip": false,
849
+ "normalized": true,
850
+ "rstrip": false,
851
+ "single_word": false,
852
+ "special": false
853
+ },
854
+ "32103": {
855
+ "content": "<y1>",
856
+ "lstrip": false,
857
+ "normalized": true,
858
+ "rstrip": false,
859
+ "single_word": false,
860
+ "special": false
861
+ },
862
+ "32104": {
863
+ "content": "<y2>",
864
+ "lstrip": false,
865
+ "normalized": true,
866
+ "rstrip": false,
867
+ "single_word": false,
868
+ "special": false
869
+ },
870
+ "32105": {
871
+ "content": "<y3>",
872
+ "lstrip": false,
873
+ "normalized": true,
874
+ "rstrip": false,
875
+ "single_word": false,
876
+ "special": false
877
+ },
878
+ "32106": {
879
+ "content": "<y4>",
880
+ "lstrip": false,
881
+ "normalized": true,
882
+ "rstrip": false,
883
+ "single_word": false,
884
+ "special": false
885
+ },
886
+ "32107": {
887
+ "content": "<y5>",
888
+ "lstrip": false,
889
+ "normalized": true,
890
+ "rstrip": false,
891
+ "single_word": false,
892
+ "special": false
893
+ },
894
+ "32108": {
895
+ "content": "<y6>",
896
+ "lstrip": false,
897
+ "normalized": true,
898
+ "rstrip": false,
899
+ "single_word": false,
900
+ "special": false
901
+ },
902
+ "32109": {
903
+ "content": "<y7>",
904
+ "lstrip": false,
905
+ "normalized": true,
906
+ "rstrip": false,
907
+ "single_word": false,
908
+ "special": false
909
+ },
910
+ "32110": {
911
+ "content": "<y8>",
912
+ "lstrip": false,
913
+ "normalized": true,
914
+ "rstrip": false,
915
+ "single_word": false,
916
+ "special": false
917
+ },
918
+ "32111": {
919
+ "content": "<y9>",
920
+ "lstrip": false,
921
+ "normalized": true,
922
+ "rstrip": false,
923
+ "single_word": false,
924
+ "special": false
925
+ },
926
+ "32112": {
927
+ "content": "<y10>",
928
+ "lstrip": false,
929
+ "normalized": true,
930
+ "rstrip": false,
931
+ "single_word": false,
932
+ "special": false
933
+ },
934
+ "32113": {
935
+ "content": "<y11>",
936
+ "lstrip": false,
937
+ "normalized": true,
938
+ "rstrip": false,
939
+ "single_word": false,
940
+ "special": false
941
+ },
942
+ "32114": {
943
+ "content": "<y12>",
944
+ "lstrip": false,
945
+ "normalized": true,
946
+ "rstrip": false,
947
+ "single_word": false,
948
+ "special": false
949
+ },
950
+ "32115": {
951
+ "content": "<y13>",
952
+ "lstrip": false,
953
+ "normalized": true,
954
+ "rstrip": false,
955
+ "single_word": false,
956
+ "special": false
957
+ },
958
+ "32116": {
959
+ "content": "<y14>",
960
+ "lstrip": false,
961
+ "normalized": true,
962
+ "rstrip": false,
963
+ "single_word": false,
964
+ "special": false
965
+ },
966
+ "32117": {
967
+ "content": "<y15>",
968
+ "lstrip": false,
969
+ "normalized": true,
970
+ "rstrip": false,
971
+ "single_word": false,
972
+ "special": false
973
+ },
974
+ "32118": {
975
+ "content": "<y16>",
976
+ "lstrip": false,
977
+ "normalized": true,
978
+ "rstrip": false,
979
+ "single_word": false,
980
+ "special": false
981
+ },
982
+ "32119": {
983
+ "content": "<y17>",
984
+ "lstrip": false,
985
+ "normalized": true,
986
+ "rstrip": false,
987
+ "single_word": false,
988
+ "special": false
989
+ },
990
+ "32120": {
991
+ "content": "<y18>",
992
+ "lstrip": false,
993
+ "normalized": true,
994
+ "rstrip": false,
995
+ "single_word": false,
996
+ "special": false
997
+ },
998
+ "32121": {
999
+ "content": "<y19>",
1000
+ "lstrip": false,
1001
+ "normalized": true,
1002
+ "rstrip": false,
1003
+ "single_word": false,
1004
+ "special": false
1005
+ },
1006
+ "32122": {
1007
+ "content": "<y20>",
1008
+ "lstrip": false,
1009
+ "normalized": true,
1010
+ "rstrip": false,
1011
+ "single_word": false,
1012
+ "special": false
1013
+ },
1014
+ "32123": {
1015
+ "content": "<y21>",
1016
+ "lstrip": false,
1017
+ "normalized": true,
1018
+ "rstrip": false,
1019
+ "single_word": false,
1020
+ "special": false
1021
+ },
1022
+ "32124": {
1023
+ "content": "<y22>",
1024
+ "lstrip": false,
1025
+ "normalized": true,
1026
+ "rstrip": false,
1027
+ "single_word": false,
1028
+ "special": false
1029
+ },
1030
+ "32125": {
1031
+ "content": "<y23>",
1032
+ "lstrip": false,
1033
+ "normalized": true,
1034
+ "rstrip": false,
1035
+ "single_word": false,
1036
+ "special": false
1037
+ },
1038
+ "32126": {
1039
+ "content": "<y24>",
1040
+ "lstrip": false,
1041
+ "normalized": true,
1042
+ "rstrip": false,
1043
+ "single_word": false,
1044
+ "special": false
1045
+ },
1046
+ "32127": {
1047
+ "content": "<y25>",
1048
+ "lstrip": false,
1049
+ "normalized": true,
1050
+ "rstrip": false,
1051
+ "single_word": false,
1052
+ "special": false
1053
+ },
1054
+ "32128": {
1055
+ "content": "<y26>",
1056
+ "lstrip": false,
1057
+ "normalized": true,
1058
+ "rstrip": false,
1059
+ "single_word": false,
1060
+ "special": false
1061
+ },
1062
+ "32129": {
1063
+ "content": "<y27>",
1064
+ "lstrip": false,
1065
+ "normalized": true,
1066
+ "rstrip": false,
1067
+ "single_word": false,
1068
+ "special": false
1069
+ },
1070
+ "32130": {
1071
+ "content": "<y28>",
1072
+ "lstrip": false,
1073
+ "normalized": true,
1074
+ "rstrip": false,
1075
+ "single_word": false,
1076
+ "special": false
1077
+ },
1078
+ "32131": {
1079
+ "content": "<y29>",
1080
+ "lstrip": false,
1081
+ "normalized": true,
1082
+ "rstrip": false,
1083
+ "single_word": false,
1084
+ "special": false
1085
+ },
1086
+ "32132": {
1087
+ "content": "<y30>",
1088
+ "lstrip": false,
1089
+ "normalized": true,
1090
+ "rstrip": false,
1091
+ "single_word": false,
1092
+ "special": false
1093
+ },
1094
+ "32133": {
1095
+ "content": "<y31>",
1096
+ "lstrip": false,
1097
+ "normalized": true,
1098
+ "rstrip": false,
1099
+ "single_word": false,
1100
+ "special": false
1101
+ },
1102
+ "32134": {
1103
+ "content": "<y32>",
1104
+ "lstrip": false,
1105
+ "normalized": true,
1106
+ "rstrip": false,
1107
+ "single_word": false,
1108
+ "special": false
1109
+ },
1110
+ "32135": {
1111
+ "content": "<y33>",
1112
+ "lstrip": false,
1113
+ "normalized": true,
1114
+ "rstrip": false,
1115
+ "single_word": false,
1116
+ "special": false
1117
+ },
1118
+ "32136": {
1119
+ "content": "<y34>",
1120
+ "lstrip": false,
1121
+ "normalized": true,
1122
+ "rstrip": false,
1123
+ "single_word": false,
1124
+ "special": false
1125
+ },
1126
+ "32137": {
1127
+ "content": "<y35>",
1128
+ "lstrip": false,
1129
+ "normalized": true,
1130
+ "rstrip": false,
1131
+ "single_word": false,
1132
+ "special": false
1133
+ },
1134
+ "32138": {
1135
+ "content": "<y36>",
1136
+ "lstrip": false,
1137
+ "normalized": true,
1138
+ "rstrip": false,
1139
+ "single_word": false,
1140
+ "special": false
1141
+ },
1142
+ "32139": {
1143
+ "content": "<y37>",
1144
+ "lstrip": false,
1145
+ "normalized": true,
1146
+ "rstrip": false,
1147
+ "single_word": false,
1148
+ "special": false
1149
+ },
1150
+ "32140": {
1151
+ "content": "<y38>",
1152
+ "lstrip": false,
1153
+ "normalized": true,
1154
+ "rstrip": false,
1155
+ "single_word": false,
1156
+ "special": false
1157
+ },
1158
+ "32141": {
1159
+ "content": "<y39>",
1160
+ "lstrip": false,
1161
+ "normalized": true,
1162
+ "rstrip": false,
1163
+ "single_word": false,
1164
+ "special": false
1165
+ },
1166
+ "32142": {
1167
+ "content": "<y40>",
1168
+ "lstrip": false,
1169
+ "normalized": true,
1170
+ "rstrip": false,
1171
+ "single_word": false,
1172
+ "special": false
1173
+ },
1174
+ "32143": {
1175
+ "content": "<y41>",
1176
+ "lstrip": false,
1177
+ "normalized": true,
1178
+ "rstrip": false,
1179
+ "single_word": false,
1180
+ "special": false
1181
+ },
1182
+ "32144": {
1183
+ "content": "<y42>",
1184
+ "lstrip": false,
1185
+ "normalized": true,
1186
+ "rstrip": false,
1187
+ "single_word": false,
1188
+ "special": false
1189
+ },
1190
+ "32145": {
1191
+ "content": "<y43>",
1192
+ "lstrip": false,
1193
+ "normalized": true,
1194
+ "rstrip": false,
1195
+ "single_word": false,
1196
+ "special": false
1197
+ },
1198
+ "32146": {
1199
+ "content": "<y44>",
1200
+ "lstrip": false,
1201
+ "normalized": true,
1202
+ "rstrip": false,
1203
+ "single_word": false,
1204
+ "special": false
1205
+ },
1206
+ "32147": {
1207
+ "content": "<y45>",
1208
+ "lstrip": false,
1209
+ "normalized": true,
1210
+ "rstrip": false,
1211
+ "single_word": false,
1212
+ "special": false
1213
+ },
1214
+ "32148": {
1215
+ "content": "<y46>",
1216
+ "lstrip": false,
1217
+ "normalized": true,
1218
+ "rstrip": false,
1219
+ "single_word": false,
1220
+ "special": false
1221
+ },
1222
+ "32149": {
1223
+ "content": "<y47>",
1224
+ "lstrip": false,
1225
+ "normalized": true,
1226
+ "rstrip": false,
1227
+ "single_word": false,
1228
+ "special": false
1229
+ },
1230
+ "32150": {
1231
+ "content": "<y48>",
1232
+ "lstrip": false,
1233
+ "normalized": true,
1234
+ "rstrip": false,
1235
+ "single_word": false,
1236
+ "special": false
1237
+ },
1238
+ "32151": {
1239
+ "content": "<y49>",
1240
+ "lstrip": false,
1241
+ "normalized": true,
1242
+ "rstrip": false,
1243
+ "single_word": false,
1244
+ "special": false
1245
+ },
1246
+ "32152": {
1247
+ "content": "<y50>",
1248
+ "lstrip": false,
1249
+ "normalized": true,
1250
+ "rstrip": false,
1251
+ "single_word": false,
1252
+ "special": false
1253
+ },
1254
+ "32153": {
1255
+ "content": "<y51>",
1256
+ "lstrip": false,
1257
+ "normalized": true,
1258
+ "rstrip": false,
1259
+ "single_word": false,
1260
+ "special": false
1261
+ },
1262
+ "32154": {
1263
+ "content": "<y52>",
1264
+ "lstrip": false,
1265
+ "normalized": true,
1266
+ "rstrip": false,
1267
+ "single_word": false,
1268
+ "special": false
1269
+ },
1270
+ "32155": {
1271
+ "content": "<y53>",
1272
+ "lstrip": false,
1273
+ "normalized": true,
1274
+ "rstrip": false,
1275
+ "single_word": false,
1276
+ "special": false
1277
+ },
1278
+ "32156": {
1279
+ "content": "<y54>",
1280
+ "lstrip": false,
1281
+ "normalized": true,
1282
+ "rstrip": false,
1283
+ "single_word": false,
1284
+ "special": false
1285
+ },
1286
+ "32157": {
1287
+ "content": "<y55>",
1288
+ "lstrip": false,
1289
+ "normalized": true,
1290
+ "rstrip": false,
1291
+ "single_word": false,
1292
+ "special": false
1293
+ },
1294
+ "32158": {
1295
+ "content": "<y56>",
1296
+ "lstrip": false,
1297
+ "normalized": true,
1298
+ "rstrip": false,
1299
+ "single_word": false,
1300
+ "special": false
1301
+ },
1302
+ "32159": {
1303
+ "content": "<y57>",
1304
+ "lstrip": false,
1305
+ "normalized": true,
1306
+ "rstrip": false,
1307
+ "single_word": false,
1308
+ "special": false
1309
+ },
1310
+ "32160": {
1311
+ "content": "<y58>",
1312
+ "lstrip": false,
1313
+ "normalized": true,
1314
+ "rstrip": false,
1315
+ "single_word": false,
1316
+ "special": false
1317
+ },
1318
+ "32161": {
1319
+ "content": "<y59>",
1320
+ "lstrip": false,
1321
+ "normalized": true,
1322
+ "rstrip": false,
1323
+ "single_word": false,
1324
+ "special": false
1325
+ },
1326
+ "32162": {
1327
+ "content": "<y60>",
1328
+ "lstrip": false,
1329
+ "normalized": true,
1330
+ "rstrip": false,
1331
+ "single_word": false,
1332
+ "special": false
1333
+ },
1334
+ "32163": {
1335
+ "content": "<y61>",
1336
+ "lstrip": false,
1337
+ "normalized": true,
1338
+ "rstrip": false,
1339
+ "single_word": false,
1340
+ "special": false
1341
+ },
1342
+ "32164": {
1343
+ "content": "<y62>",
1344
+ "lstrip": false,
1345
+ "normalized": true,
1346
+ "rstrip": false,
1347
+ "single_word": false,
1348
+ "special": false
1349
+ },
1350
+ "32165": {
1351
+ "content": "<y63>",
1352
+ "lstrip": false,
1353
+ "normalized": true,
1354
+ "rstrip": false,
1355
+ "single_word": false,
1356
+ "special": false
1357
+ },
1358
+ "32166": {
1359
+ "content": "<y64>",
1360
+ "lstrip": false,
1361
+ "normalized": true,
1362
+ "rstrip": false,
1363
+ "single_word": false,
1364
+ "special": false
1365
+ },
1366
+ "32167": {
1367
+ "content": "<y65>",
1368
+ "lstrip": false,
1369
+ "normalized": true,
1370
+ "rstrip": false,
1371
+ "single_word": false,
1372
+ "special": false
1373
+ },
1374
+ "32168": {
1375
+ "content": "<y66>",
1376
+ "lstrip": false,
1377
+ "normalized": true,
1378
+ "rstrip": false,
1379
+ "single_word": false,
1380
+ "special": false
1381
+ },
1382
+ "32169": {
1383
+ "content": "<y67>",
1384
+ "lstrip": false,
1385
+ "normalized": true,
1386
+ "rstrip": false,
1387
+ "single_word": false,
1388
+ "special": false
1389
+ },
1390
+ "32170": {
1391
+ "content": "<y68>",
1392
+ "lstrip": false,
1393
+ "normalized": true,
1394
+ "rstrip": false,
1395
+ "single_word": false,
1396
+ "special": false
1397
+ },
1398
+ "32171": {
1399
+ "content": "<y69>",
1400
+ "lstrip": false,
1401
+ "normalized": true,
1402
+ "rstrip": false,
1403
+ "single_word": false,
1404
+ "special": false
1405
+ },
1406
+ "32172": {
1407
+ "content": "<y70>",
1408
+ "lstrip": false,
1409
+ "normalized": true,
1410
+ "rstrip": false,
1411
+ "single_word": false,
1412
+ "special": false
1413
+ },
1414
+ "32173": {
1415
+ "content": "<y71>",
1416
+ "lstrip": false,
1417
+ "normalized": true,
1418
+ "rstrip": false,
1419
+ "single_word": false,
1420
+ "special": false
1421
+ },
1422
+ "32174": {
1423
+ "content": "<y72>",
1424
+ "lstrip": false,
1425
+ "normalized": true,
1426
+ "rstrip": false,
1427
+ "single_word": false,
1428
+ "special": false
1429
+ },
1430
+ "32175": {
1431
+ "content": "<y73>",
1432
+ "lstrip": false,
1433
+ "normalized": true,
1434
+ "rstrip": false,
1435
+ "single_word": false,
1436
+ "special": false
1437
+ },
1438
+ "32176": {
1439
+ "content": "<y74>",
1440
+ "lstrip": false,
1441
+ "normalized": true,
1442
+ "rstrip": false,
1443
+ "single_word": false,
1444
+ "special": false
1445
+ },
1446
+ "32177": {
1447
+ "content": "<y75>",
1448
+ "lstrip": false,
1449
+ "normalized": true,
1450
+ "rstrip": false,
1451
+ "single_word": false,
1452
+ "special": false
1453
+ },
1454
+ "32178": {
1455
+ "content": "<y76>",
1456
+ "lstrip": false,
1457
+ "normalized": true,
1458
+ "rstrip": false,
1459
+ "single_word": false,
1460
+ "special": false
1461
+ },
1462
+ "32179": {
1463
+ "content": "<y77>",
1464
+ "lstrip": false,
1465
+ "normalized": true,
1466
+ "rstrip": false,
1467
+ "single_word": false,
1468
+ "special": false
1469
+ },
1470
+ "32180": {
1471
+ "content": "<y78>",
1472
+ "lstrip": false,
1473
+ "normalized": true,
1474
+ "rstrip": false,
1475
+ "single_word": false,
1476
+ "special": false
1477
+ },
1478
+ "32181": {
1479
+ "content": "<y79>",
1480
+ "lstrip": false,
1481
+ "normalized": true,
1482
+ "rstrip": false,
1483
+ "single_word": false,
1484
+ "special": false
1485
+ },
1486
+ "32182": {
1487
+ "content": "<y80>",
1488
+ "lstrip": false,
1489
+ "normalized": true,
1490
+ "rstrip": false,
1491
+ "single_word": false,
1492
+ "special": false
1493
+ },
1494
+ "32183": {
1495
+ "content": "<y81>",
1496
+ "lstrip": false,
1497
+ "normalized": true,
1498
+ "rstrip": false,
1499
+ "single_word": false,
1500
+ "special": false
1501
+ },
1502
+ "32184": {
1503
+ "content": "<y82>",
1504
+ "lstrip": false,
1505
+ "normalized": true,
1506
+ "rstrip": false,
1507
+ "single_word": false,
1508
+ "special": false
1509
+ },
1510
+ "32185": {
1511
+ "content": "<y83>",
1512
+ "lstrip": false,
1513
+ "normalized": true,
1514
+ "rstrip": false,
1515
+ "single_word": false,
1516
+ "special": false
1517
+ },
1518
+ "32186": {
1519
+ "content": "<y84>",
1520
+ "lstrip": false,
1521
+ "normalized": true,
1522
+ "rstrip": false,
1523
+ "single_word": false,
1524
+ "special": false
1525
+ },
1526
+ "32187": {
1527
+ "content": "<y85>",
1528
+ "lstrip": false,
1529
+ "normalized": true,
1530
+ "rstrip": false,
1531
+ "single_word": false,
1532
+ "special": false
1533
+ },
1534
+ "32188": {
1535
+ "content": "<y86>",
1536
+ "lstrip": false,
1537
+ "normalized": true,
1538
+ "rstrip": false,
1539
+ "single_word": false,
1540
+ "special": false
1541
+ },
1542
+ "32189": {
1543
+ "content": "<y87>",
1544
+ "lstrip": false,
1545
+ "normalized": true,
1546
+ "rstrip": false,
1547
+ "single_word": false,
1548
+ "special": false
1549
+ },
1550
+ "32190": {
1551
+ "content": "<y88>",
1552
+ "lstrip": false,
1553
+ "normalized": true,
1554
+ "rstrip": false,
1555
+ "single_word": false,
1556
+ "special": false
1557
+ },
1558
+ "32191": {
1559
+ "content": "<y89>",
1560
+ "lstrip": false,
1561
+ "normalized": true,
1562
+ "rstrip": false,
1563
+ "single_word": false,
1564
+ "special": false
1565
+ },
1566
+ "32192": {
1567
+ "content": "<y90>",
1568
+ "lstrip": false,
1569
+ "normalized": true,
1570
+ "rstrip": false,
1571
+ "single_word": false,
1572
+ "special": false
1573
+ },
1574
+ "32193": {
1575
+ "content": "<y91>",
1576
+ "lstrip": false,
1577
+ "normalized": true,
1578
+ "rstrip": false,
1579
+ "single_word": false,
1580
+ "special": false
1581
+ },
1582
+ "32194": {
1583
+ "content": "<y92>",
1584
+ "lstrip": false,
1585
+ "normalized": true,
1586
+ "rstrip": false,
1587
+ "single_word": false,
1588
+ "special": false
1589
+ },
1590
+ "32195": {
1591
+ "content": "<y93>",
1592
+ "lstrip": false,
1593
+ "normalized": true,
1594
+ "rstrip": false,
1595
+ "single_word": false,
1596
+ "special": false
1597
+ },
1598
+ "32196": {
1599
+ "content": "<y94>",
1600
+ "lstrip": false,
1601
+ "normalized": true,
1602
+ "rstrip": false,
1603
+ "single_word": false,
1604
+ "special": false
1605
+ },
1606
+ "32197": {
1607
+ "content": "<y95>",
1608
+ "lstrip": false,
1609
+ "normalized": true,
1610
+ "rstrip": false,
1611
+ "single_word": false,
1612
+ "special": false
1613
+ },
1614
+ "32198": {
1615
+ "content": "<y96>",
1616
+ "lstrip": false,
1617
+ "normalized": true,
1618
+ "rstrip": false,
1619
+ "single_word": false,
1620
+ "special": false
1621
+ },
1622
+ "32199": {
1623
+ "content": "<y97>",
1624
+ "lstrip": false,
1625
+ "normalized": true,
1626
+ "rstrip": false,
1627
+ "single_word": false,
1628
+ "special": false
1629
+ },
1630
+ "32200": {
1631
+ "content": "<y98>",
1632
+ "lstrip": false,
1633
+ "normalized": true,
1634
+ "rstrip": false,
1635
+ "single_word": false,
1636
+ "special": false
1637
+ },
1638
+ "32201": {
1639
+ "content": "<y99>",
1640
+ "lstrip": false,
1641
+ "normalized": true,
1642
+ "rstrip": false,
1643
+ "single_word": false,
1644
+ "special": false
1645
+ },
1646
+ "32202": {
1647
+ "content": "<box>",
1648
+ "lstrip": false,
1649
+ "normalized": true,
1650
+ "rstrip": false,
1651
+ "single_word": false,
1652
+ "special": false
1653
+ },
1654
+ "32203": {
1655
+ "content": "</box>",
1656
+ "lstrip": false,
1657
+ "normalized": true,
1658
+ "rstrip": false,
1659
+ "single_word": false,
1660
+ "special": false
1661
+ },
1662
+ "32204": {
1663
+ "content": "<image>",
1664
+ "lstrip": false,
1665
+ "normalized": true,
1666
+ "rstrip": false,
1667
+ "single_word": false,
1668
+ "special": false
1669
+ },
1670
+ "32205": {
1671
+ "content": "<prev_im>",
1672
+ "lstrip": false,
1673
+ "normalized": true,
1674
+ "rstrip": false,
1675
+ "single_word": false,
1676
+ "special": false
1677
+ },
1678
+ "32206": {
1679
+ "content": "<lat_image>",
1680
+ "lstrip": false,
1681
+ "normalized": true,
1682
+ "rstrip": false,
1683
+ "single_word": false,
1684
+ "special": false
1685
+ }
1686
+ },
1687
+ "bos_token": "<s>",
1688
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}You are an expert radiology assistant tasked with interpreting a chest X-ray study. {% for message in messages %}{% if message[\"role\"] == \"user\" %}USER: {% else %}ASSISTANT: {% endif %}{% for item in message[\"content\"] %}{% if item[\"type\"] == \"text\" %}{{ item[\"text\"] }}{% elif item[\"type\"] == \"image\" %}<image>{% endif %}{% endfor %}{% if message[\"role\"] == \"user\" %} {% else %}{{eos_token}}{% endif %}{% endfor %}{% if add_generation_prompt %}ASSISTANT: {% endif %}",
1689
+ "clean_up_tokenization_spaces": false,
1690
+ "eos_token": "</s>",
1691
+ "extra_special_tokens": {},
1692
+ "legacy": false,
1693
+ "model_max_length": 4096,
1694
+ "pad_token": "<unk>",
1695
+ "padding_side": "left",
1696
+ "processor_class": "Maira2Processor",
1697
+ "sp_model_kwargs": {},
1698
+ "spaces_between_special_tokens": false,
1699
+ "tokenizer_class": "LlamaTokenizer",
1700
+ "unk_token": "<unk>",
1701
+ "use_default_system_prompt": false
1702
+ }
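The `chat_template` above fully determines the prompt layout: a fixed radiology-assistant preamble, `USER:`/`ASSISTANT:` turns, `<image>` placeholders for image content items, and a trailing `ASSISTANT: ` when a generation prompt is requested. A small sketch of rendering it (the local path is an assumption):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(".")  # assumed: run from a local clone of this repo

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Provide a description of the findings."},
        ],
    }
]
prompt = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
print(prompt)
# Renders as a single line:
# You are an expert radiology assistant tasked with interpreting a chest X-ray
# study. USER: <image>Provide a description of the findings. ASSISTANT:
```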