Prince-1 committed
Commit b50ecc6 · verified · 1 Parent(s): 2f8f63d

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,144 @@
+ ---
+ pipeline_tag: image-text-to-text
+ tags:
+ - visual-document-understanding
+ - visual-question-answering
+ - indian-documents
+ license: apache-2.0
+ language:
+ - en
+ library_name: transformers
+ base_model:
+ - bharatgenai/patram-7b-instruct
+ ---
+
+ # Patram-7B-Instruct
+
+ Patram-7B-Instruct by BharatGen is a 7B-parameter vision-language model trained from scratch for visual document understanding. As India’s first document foundation model, it is built to tackle complex document analysis.
+ The model was trained on a carefully curated instruction-tuning dataset that combines diverse public data and custom synthetic data designed to support a broad spectrum of document understanding tasks.
+
+ ## Model Overview
+
+ * **Architecture:** Vision Transformer (ViT) + MLP projector + OLMo-7B LLM
+ * **Training Data:** BharatDocs-v1, a dataset of diverse Indian documents, plus other open-source document datasets
+ * **Supported I/O Formats:** The model currently accepts English-language instructions and image files (e.g., PNG, JPEG) as input. The output is provided as text.
+ * **Language:** English (Indian language support upcoming)
+ * **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
+
+ ## Usage Examples
+
+ Use the `transformers` library.
+
+ ```python
+ import torch
+ from transformers import AutoProcessor, AutoModelForCausalLM, GenerationConfig
+ from PIL import Image
+ import requests
+
+ # Model ID and device setup
+ model_id = "bharatgenai/patram-7b-instruct"
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # Load processor and model
+ processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     trust_remote_code=True
+ ).to(device)
+
+ def get_patram_response(image_path_or_url, question):
+     try:
+         # Load image
+         if image_path_or_url.startswith("http"):
+             image = Image.open(requests.get(image_path_or_url, stream=True).raw).convert("RGB")
+         else:
+             image = Image.open(image_path_or_url).convert("RGB")
+     except Exception as e:
+         print(f"Error loading image: {e}")
+         return None
+
+     # Format the prompt as expected
+     prompt = f"Question: {question} Answer based on the image."
+
+     try:
+         # Preprocess image and text using the processor
+         inputs = processor.process(images=[image], text=prompt)
+         inputs = {k: v.to(device).unsqueeze(0) for k, v in inputs.items()}
+
+         # Generate output using the model's generate_from_batch method (Patram-specific)
+         output = model.generate_from_batch(
+             inputs,
+             GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
+             tokenizer=processor.tokenizer
+         )
+
+         # Extract generated tokens (excluding input tokens) and decode
+         generated_tokens = output[0, inputs['input_ids'].size(1):]
+         response = processor.tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()
+         return response
+     except Exception as e:
+         print(f"Error during inference: {e}")
+         return None
+
+ # Example usage:
+ # image_input = "https://knowscope.in/wp-content/uploads/2025/05/cghd-nag.png"
+ # question = "Who issued this notice?"
+ # answer = get_patram_response(image_input, question)
+ # if answer:
+ #     print("Answer:", answer)
+ ```
+
+ **Note**: If you're trying this on an Apple Silicon (M1/M2/M3/M4/...) chip, please follow the official PyTorch and Hugging Face documentation for installing dependencies (a minimal device-selection sketch follows the links below):
+
+ - [PyTorch's official guide on installation (macOS)](https://pytorch.org/get-started/locally/#:~:text=torch%20torchvision%20torchaudio-,Installing%20on%20macOS,-PyTorch%20can%20be)
+ - [Hugging Face Transformers performance tips](https://huggingface.co/docs/transformers/main/en/perf_train_special)
+
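For reference, a minimal device-selection sketch, assuming a recent PyTorch build with MPS support; whether the model's custom code runs end-to-end on MPS is not guaranteed, and the loading code from the example above is otherwise unchanged.

```python
import torch

# Minimal sketch (not part of the model card): pick the best available backend.
# On Apple Silicon, the GPU is exposed through the "mps" backend.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

print(f"Running on: {device}")
# The rest of the usage example above stays the same, e.g.:
# model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).to(device)
```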
+ ## Evaluations
+
+ We evaluated Patram-7B-Instruct alongside other vision-language models (VLMs) in the 7B–9B parameter range across multiple public document benchmarks.
+
+ **Benchmarks**: DocVQA, VisualMRC, Patram-Bench
+
+ Patram-Bench is an in-house benchmark designed for Indic Document VQA.
+
+ **Metric**: G-Eval (LLM-as-a-judge)
+
+ | Model                  | Overall | DocVQA | Patram-Bench | VisualMRC |
+ | ---------------------- | ------- | ------ | ------------ | --------- |
+ | claude-3.7-sonnet      | 0.8830  | 0.8480 | 0.8857       | 0.8830    |
+ | Qwen2.5-VL-7B-Instruct | 0.8759  | 0.8722 | 0.6816       | 0.9169    |
+ | gemma-3-12b-it         | 0.8556  | 0.8451 | 0.6349       | 0.9069    |
+ | **patram-7b-instruct** | 0.8331  | 0.8550 | 0.6515       | 0.8510    |
+ | InternVL3-9B           | 0.7865  | 0.8681 | 0.6888       | 0.7405    |
+ | deepseek-vl2           | 0.7581  | 0.8739 | 0.5089       | 0.7144    |
+
+ *Note: The benchmarked results reflect the API variant.*
+
+ ## Citation
+
+ ```bibtex
+ @online{BharatGenPatramLaunch2025,
+   author  = {{BharatGen Team}},
+   title   = {BharatGen Unveils Patram: India's Pioneering Vision-Language Foundation Model for Document Intelligence},
+   year    = {2025},
+   url     = {https://bharatgen.com/blog/patram-launch},
+   urldate = {2025-06-02}
+ }
+ ```
+
+ ## Resources
+
+ * **Model**: [huggingface.co/bharatgenai/patram-7b-instruct](https://huggingface.co/bharatgenai/patram-7b-instruct)
+ * **Project Page**: [bharatgen.com/patram](https://bharatgen.com/patram)
+ * **Blog**: [bharatgen.com/blog/patram-launch](https://bharatgen.com/blog/patram-launch)
+
+ ## Authors
+
+ * **Principal Investigators**: Prof. Ravi Kiran Sarvadevabhatla, Prof. Ganesh Ramakrishnan
+ * **Contributors**: BharatGen Team
+
+ ## Contact
+
+ * [Contact Form](https://bharatgen.com/contact)
+ * Hugging Face Community Tab
added_tokens.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "<im_col>": 100281,
+   "<im_end>": 100279,
+   "<im_patch>": 100280,
+   "<im_start>": 100278,
+   "<|image|>": 100282
+ }
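The file above registers the special image tokens used by the processor. A minimal check, assuming Hub access, that these tokens resolve to the listed ids via the tokenizer bundled with the processor:

```python
from transformers import AutoProcessor

# Sketch (assumes network access to the Hub): the special image tokens from
# added_tokens.json should map to the ids listed above.
processor = AutoProcessor.from_pretrained("bharatgenai/patram-7b-instruct", trust_remote_code=True)
for tok in ["<im_start>", "<im_end>", "<im_patch>", "<im_col>", "<|image|>"]:
    print(tok, processor.tokenizer.convert_tokens_to_ids(tok))
# expected: 100278, 100279, 100280, 100281, 100282
```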
chat_template.jinja ADDED
@@ -0,0 +1,13 @@
+ {% for message in messages -%}
+ {%- if (loop.index % 2 == 1 and message['role'] != 'user') or
+        (loop.index % 2 == 0 and message['role'].lower() != 'assistant') -%}
+ {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
+ {%- endif -%}
+ {{ message['role'].capitalize() + ': ' + message['content'] }}
+ {%- if not loop.last -%}
+ {{ ' ' }}
+ {%- endif %}
+ {%- endfor -%}
+ {%- if add_generation_prompt -%}
+ {{ ' Assistant:' }}
+ {%- endif %}
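A minimal sketch of how this template renders, using plain `jinja2` and supplying the `raise_exception` helper that `transformers` normally injects (the file is assumed to be available locally):

```python
from jinja2 import Template

# Sketch: render chat_template.jinja on its own to see the flat prompt format it produces.
def raise_exception(message):
    raise ValueError(message)

template = Template(open("chat_template.jinja").read())
prompt = template.render(
    messages=[{"role": "user", "content": "Who issued this notice?"}],
    add_generation_prompt=True,
    raise_exception=raise_exception,
)
print(repr(prompt))  # e.g. 'User: Who issued this notice? Assistant:'
```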
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "architectures": [
+     "PatramForCausalLM"
+   ],
+   "attention_layer_norm": true,
+   "auto_map": {
+     "AutoConfig": "config_patram.PatramConfig",
+     "AutoModelForCausalLM": "modeling_patram.PatramForCausalLM"
+   },
+   "clip_qkv": null,
+   "embedding_size": 100352,
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 22016,
+   "layer_norm_eps": 1e-06,
+   "layer_norm_type": "rms",
+   "max_position_embeddings": 4096,
+   "model_type": "patram",
+   "norm_after": true,
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": null,
+   "pad_token_id": 100277,
+   "qkv_bias": false,
+   "rope_theta": 500000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float32",
+   "transformers_version": "4.52.3",
+   "use_cache": false,
+   "use_position_ids": true,
+   "vocab_size": 100278,
+   "weight_tying": false
+ }
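Since `auto_map` routes the config to the custom `config_patram.PatramConfig` class, loading it requires `trust_remote_code=True`. A short sketch:

```python
from transformers import AutoConfig

# Sketch: auto_map points AutoConfig at config_patram.PatramConfig,
# so the repo's custom code must be trusted when loading.
config = AutoConfig.from_pretrained("bharatgenai/patram-7b-instruct", trust_remote_code=True)
print(config.model_type, config.hidden_size, config.num_hidden_layers, config.vocab_size)
# expected from the JSON above: patram 4096 32 100278
```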
config_patram.py ADDED
@@ -0,0 +1,60 @@
+ from typing import List
+
+ from transformers import PretrainedConfig, AutoTokenizer
+
+
+ class PatramConfig(PretrainedConfig):
+     model_type = "patram"
+     keys_to_ignore_at_inference = ["past_key_values"]
+
+     def __init__(
+         self,
+         vocab_size=50304,
+         embedding_size=50304,
+         hidden_size=4096,
+         intermediate_size=11008,
+         num_hidden_layers=32,
+         num_attention_heads=32,
+         num_key_value_heads=None,
+         max_position_embeddings=2048,
+         initializer_range=0.02,
+         use_cache=True,
+         layer_norm_eps: float = 1e-5,
+         rope_theta=10000.0,
+         clip_qkv=None,
+         qkv_bias: bool = False,
+         weight_tying: bool = False,
+         use_position_ids: bool = True,
+         tie_word_embeddings: bool = True,
+         attention_layer_norm: bool = False,
+         norm_after: bool = False,
+         layer_norm_type: str = "rms",
+         **kwargs,
+     ):
+         self.vocab_size = vocab_size
+         self.embedding_size = embedding_size
+         self.max_position_embeddings = max_position_embeddings
+         self.hidden_size = hidden_size
+         self.intermediate_size = intermediate_size
+         self.num_hidden_layers = num_hidden_layers
+         self.num_attention_heads = num_attention_heads
+         self.layer_norm_eps = layer_norm_eps
+         self.weight_tying = weight_tying
+         self.use_position_ids = use_position_ids
+         self.attention_layer_norm = attention_layer_norm
+         self.num_key_value_heads = num_key_value_heads
+         self.initializer_range = initializer_range
+         self.use_cache = use_cache
+         self.rope_theta = rope_theta
+         self.clip_qkv = clip_qkv
+         self.qkv_bias = qkv_bias
+         self.norm_after = norm_after
+         self.tie_word_embeddings = tie_word_embeddings
+         self.layer_norm_type = layer_norm_type
+
+         super().__init__(
+             tie_word_embeddings=tie_word_embeddings,
+             **kwargs,
+         )
+
+ PatramConfig.register_for_auto_class()
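The defaults in `PatramConfig` above are generic placeholders; the shipped `config.json` overrides most of them. A minimal sketch, assuming `config_patram.py` has been downloaded into the working directory:

```python
# Sketch, assuming config_patram.py has been fetched locally from the repo.
from config_patram import PatramConfig

# The shipped config.json overrides the generic defaults, e.g.:
cfg = PatramConfig(hidden_size=4096, intermediate_size=22016, vocab_size=100278)
print(cfg.model_type)         # "patram"
print(cfg.intermediate_size)  # 22016
```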
generation_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "_from_model_config": true,
+   "transformers_version": "4.52.3"
+ }
image_preprocessing_patram.py ADDED
@@ -0,0 +1,546 @@
1
+ """Image processor class for Patram"""
2
+ from typing import List, Optional, Union, Mapping
3
+
4
+ import numpy as np
5
+ import einops
6
+ import torch
7
+ import torchvision.transforms
8
+ from torchvision.transforms import InterpolationMode
9
+ from torchvision.transforms.functional import convert_image_dtype
10
+
11
+ from transformers.image_utils import (
12
+ OPENAI_CLIP_MEAN,
13
+ OPENAI_CLIP_STD,
14
+ ImageInput,
15
+ is_valid_image,
16
+ )
17
+ from transformers.processing_utils import ImagesKwargs
18
+ from transformers.image_processing_utils import BaseImageProcessor
19
+ from transformers.utils import logging
20
+
21
+
22
+ logger = logging.get_logger(__name__)
23
+
24
+
25
+ def pad_to_bounding_box(
26
+ image, offset_height, offset_width, target_height,
27
+ target_width, value=0
28
+ ):
29
+ height, width = image.shape[:2]
30
+ after_padding_width = target_width - offset_width - width
31
+ after_padding_height = target_height - offset_height - height
32
+ return np.pad(image, [
33
+ [offset_height, after_padding_height],
34
+ [offset_width, after_padding_width],
35
+ [0, 0]
36
+ ], constant_values=value)
37
+
38
+
39
+ def normalize_image(image, offset, scale):
40
+ image -= np.array(offset, dtype=np.float32)[None, None, :]
41
+ image /= np.array(scale, dtype=np.float32)[None, None, :]
42
+ return image
43
+
44
+
45
+ def resize_and_pad(
46
+ image,
47
+ desired_output_size,
48
+ resize_method="torch-bilinear",
49
+ pad_value=0,
50
+ normalize=True,
51
+ image_mean=OPENAI_CLIP_MEAN,
52
+ image_std=OPENAI_CLIP_STD,
53
+ ):
54
+ desired_height, desired_width = desired_output_size
55
+ height, width = image.shape[:2]
56
+
57
+ # Cast into float32 since the training code did this in float32 and it (very rarely) effects
58
+ # the results after rounding.
59
+ image_scale_y = np.array(desired_height, np.float32) / np.array(height, np.float32)
60
+ image_scale_x = np.array(desired_width, np.float32) / np.array(width, np.float32)
61
+ image_scale = min(image_scale_x, image_scale_y)
62
+ scaled_height = int(np.array(height, np.float32) * image_scale)
63
+ scaled_width = int(np.array(width, np.float32) * image_scale)
64
+
65
+ if resize_method == "tensorflow":
66
+ # This how the original training code did resizing, it can produce slightly different
67
+ # results then using torch resize so we keep it just in case
68
+ import tensorflow as tf
69
+ image = tf.image.convert_image_dtype(tf.constant(image), dtype=tf.float32)
70
+ image = tf.image.resize(
71
+ image,
72
+ [scaled_height, scaled_width],
73
+ method=tf.image.ResizeMethod.BILINEAR,
74
+ antialias=True,
75
+ )
76
+ image = tf.clip_by_value(image, 0.0, 1.0)
77
+ image = image.numpy()
78
+ elif resize_method == "torch-bilinear":
79
+ image = torch.permute(torch.from_numpy(image), [2, 0, 1])
80
+ image = convert_image_dtype(image) # resize in float32 to match the training code
81
+ image = torchvision.transforms.Resize(
82
+ [scaled_height, scaled_width], InterpolationMode.BILINEAR, antialias=True
83
+ )(image)
84
+ image = torch.clip(image, 0.0, 1.0)
85
+ image = torch.permute(image, [1, 2, 0]).numpy()
86
+ else:
87
+ raise NotImplementedError(resize_method)
88
+
89
+ top_pad = (desired_height - scaled_height) // 2
90
+ left_pad = (desired_width - scaled_width) // 2
91
+ padding = [
92
+ [top_pad, desired_height - scaled_height - top_pad],
93
+ [left_pad, desired_width - scaled_width - left_pad],
94
+ [0, 0]
95
+ ]
96
+ image_mask = np.pad(np.ones_like(image[:, :, 0], dtype=bool), padding[:2])
97
+ image = np.pad(image, padding, constant_values=pad_value)
98
+ if normalize:
99
+ image = normalize_image(image, offset=image_mean, scale=image_std)
100
+ return image, image_mask
101
+
102
+
103
+ def select_tiling(h, w, patch_size, max_num_patches):
104
+ """Decide how best to divide in image of size [w, h] in up to max_num_patches of size patch_size"""
105
+ original_size = np.stack([h, w]) # [1, 2]
106
+ original_res = h * w
107
+ tilings = []
108
+ for i in range(1, max_num_patches+1):
109
+ for j in range(1, max_num_patches+1):
110
+ if i*j <= max_num_patches:
111
+ tilings.append((i, j))
112
+ # sort so argmin and argmax favour smaller tilings in the event of a tie
113
+ tilings.sort(key=lambda x: (x[0]*x[1], x[0]))
114
+ candidate_tilings = np.array(tilings, dtype=np.int32) # [n_resolutions, 2]
115
+ candidate_resolutions = candidate_tilings * patch_size # [n_resolutions, 2]
116
+
117
+ # How much we would need to scale the image to fit exactly in each tiling
118
+ original_size = np.stack([h, w], dtype=np.float32) # [1, 2]
119
+ required_scale_d = candidate_resolutions.astype(np.float32) / original_size
120
+ required_scale = np.min(required_scale_d, axis=-1, keepdims=True) # [n_resolutions, 1]
121
+ if np.all(required_scale < 1):
122
+ # We are forced to downscale, so try to minimize the amount of downscaling
123
+ ix = np.argmax(required_scale)
124
+ else:
125
+ # Pick the resolution that required the least upscaling so that it most closely fits the image
126
+ required_scale = np.where(required_scale < 1.0, 10e9, required_scale)
127
+ ix = np.argmin(required_scale)
128
+ return candidate_tilings[ix]
129
+
130
+
131
+ class PatramImagesKwargs(ImagesKwargs, total=False):
132
+ max_crops: Optional[int]
133
+ overlap_margins: Optional[List[int]]
134
+ base_image_input_size: Optional[List[int]]
135
+ image_token_length_w: Optional[int]
136
+ image_token_length_h: Optional[int]
137
+ image_patch_size: Optional[int]
138
+ image_padding_mask: Optional[bool]
139
+
140
+
141
+ class PatramImageProcessor(BaseImageProcessor):
142
+ """Preprocess images and multi-model inputs"""
143
+
144
+ def __init__(
145
+ self,
146
+ max_crops: int = 12,
147
+ overlap_margins: List[int] = (4, 4),
148
+ base_image_input_size: List[int] = (336, 336),
149
+ image_token_length_w: int = 12,
150
+ image_token_length_h: int = 12,
151
+ image_patch_size: int = 14,
152
+ image_padding_mask: bool = True,
153
+ do_normalize: bool = True,
154
+ image_mean: Optional[Union[float, List[float]]] = None,
155
+ image_std: Optional[Union[float, List[float]]] = None,
156
+ **kwargs,
157
+ ):
158
+ super().__init__(**kwargs)
159
+ self.max_crops = max_crops
160
+ self.overlap_margins = overlap_margins
161
+ self.base_image_input_size = base_image_input_size
162
+ self.image_token_length_w = image_token_length_w
163
+ self.image_token_length_h = image_token_length_h
164
+ self.image_patch_size = image_patch_size
165
+ self.image_padding_mask = image_padding_mask
166
+ self.do_normalize = do_normalize
167
+ self.image_mean = image_mean if image_mean is not None else OPENAI_CLIP_MEAN
168
+ self.image_std = image_std if image_std is not None else OPENAI_CLIP_STD
169
+
170
+ def image_to_patches_and_tokens(
171
+ self,
172
+ image: ImageInput,
173
+ image_patch_token_id: int,
174
+ image_col_token_id: int,
175
+ image_start_token_id: int,
176
+ image_end_token_id: int,
177
+ max_crops: Optional[int] = None,
178
+ overlap_margins: Optional[List[int]] = None,
179
+ base_image_input_size: Optional[Union[int, List[int]]] = None,
180
+ image_token_length_w: Optional[int] = None,
181
+ image_token_length_h: Optional[int] = None,
182
+ image_patch_size: Optional[int] = None,
183
+ ):
184
+ if isinstance(base_image_input_size, int):
185
+ base_image_input_size = (base_image_input_size, base_image_input_size)
186
+
187
+ base_image_input_d = image_patch_size
188
+ tokens_per_image = image_token_length_w * image_token_length_h
189
+ image_base_patch_w = base_image_input_size[1] // base_image_input_d
190
+ image_base_patch_h = base_image_input_size[0] // base_image_input_d
191
+
192
+ original_image_h, original_image_w = image.shape[:2]
193
+ crop_size = base_image_input_size[0]
194
+
195
+ # Discard this many patches from the (left/top, right/bottom) of crops
196
+ left_margin, right_margin = overlap_margins
197
+ # left_margin, right_margin = 2, 2
198
+ assert left_margin % 2 == 0 # Required for compatibility with 2x2 pooling
199
+ total_margin_pixels = base_image_input_d*(right_margin + left_margin) # pixels removed per dim
200
+ crop_patches = base_image_input_size[0] // base_image_input_d # patches per crop dim
201
+ crop_window_patches = crop_patches - (right_margin + left_margin) # usable patches
202
+ crop_window_size = crop_window_patches * base_image_input_d
203
+ tiling = select_tiling(
204
+ original_image_h - total_margin_pixels,
205
+ original_image_w - total_margin_pixels,
206
+ crop_window_size,
207
+ max_crops
208
+ )
209
+ src, img_mask = resize_and_pad(
210
+ image,
211
+ [tiling[0]*crop_window_size+total_margin_pixels, tiling[1]*crop_window_size+total_margin_pixels]
212
+ )
213
+
214
+ # Now we have to split the image into crops, while keeping track of how each patch in the
215
+ # each crop should be ordered in the global image, this require a lot of tricky booking
216
+ n_crops = tiling[0] * tiling[1]
217
+ patches_arr = []
218
+ mask_arr = []
219
+ patch_ordering_arr = []
220
+
221
+ # We assume 2x2 pooling, but can allow padding the right/bottom with extra
222
+ # patches if the number of patches per side is not even
223
+ assert (crop_patches+1)//2 == image_token_length_h
224
+ assert (crop_patches+1)//2 == image_token_length_w
225
+ on = 0
226
+ on_patch = 0
227
+ for i in range(tiling[0]):
228
+ y0 = i*crop_window_size
229
+ if i == 0:
230
+ crop_y0 = 0
231
+ else:
232
+ crop_y0 = left_margin // 2
233
+
234
+ crop_h = image_base_patch_h - (right_margin + left_margin)
235
+ if i == 0:
236
+ crop_h += left_margin
237
+ if i == (tiling[0]-1):
238
+ crop_h += right_margin
239
+ for j in range(tiling[1]):
240
+ x0 = j*crop_window_size
241
+ if j == 0:
242
+ crop_x0 = 0
243
+ else:
244
+ crop_x0 = left_margin // 2
245
+
246
+ crop_w = image_base_patch_w - (right_margin + left_margin)
247
+ if j == 0:
248
+ crop_w += left_margin
249
+ if j == (tiling[1]-1):
250
+ crop_w += right_margin
251
+
252
+ pooled_w = (crop_w + 1) // 2
253
+ pooled_h = (crop_h + 1) // 2
254
+ patch_ordering_arr.append(
255
+ pad_to_bounding_box(
256
+ np.reshape(np.arange(on, on+pooled_h*pooled_w, dtype=np.int32), (pooled_h, pooled_w, 1)),
257
+ crop_y0, crop_x0, image_token_length_h, image_token_length_w, value=-1
258
+ )[:, :, 0]
259
+ )
260
+ patches_arr.append(src[y0:y0+crop_size, x0:x0+crop_size])
261
+ mask_arr.append(img_mask[y0:y0+crop_size, x0:x0+crop_size])
262
+
263
+ on += pooled_h*pooled_w
264
+ on_patch += 1
265
+ patches = np.stack(patches_arr)
266
+ patch_ordering = np.stack(patch_ordering_arr)
267
+ img_mask = np.stack(mask_arr)
268
+
269
+ # Switch to [n_crops, n_patches, pixels_per_patch] format
270
+ image_layout_impatch_w, image_layout_impatch_h = tiling[0], tiling[1]
271
+ patches = einops.rearrange(
272
+ patches, 'p (h dh) (w dw) c -> p (h w) (dh dw c)',
273
+ dh=base_image_input_d,
274
+ dw=base_image_input_d,
275
+ h=image_base_patch_h,
276
+ w=image_base_patch_w
277
+ )
278
+ img_mask = einops.rearrange(
279
+ img_mask, 'p (h dh) (w dw) -> p (h w) (dh dw)',
280
+ dh=base_image_input_d,
281
+ dw=base_image_input_d,
282
+ h=image_base_patch_h,
283
+ w=image_base_patch_w
284
+ )
285
+
286
+ img_mask = img_mask.astype(np.float32).mean(axis=-1)
287
+ patch_ordering = np.reshape(patch_ordering, [-1])
288
+ valid = patch_ordering >= 0
289
+
290
+ # Transpose order, to get left-to-right order instead of crop-by-crop order
291
+ patch_ordering_rh = np.reshape(
292
+ patch_ordering,
293
+ [tiling[0], tiling[1], image_token_length_h, image_token_length_w]
294
+ )
295
+ patch_ordering_rh = np.transpose(patch_ordering_rh, [0, 2, 1, 3])
296
+ patch_ordering_rh = np.reshape(patch_ordering_rh, [-1])
297
+
298
+ # The transpose will screw up which patches are masked, project the
299
+ # new order into sparse structure of `patch_ordering` to fix this
300
+ patch_ordering[valid] = patch_ordering_rh[patch_ordering_rh >= 0]
301
+
302
+ # Now build the output tokens
303
+ h = tiling[0] * crop_window_patches + (right_margin+left_margin)
304
+ w = tiling[1] * crop_window_patches + (right_margin+left_margin)
305
+ per_row = np.full(
306
+ ((w+1)//2,),
307
+ image_patch_token_id,
308
+ )
309
+ per_row = np.concatenate([per_row, [image_col_token_id]], 0)
310
+
311
+ joint = np.tile(per_row, [(h+1)//2])
312
+ joint = [
313
+ [image_start_token_id],
314
+ joint,
315
+ [image_end_token_id]
316
+ ]
317
+
318
+ # Finally do the same for the global image
319
+ resized, _ = resize_and_pad(image, base_image_input_size)
320
+ resized = einops.rearrange(
321
+ resized, '(h dh) (w dw) c -> (h w) (dh dw c)',
322
+ dh=base_image_input_d,
323
+ dw=base_image_input_d,
324
+ h=image_base_patch_h,
325
+ w=image_base_patch_w
326
+ )
327
+ patches = np.concatenate([np.expand_dims(resized, 0), patches], 0)
328
+
329
+ # Global image goes first, so the order of patches in previous crops gets increased
330
+ patch_ordering = np.where(
331
+ patch_ordering >= 0,
332
+ patch_ordering + tokens_per_image,
333
+ -1
334
+ )
335
+ patch_ordering = np.concatenate([np.arange(0, tokens_per_image), patch_ordering], 0)
336
+ per_row = np.full(
337
+ (image_token_length_w,),
338
+ image_patch_token_id,
339
+ )
340
+ per_row = np.concatenate([per_row, [image_col_token_id]], 0)
341
+ extra_tokens = np.tile(per_row, [image_token_length_h])
342
+ joint = [
343
+ [image_start_token_id],
344
+ extra_tokens,
345
+ [image_end_token_id],
346
+ ] + joint
347
+
348
+ joint = np.concatenate(joint, 0)
349
+ img_mask = np.pad(img_mask, [[0, 1], [0, 0]], constant_values=-1)
350
+ return patches, joint, patch_ordering, img_mask
351
+
352
+ def build_image_input_idx(
353
+ self,
354
+ image_tokens: np.ndarray,
355
+ patch_order: np.ndarray,
356
+ image_patch_token_id: int,
357
+ no_image: Optional[bool] = None,
358
+ image_token_length_w: Optional[int] = None,
359
+ image_token_length_h: Optional[int] = None,
360
+ ):
361
+ """Converts `patch_order` into a mapping of token_id -> patch_id"""
362
+
363
+ tokens_per_image = image_token_length_w * image_token_length_h
364
+ if no_image is not None and no_image:
365
+ return np.zeros((0, tokens_per_image), np.int32)
366
+
367
+ # Indices to insert the patches
368
+ image_input_idx = image_tokens == image_patch_token_id
369
+ image_input_idx = np.nonzero(image_input_idx)[0].astype(np.int32)
370
+
371
+ if patch_order is not None:
372
+ n_tokens = image_input_idx.shape[0]
373
+ patch_order = np.reshape(patch_order, [-1])
374
+ n_patches = patch_order.shape[0]
375
+
376
+ valid = patch_order >= 0
377
+ n_valid_patches = valid.sum()
378
+ assert len(image_input_idx) == n_valid_patches
379
+
380
+ sorted_patch_ixs = np.zeros([n_tokens], np.int32)
381
+ sorted_patch_ixs[patch_order[valid]] = np.arange(n_valid_patches, dtype=np.int32)
382
+
383
+ # Project the inverted mapping into same sparse structure
384
+ sorted_patch_ixs_ex = np.full(np.shape(patch_order), -1)
385
+ sorted_patch_ixs_ex[valid] = sorted_patch_ixs
386
+
387
+ # Do the gather and then re-mask outputs that were masked in `sorted_patch_ixs`
388
+ valid = (sorted_patch_ixs_ex >= 0).astype(np.int32)
389
+ image_input_idx = image_input_idx[sorted_patch_ixs_ex*valid]
390
+ image_input_idx = image_input_idx*valid - 100*(1 - valid)
391
+ image_input_idx = np.reshape(image_input_idx, [-1, tokens_per_image])
392
+ return image_input_idx
393
+
394
+ def preprocess(
395
+ self,
396
+ image: np.ndarray,
397
+ image_patch_token_id: int,
398
+ image_col_token_id: int,
399
+ image_start_token_id: int,
400
+ image_end_token_id: int,
401
+ max_crops: Optional[int] = None,
402
+ overlap_margins: Optional[List[int]] = None,
403
+ base_image_input_size: Optional[Union[int, List[int]]] = None,
404
+ image_token_length_w: Optional[int] = None,
405
+ image_token_length_h: Optional[int] = None,
406
+ image_patch_size: Optional[int] = None,
407
+ **kwargs,
408
+ ):
409
+ """Preprocesses an image
410
+
411
+ Returns:
412
+ crops: (n_crops, n_patches, patch_dim) individual crops, `n_crops` might
413
+ change between images but the other dimensions are fixed
414
+ tokens: (n_tokens,) int32 tokens, pad tokens indicate where to insert the
415
+ patch features, might include other special tokens as well
416
+ image_idx: (n_crops, n_patches) index in `tokens` to put the patch features from the
417
+ crops after pooling, negative values indicates patches features to exclude
418
+ padding_mask: (n_crops, n_patches) what percent of each crop is padding, can be None
419
+ if the image mask is not being used.
420
+ """
421
+
422
+ max_crops = max_crops or self.max_crops
423
+ overlap_margins = overlap_margins or self.overlap_margins
424
+ base_image_input_size = base_image_input_size or self.base_image_input_size
425
+ image_token_length_w = image_token_length_w or self.image_token_length_w
426
+ image_token_length_h = image_token_length_h or self.image_token_length_h
427
+ image_patch_size = image_patch_size or self.image_patch_size
428
+
429
+ crops, image_tokens, patch_ordering, img_mask = self.image_to_patches_and_tokens(
430
+ image,
431
+ image_patch_token_id,
432
+ image_col_token_id,
433
+ image_start_token_id,
434
+ image_end_token_id,
435
+ max_crops,
436
+ overlap_margins,
437
+ base_image_input_size,
438
+ image_token_length_w,
439
+ image_token_length_h,
440
+ image_patch_size,
441
+ )
442
+ patch_idx = self.build_image_input_idx(
443
+ image_tokens,
444
+ patch_ordering,
445
+ image_patch_token_id,
446
+ image_token_length_w=image_token_length_w,
447
+ image_token_length_h=image_token_length_h,
448
+ )
449
+ return crops, image_tokens, patch_idx, img_mask
450
+
451
+ def multimodal_preprocess(
452
+ self,
453
+ images: np.ndarray,
454
+ tokens: List[int],
455
+ image_idx: np.ndarray,
456
+ sequence_length: int,
457
+ image_patch_token_id: int,
458
+ image_col_token_id: int,
459
+ image_start_token_id: int,
460
+ image_end_token_id: int,
461
+ **kwargs,
462
+ ):
463
+ """Merge images and text tokens into multi-modal features for the model
464
+
465
+ :param images: images to use as input
466
+ :param tokens: input text tokens
467
+ :param image_idx: where to insert the images into `tokens`
468
+ :param image_patch_token_id: id of the tokens that will contain image features
469
+ :param image_col_token_id: token id for image column special tokens
470
+ :param image_start_token_id: token id for image start special tokens
471
+ :param image_end_token_id: token id for image end special tokens
472
+ :param kwargs: override preprocessor default args
473
+ """
474
+ max_total_crops = kwargs.get("max_crops") or self.max_crops
475
+ image_token_length_w = kwargs.get("image_token_length_w") or self.image_token_length_w
476
+ image_token_length_h = kwargs.get("image_token_length_h") or self.image_token_length_h
477
+ image_patch_size = kwargs.get("image_patch_size") or self.image_patch_size
478
+ base_image_input_size = kwargs.get("base_image_input_size") or self.base_image_input_size
479
+ image_num_patch = (
480
+ base_image_input_size[0] // image_patch_size,
481
+ base_image_input_size[1] // image_patch_size,
482
+ )
483
+ image_padding_mask = kwargs.get("image_padding_mask") or self.image_padding_mask
484
+
485
+ tokens_per_image = image_token_length_w * image_token_length_h
486
+ n_pixels = image_patch_size * image_patch_size * 3
487
+ n_patches = image_num_patch[0] * image_num_patch[1]
488
+
489
+ if images is None:
490
+ return {
491
+ "input_ids": tokens,
492
+ }
493
+ else:
494
+ n = len(images)
495
+ all_crops = []
496
+ all_image_idx = []
497
+ out_tokens = []
498
+ all_crop_masks = []
499
+
500
+ for ix in range(n):
501
+ token_ix = image_idx[ix]
502
+ crops, image_tokens, patch_idx, img_mask = self.preprocess(
503
+ images[ix],
504
+ image_patch_token_id,
505
+ image_col_token_id,
506
+ image_start_token_id,
507
+ image_end_token_id,
508
+ **kwargs,
509
+ )
510
+
511
+ if token_ix == -1: # -1 is an image inserted at the very start
512
+ start = 0
513
+ token_ix = 0
514
+ end = 0
515
+ else:
516
+ start = 0 if ix == 0 else image_idx[ix-1] + 1
517
+ end = token_ix + 1
518
+
519
+ all_image_idx.append(patch_idx + token_ix)
520
+ all_crops.append(crops)
521
+ out_tokens.append(tokens[start:token_ix])
522
+ out_tokens.append(image_tokens)
523
+ if ix == (n - 1):
524
+ out_tokens.append(tokens[end:])
525
+ if image_padding_mask:
526
+ all_crop_masks.append(img_mask)
527
+
528
+ input_ids = np.concatenate(out_tokens, 0)
529
+ images = np.concatenate(all_crops, 0)
530
+ image_input_idx = np.concatenate(all_image_idx, 0)
531
+ if image_padding_mask:
532
+ image_masks = np.concatenate(all_crop_masks, 0)
533
+ else:
534
+ image_masks = None
535
+
536
+ out = {
537
+ "input_ids": input_ids,
538
+ "images": images,
539
+ "image_input_idx": image_input_idx
540
+ }
541
+ if image_masks is not None:
542
+ out["image_masks"] = image_masks
543
+ return out
544
+
545
+
546
+ PatramImageProcessor.register_for_auto_class()
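As a rough illustration of the tiling logic above, a sketch that calls `select_tiling` directly, assuming the file and its dependencies (`torch`, `torchvision`, `einops`, `transformers`) are available locally. With the processor defaults, each crop contributes a 224-pixel window (336 px crop minus 14 px × (4 + 4) overlap margins):

```python
from image_preprocessing_patram import select_tiling  # assumes the file is importable locally

# Sketch: select_tiling picks a (rows, cols) crop grid with at most max_num_patches crops,
# preferring the grid that needs the least up- or down-scaling of the image.
crop_window = 224  # usable pixels per crop side with the defaults above
for h, w in [(448, 448), (448, 1344), (2000, 600)]:
    print((h, w), "->", select_tiling(h, w, crop_window, max_num_patches=12))
# e.g. a wide 448 x 1344 page maps to a 2 x 6 grid of crops
```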
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:361943ee8b0098606e0b3b3b6ff8237be8eb315bd12ac8e9ae5b3e58cb6f2b8a
3
+ size 4951691600
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7904c5b975043a90af0162611653d672e27fc772014baef526f29aff9f2b84d1
3
+ size 4857402880
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ad29ec49f9f2e12ef1f6a8e4fa6a6386645f5bfb7c20ded25b3ed988b9f8761b
3
+ size 4857402936
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8e4deb1217465b785c1b936fccba8752830f481950b630ca70d8bd4a53072ea7
3
+ size 4857402936
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3761ceab9e51fe4f625750506ce77e5d49b1fcfafd04391ec565de61660e385a
3
+ size 4857402936
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c683d7344a2189a367eba30a4fcb47562ded123c98f90e91eb9caedd04c66cb2
3
+ size 4990466360
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9a17e00d0bfb17cb8febf51d3239f351cbd9ea88e5d19894f6cfc2e3ded8eb9d
3
+ size 1341218552
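The seven LFS pointers above are resolved automatically by `from_pretrained`; the index file that follows maps every weight tensor to its shard. A small inspection sketch, assuming it is run inside a local snapshot of the repo:

```python
import json
from collections import Counter

# Sketch (run inside a local snapshot of the repo): report the total tensor size
# recorded in the index metadata and count tensors per shard.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

print(index["metadata"]["total_size"])        # 30712913920 bytes (~30.7 GB)
print(Counter(index["weight_map"].values()))  # tensors per model-0000X-of-00007.safetensors shard
```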
model.safetensors.index.json ADDED
@@ -0,0 +1,571 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 30712913920
4
+ },
5
+ "weight_map": {
6
+ "model.transformer.blocks.0.att_proj.weight": "model-00001-of-00007.safetensors",
7
+ "model.transformer.blocks.0.attn_norm.weight": "model-00001-of-00007.safetensors",
8
+ "model.transformer.blocks.0.attn_out.weight": "model-00001-of-00007.safetensors",
9
+ "model.transformer.blocks.0.ff_norm.weight": "model-00001-of-00007.safetensors",
10
+ "model.transformer.blocks.0.ff_out.weight": "model-00001-of-00007.safetensors",
11
+ "model.transformer.blocks.0.ff_proj.weight": "model-00001-of-00007.safetensors",
12
+ "model.transformer.blocks.0.k_norm.weight": "model-00001-of-00007.safetensors",
13
+ "model.transformer.blocks.0.q_norm.weight": "model-00001-of-00007.safetensors",
14
+ "model.transformer.blocks.1.att_proj.weight": "model-00001-of-00007.safetensors",
15
+ "model.transformer.blocks.1.attn_norm.weight": "model-00001-of-00007.safetensors",
16
+ "model.transformer.blocks.1.attn_out.weight": "model-00001-of-00007.safetensors",
17
+ "model.transformer.blocks.1.ff_norm.weight": "model-00001-of-00007.safetensors",
18
+ "model.transformer.blocks.1.ff_out.weight": "model-00001-of-00007.safetensors",
19
+ "model.transformer.blocks.1.ff_proj.weight": "model-00001-of-00007.safetensors",
20
+ "model.transformer.blocks.1.k_norm.weight": "model-00001-of-00007.safetensors",
21
+ "model.transformer.blocks.1.q_norm.weight": "model-00001-of-00007.safetensors",
22
+ "model.transformer.blocks.10.att_proj.weight": "model-00003-of-00007.safetensors",
23
+ "model.transformer.blocks.10.attn_norm.weight": "model-00003-of-00007.safetensors",
24
+ "model.transformer.blocks.10.attn_out.weight": "model-00002-of-00007.safetensors",
25
+ "model.transformer.blocks.10.ff_norm.weight": "model-00003-of-00007.safetensors",
26
+ "model.transformer.blocks.10.ff_out.weight": "model-00003-of-00007.safetensors",
27
+ "model.transformer.blocks.10.ff_proj.weight": "model-00003-of-00007.safetensors",
28
+ "model.transformer.blocks.10.k_norm.weight": "model-00002-of-00007.safetensors",
29
+ "model.transformer.blocks.10.q_norm.weight": "model-00002-of-00007.safetensors",
30
+ "model.transformer.blocks.11.att_proj.weight": "model-00003-of-00007.safetensors",
31
+ "model.transformer.blocks.11.attn_norm.weight": "model-00003-of-00007.safetensors",
32
+ "model.transformer.blocks.11.attn_out.weight": "model-00003-of-00007.safetensors",
33
+ "model.transformer.blocks.11.ff_norm.weight": "model-00003-of-00007.safetensors",
34
+ "model.transformer.blocks.11.ff_out.weight": "model-00003-of-00007.safetensors",
35
+ "model.transformer.blocks.11.ff_proj.weight": "model-00003-of-00007.safetensors",
36
+ "model.transformer.blocks.11.k_norm.weight": "model-00003-of-00007.safetensors",
37
+ "model.transformer.blocks.11.q_norm.weight": "model-00003-of-00007.safetensors",
38
+ "model.transformer.blocks.12.att_proj.weight": "model-00003-of-00007.safetensors",
39
+ "model.transformer.blocks.12.attn_norm.weight": "model-00003-of-00007.safetensors",
40
+ "model.transformer.blocks.12.attn_out.weight": "model-00003-of-00007.safetensors",
41
+ "model.transformer.blocks.12.ff_norm.weight": "model-00003-of-00007.safetensors",
42
+ "model.transformer.blocks.12.ff_out.weight": "model-00003-of-00007.safetensors",
43
+ "model.transformer.blocks.12.ff_proj.weight": "model-00003-of-00007.safetensors",
44
+ "model.transformer.blocks.12.k_norm.weight": "model-00003-of-00007.safetensors",
45
+ "model.transformer.blocks.12.q_norm.weight": "model-00003-of-00007.safetensors",
46
+ "model.transformer.blocks.13.att_proj.weight": "model-00003-of-00007.safetensors",
47
+ "model.transformer.blocks.13.attn_norm.weight": "model-00003-of-00007.safetensors",
48
+ "model.transformer.blocks.13.attn_out.weight": "model-00003-of-00007.safetensors",
49
+ "model.transformer.blocks.13.ff_norm.weight": "model-00003-of-00007.safetensors",
50
+ "model.transformer.blocks.13.ff_out.weight": "model-00003-of-00007.safetensors",
51
+ "model.transformer.blocks.13.ff_proj.weight": "model-00003-of-00007.safetensors",
52
+ "model.transformer.blocks.13.k_norm.weight": "model-00003-of-00007.safetensors",
53
+ "model.transformer.blocks.13.q_norm.weight": "model-00003-of-00007.safetensors",
54
+ "model.transformer.blocks.14.att_proj.weight": "model-00003-of-00007.safetensors",
55
+ "model.transformer.blocks.14.attn_norm.weight": "model-00003-of-00007.safetensors",
56
+ "model.transformer.blocks.14.attn_out.weight": "model-00003-of-00007.safetensors",
57
+ "model.transformer.blocks.14.ff_norm.weight": "model-00003-of-00007.safetensors",
58
+ "model.transformer.blocks.14.ff_out.weight": "model-00003-of-00007.safetensors",
59
+ "model.transformer.blocks.14.ff_proj.weight": "model-00003-of-00007.safetensors",
60
+ "model.transformer.blocks.14.k_norm.weight": "model-00003-of-00007.safetensors",
61
+ "model.transformer.blocks.14.q_norm.weight": "model-00003-of-00007.safetensors",
62
+ "model.transformer.blocks.15.att_proj.weight": "model-00003-of-00007.safetensors",
63
+ "model.transformer.blocks.15.attn_norm.weight": "model-00003-of-00007.safetensors",
64
+ "model.transformer.blocks.15.attn_out.weight": "model-00003-of-00007.safetensors",
65
+ "model.transformer.blocks.15.ff_norm.weight": "model-00003-of-00007.safetensors",
66
+ "model.transformer.blocks.15.ff_out.weight": "model-00003-of-00007.safetensors",
67
+ "model.transformer.blocks.15.ff_proj.weight": "model-00003-of-00007.safetensors",
68
+ "model.transformer.blocks.15.k_norm.weight": "model-00003-of-00007.safetensors",
69
+ "model.transformer.blocks.15.q_norm.weight": "model-00003-of-00007.safetensors",
70
+ "model.transformer.blocks.16.att_proj.weight": "model-00004-of-00007.safetensors",
71
+ "model.transformer.blocks.16.attn_norm.weight": "model-00004-of-00007.safetensors",
72
+ "model.transformer.blocks.16.attn_out.weight": "model-00003-of-00007.safetensors",
73
+ "model.transformer.blocks.16.ff_norm.weight": "model-00004-of-00007.safetensors",
74
+ "model.transformer.blocks.16.ff_out.weight": "model-00004-of-00007.safetensors",
75
+ "model.transformer.blocks.16.ff_proj.weight": "model-00004-of-00007.safetensors",
76
+ "model.transformer.blocks.16.k_norm.weight": "model-00003-of-00007.safetensors",
77
+ "model.transformer.blocks.16.q_norm.weight": "model-00003-of-00007.safetensors",
78
+ "model.transformer.blocks.17.att_proj.weight": "model-00004-of-00007.safetensors",
79
+ "model.transformer.blocks.17.attn_norm.weight": "model-00004-of-00007.safetensors",
80
+ "model.transformer.blocks.17.attn_out.weight": "model-00004-of-00007.safetensors",
81
+ "model.transformer.blocks.17.ff_norm.weight": "model-00004-of-00007.safetensors",
82
+ "model.transformer.blocks.17.ff_out.weight": "model-00004-of-00007.safetensors",
83
+ "model.transformer.blocks.17.ff_proj.weight": "model-00004-of-00007.safetensors",
84
+ "model.transformer.blocks.17.k_norm.weight": "model-00004-of-00007.safetensors",
85
+ "model.transformer.blocks.17.q_norm.weight": "model-00004-of-00007.safetensors",
86
+ "model.transformer.blocks.18.att_proj.weight": "model-00004-of-00007.safetensors",
87
+ "model.transformer.blocks.18.attn_norm.weight": "model-00004-of-00007.safetensors",
88
+ "model.transformer.blocks.18.attn_out.weight": "model-00004-of-00007.safetensors",
89
+ "model.transformer.blocks.18.ff_norm.weight": "model-00004-of-00007.safetensors",
90
+ "model.transformer.blocks.18.ff_out.weight": "model-00004-of-00007.safetensors",
91
+ "model.transformer.blocks.18.ff_proj.weight": "model-00004-of-00007.safetensors",
92
+ "model.transformer.blocks.18.k_norm.weight": "model-00004-of-00007.safetensors",
93
+ "model.transformer.blocks.18.q_norm.weight": "model-00004-of-00007.safetensors",
94
+ "model.transformer.blocks.19.att_proj.weight": "model-00004-of-00007.safetensors",
95
+ "model.transformer.blocks.19.attn_norm.weight": "model-00004-of-00007.safetensors",
96
+ "model.transformer.blocks.19.attn_out.weight": "model-00004-of-00007.safetensors",
97
+ "model.transformer.blocks.19.ff_norm.weight": "model-00004-of-00007.safetensors",
98
+ "model.transformer.blocks.19.ff_out.weight": "model-00004-of-00007.safetensors",
99
+ "model.transformer.blocks.19.ff_proj.weight": "model-00004-of-00007.safetensors",
100
+ "model.transformer.blocks.19.k_norm.weight": "model-00004-of-00007.safetensors",
101
+ "model.transformer.blocks.19.q_norm.weight": "model-00004-of-00007.safetensors",
102
+ "model.transformer.blocks.2.att_proj.weight": "model-00001-of-00007.safetensors",
103
+ "model.transformer.blocks.2.attn_norm.weight": "model-00001-of-00007.safetensors",
104
+ "model.transformer.blocks.2.attn_out.weight": "model-00001-of-00007.safetensors",
105
+ "model.transformer.blocks.2.ff_norm.weight": "model-00001-of-00007.safetensors",
106
+ "model.transformer.blocks.2.ff_out.weight": "model-00001-of-00007.safetensors",
107
+ "model.transformer.blocks.2.ff_proj.weight": "model-00001-of-00007.safetensors",
108
+ "model.transformer.blocks.2.k_norm.weight": "model-00001-of-00007.safetensors",
109
+ "model.transformer.blocks.2.q_norm.weight": "model-00001-of-00007.safetensors",
110
+ "model.transformer.blocks.20.att_proj.weight": "model-00004-of-00007.safetensors",
111
+ "model.transformer.blocks.20.attn_norm.weight": "model-00004-of-00007.safetensors",
112
+ "model.transformer.blocks.20.attn_out.weight": "model-00004-of-00007.safetensors",
113
+ "model.transformer.blocks.20.ff_norm.weight": "model-00004-of-00007.safetensors",
114
+ "model.transformer.blocks.20.ff_out.weight": "model-00004-of-00007.safetensors",
115
+ "model.transformer.blocks.20.ff_proj.weight": "model-00004-of-00007.safetensors",
116
+ "model.transformer.blocks.20.k_norm.weight": "model-00004-of-00007.safetensors",
117
+ "model.transformer.blocks.20.q_norm.weight": "model-00004-of-00007.safetensors",
118
+ "model.transformer.blocks.21.att_proj.weight": "model-00004-of-00007.safetensors",
119
+ "model.transformer.blocks.21.attn_norm.weight": "model-00004-of-00007.safetensors",
120
+ "model.transformer.blocks.21.attn_out.weight": "model-00004-of-00007.safetensors",
121
+ "model.transformer.blocks.21.ff_norm.weight": "model-00004-of-00007.safetensors",
122
+ "model.transformer.blocks.21.ff_out.weight": "model-00004-of-00007.safetensors",
123
+ "model.transformer.blocks.21.ff_proj.weight": "model-00004-of-00007.safetensors",
124
+ "model.transformer.blocks.21.k_norm.weight": "model-00004-of-00007.safetensors",
125
+ "model.transformer.blocks.21.q_norm.weight": "model-00004-of-00007.safetensors",
126
+ "model.transformer.blocks.22.att_proj.weight": "model-00005-of-00007.safetensors",
127
+ "model.transformer.blocks.22.attn_norm.weight": "model-00005-of-00007.safetensors",
128
+ "model.transformer.blocks.22.attn_out.weight": "model-00004-of-00007.safetensors",
129
+ "model.transformer.blocks.22.ff_norm.weight": "model-00005-of-00007.safetensors",
130
+ "model.transformer.blocks.22.ff_out.weight": "model-00005-of-00007.safetensors",
131
+ "model.transformer.blocks.22.ff_proj.weight": "model-00005-of-00007.safetensors",
132
+ "model.transformer.blocks.22.k_norm.weight": "model-00004-of-00007.safetensors",
133
+ "model.transformer.blocks.22.q_norm.weight": "model-00004-of-00007.safetensors",
134
+ "model.transformer.blocks.23.att_proj.weight": "model-00005-of-00007.safetensors",
135
+ "model.transformer.blocks.23.attn_norm.weight": "model-00005-of-00007.safetensors",
136
+ "model.transformer.blocks.23.attn_out.weight": "model-00005-of-00007.safetensors",
137
+ "model.transformer.blocks.23.ff_norm.weight": "model-00005-of-00007.safetensors",
138
+ "model.transformer.blocks.23.ff_out.weight": "model-00005-of-00007.safetensors",
139
+ "model.transformer.blocks.23.ff_proj.weight": "model-00005-of-00007.safetensors",
140
+ "model.transformer.blocks.23.k_norm.weight": "model-00005-of-00007.safetensors",
141
+ "model.transformer.blocks.23.q_norm.weight": "model-00005-of-00007.safetensors",
142
+ "model.transformer.blocks.24.att_proj.weight": "model-00005-of-00007.safetensors",
143
+ "model.transformer.blocks.24.attn_norm.weight": "model-00005-of-00007.safetensors",
144
+ "model.transformer.blocks.24.attn_out.weight": "model-00005-of-00007.safetensors",
145
+ "model.transformer.blocks.24.ff_norm.weight": "model-00005-of-00007.safetensors",
146
+ "model.transformer.blocks.24.ff_out.weight": "model-00005-of-00007.safetensors",
147
+ "model.transformer.blocks.24.ff_proj.weight": "model-00005-of-00007.safetensors",
148
+ "model.transformer.blocks.24.k_norm.weight": "model-00005-of-00007.safetensors",
149
+ "model.transformer.blocks.24.q_norm.weight": "model-00005-of-00007.safetensors",
150
+ "model.transformer.blocks.25.att_proj.weight": "model-00005-of-00007.safetensors",
151
+ "model.transformer.blocks.25.attn_norm.weight": "model-00005-of-00007.safetensors",
152
+ "model.transformer.blocks.25.attn_out.weight": "model-00005-of-00007.safetensors",
153
+ "model.transformer.blocks.25.ff_norm.weight": "model-00005-of-00007.safetensors",
154
+ "model.transformer.blocks.25.ff_out.weight": "model-00005-of-00007.safetensors",
155
+ "model.transformer.blocks.25.ff_proj.weight": "model-00005-of-00007.safetensors",
156
+ "model.transformer.blocks.25.k_norm.weight": "model-00005-of-00007.safetensors",
157
+ "model.transformer.blocks.25.q_norm.weight": "model-00005-of-00007.safetensors",
158
+ "model.transformer.blocks.26.att_proj.weight": "model-00005-of-00007.safetensors",
159
+ "model.transformer.blocks.26.attn_norm.weight": "model-00005-of-00007.safetensors",
160
+ "model.transformer.blocks.26.attn_out.weight": "model-00005-of-00007.safetensors",
161
+ "model.transformer.blocks.26.ff_norm.weight": "model-00005-of-00007.safetensors",
162
+ "model.transformer.blocks.26.ff_out.weight": "model-00005-of-00007.safetensors",
163
+ "model.transformer.blocks.26.ff_proj.weight": "model-00005-of-00007.safetensors",
164
+ "model.transformer.blocks.26.k_norm.weight": "model-00005-of-00007.safetensors",
165
+ "model.transformer.blocks.26.q_norm.weight": "model-00005-of-00007.safetensors",
166
+ "model.transformer.blocks.27.att_proj.weight": "model-00005-of-00007.safetensors",
167
+ "model.transformer.blocks.27.attn_norm.weight": "model-00005-of-00007.safetensors",
168
+ "model.transformer.blocks.27.attn_out.weight": "model-00005-of-00007.safetensors",
169
+ "model.transformer.blocks.27.ff_norm.weight": "model-00005-of-00007.safetensors",
170
+ "model.transformer.blocks.27.ff_out.weight": "model-00005-of-00007.safetensors",
171
+ "model.transformer.blocks.27.ff_proj.weight": "model-00005-of-00007.safetensors",
172
+ "model.transformer.blocks.27.k_norm.weight": "model-00005-of-00007.safetensors",
173
+ "model.transformer.blocks.27.q_norm.weight": "model-00005-of-00007.safetensors",
174
+ "model.transformer.blocks.28.att_proj.weight": "model-00006-of-00007.safetensors",
175
+ "model.transformer.blocks.28.attn_norm.weight": "model-00006-of-00007.safetensors",
176
+ "model.transformer.blocks.28.attn_out.weight": "model-00005-of-00007.safetensors",
177
+ "model.transformer.blocks.28.ff_norm.weight": "model-00006-of-00007.safetensors",
178
+ "model.transformer.blocks.28.ff_out.weight": "model-00006-of-00007.safetensors",
179
+ "model.transformer.blocks.28.ff_proj.weight": "model-00006-of-00007.safetensors",
180
+ "model.transformer.blocks.28.k_norm.weight": "model-00005-of-00007.safetensors",
181
+ "model.transformer.blocks.28.q_norm.weight": "model-00005-of-00007.safetensors",
182
+ "model.transformer.blocks.29.att_proj.weight": "model-00006-of-00007.safetensors",
183
+ "model.transformer.blocks.29.attn_norm.weight": "model-00006-of-00007.safetensors",
184
+ "model.transformer.blocks.29.attn_out.weight": "model-00006-of-00007.safetensors",
185
+ "model.transformer.blocks.29.ff_norm.weight": "model-00006-of-00007.safetensors",
186
+ "model.transformer.blocks.29.ff_out.weight": "model-00006-of-00007.safetensors",
187
+ "model.transformer.blocks.29.ff_proj.weight": "model-00006-of-00007.safetensors",
188
+ "model.transformer.blocks.29.k_norm.weight": "model-00006-of-00007.safetensors",
189
+ "model.transformer.blocks.29.q_norm.weight": "model-00006-of-00007.safetensors",
190
+ "model.transformer.blocks.3.att_proj.weight": "model-00001-of-00007.safetensors",
191
+ "model.transformer.blocks.3.attn_norm.weight": "model-00001-of-00007.safetensors",
192
+ "model.transformer.blocks.3.attn_out.weight": "model-00001-of-00007.safetensors",
193
+ "model.transformer.blocks.3.ff_norm.weight": "model-00001-of-00007.safetensors",
194
+ "model.transformer.blocks.3.ff_out.weight": "model-00001-of-00007.safetensors",
195
+ "model.transformer.blocks.3.ff_proj.weight": "model-00001-of-00007.safetensors",
196
+ "model.transformer.blocks.3.k_norm.weight": "model-00001-of-00007.safetensors",
197
+ "model.transformer.blocks.3.q_norm.weight": "model-00001-of-00007.safetensors",
198
+ "model.transformer.blocks.30.att_proj.weight": "model-00006-of-00007.safetensors",
199
+ "model.transformer.blocks.30.attn_norm.weight": "model-00006-of-00007.safetensors",
200
+ "model.transformer.blocks.30.attn_out.weight": "model-00006-of-00007.safetensors",
201
+ "model.transformer.blocks.30.ff_norm.weight": "model-00006-of-00007.safetensors",
202
+ "model.transformer.blocks.30.ff_out.weight": "model-00006-of-00007.safetensors",
203
+ "model.transformer.blocks.30.ff_proj.weight": "model-00006-of-00007.safetensors",
204
+ "model.transformer.blocks.30.k_norm.weight": "model-00006-of-00007.safetensors",
205
+ "model.transformer.blocks.30.q_norm.weight": "model-00006-of-00007.safetensors",
206
+ "model.transformer.blocks.31.att_proj.weight": "model-00006-of-00007.safetensors",
207
+ "model.transformer.blocks.31.attn_norm.weight": "model-00006-of-00007.safetensors",
208
+ "model.transformer.blocks.31.attn_out.weight": "model-00006-of-00007.safetensors",
209
+ "model.transformer.blocks.31.ff_norm.weight": "model-00006-of-00007.safetensors",
210
+ "model.transformer.blocks.31.ff_out.weight": "model-00006-of-00007.safetensors",
211
+ "model.transformer.blocks.31.ff_proj.weight": "model-00006-of-00007.safetensors",
212
+ "model.transformer.blocks.31.k_norm.weight": "model-00006-of-00007.safetensors",
213
+ "model.transformer.blocks.31.q_norm.weight": "model-00006-of-00007.safetensors",
214
+ "model.transformer.blocks.4.att_proj.weight": "model-00002-of-00007.safetensors",
215
+ "model.transformer.blocks.4.attn_norm.weight": "model-00002-of-00007.safetensors",
216
+ "model.transformer.blocks.4.attn_out.weight": "model-00001-of-00007.safetensors",
217
+ "model.transformer.blocks.4.ff_norm.weight": "model-00002-of-00007.safetensors",
218
+ "model.transformer.blocks.4.ff_out.weight": "model-00002-of-00007.safetensors",
219
+ "model.transformer.blocks.4.ff_proj.weight": "model-00002-of-00007.safetensors",
220
+ "model.transformer.blocks.4.k_norm.weight": "model-00001-of-00007.safetensors",
221
+ "model.transformer.blocks.4.q_norm.weight": "model-00001-of-00007.safetensors",
222
+ "model.transformer.blocks.5.att_proj.weight": "model-00002-of-00007.safetensors",
223
+ "model.transformer.blocks.5.attn_norm.weight": "model-00002-of-00007.safetensors",
224
+ "model.transformer.blocks.5.attn_out.weight": "model-00002-of-00007.safetensors",
225
+ "model.transformer.blocks.5.ff_norm.weight": "model-00002-of-00007.safetensors",
226
+ "model.transformer.blocks.5.ff_out.weight": "model-00002-of-00007.safetensors",
227
+ "model.transformer.blocks.5.ff_proj.weight": "model-00002-of-00007.safetensors",
228
+ "model.transformer.blocks.5.k_norm.weight": "model-00002-of-00007.safetensors",
229
+ "model.transformer.blocks.5.q_norm.weight": "model-00002-of-00007.safetensors",
230
+ "model.transformer.blocks.6.att_proj.weight": "model-00002-of-00007.safetensors",
231
+ "model.transformer.blocks.6.attn_norm.weight": "model-00002-of-00007.safetensors",
232
+ "model.transformer.blocks.6.attn_out.weight": "model-00002-of-00007.safetensors",
233
+ "model.transformer.blocks.6.ff_norm.weight": "model-00002-of-00007.safetensors",
234
+ "model.transformer.blocks.6.ff_out.weight": "model-00002-of-00007.safetensors",
235
+ "model.transformer.blocks.6.ff_proj.weight": "model-00002-of-00007.safetensors",
236
+ "model.transformer.blocks.6.k_norm.weight": "model-00002-of-00007.safetensors",
237
+ "model.transformer.blocks.6.q_norm.weight": "model-00002-of-00007.safetensors",
238
+ "model.transformer.blocks.7.att_proj.weight": "model-00002-of-00007.safetensors",
239
+ "model.transformer.blocks.7.attn_norm.weight": "model-00002-of-00007.safetensors",
240
+ "model.transformer.blocks.7.attn_out.weight": "model-00002-of-00007.safetensors",
241
+ "model.transformer.blocks.7.ff_norm.weight": "model-00002-of-00007.safetensors",
242
+ "model.transformer.blocks.7.ff_out.weight": "model-00002-of-00007.safetensors",
243
+ "model.transformer.blocks.7.ff_proj.weight": "model-00002-of-00007.safetensors",
244
+ "model.transformer.blocks.7.k_norm.weight": "model-00002-of-00007.safetensors",
245
+ "model.transformer.blocks.7.q_norm.weight": "model-00002-of-00007.safetensors",
246
+ "model.transformer.blocks.8.att_proj.weight": "model-00002-of-00007.safetensors",
247
+ "model.transformer.blocks.8.attn_norm.weight": "model-00002-of-00007.safetensors",
248
+ "model.transformer.blocks.8.attn_out.weight": "model-00002-of-00007.safetensors",
249
+ "model.transformer.blocks.8.ff_norm.weight": "model-00002-of-00007.safetensors",
250
+ "model.transformer.blocks.8.ff_out.weight": "model-00002-of-00007.safetensors",
251
+ "model.transformer.blocks.8.ff_proj.weight": "model-00002-of-00007.safetensors",
252
+ "model.transformer.blocks.8.k_norm.weight": "model-00002-of-00007.safetensors",
253
+ "model.transformer.blocks.8.q_norm.weight": "model-00002-of-00007.safetensors",
254
+ "model.transformer.blocks.9.att_proj.weight": "model-00002-of-00007.safetensors",
255
+ "model.transformer.blocks.9.attn_norm.weight": "model-00002-of-00007.safetensors",
256
+ "model.transformer.blocks.9.attn_out.weight": "model-00002-of-00007.safetensors",
257
+ "model.transformer.blocks.9.ff_norm.weight": "model-00002-of-00007.safetensors",
258
+ "model.transformer.blocks.9.ff_out.weight": "model-00002-of-00007.safetensors",
259
+ "model.transformer.blocks.9.ff_proj.weight": "model-00002-of-00007.safetensors",
260
+ "model.transformer.blocks.9.k_norm.weight": "model-00002-of-00007.safetensors",
261
+ "model.transformer.blocks.9.q_norm.weight": "model-00002-of-00007.safetensors",
262
+ "model.transformer.ff_out.weight": "model-00006-of-00007.safetensors",
263
+ "model.transformer.ln_f.weight": "model-00001-of-00007.safetensors",
264
+ "model.transformer.wte.embedding": "model-00001-of-00007.safetensors",
265
+ "model.transformer.wte.new_embedding": "model-00001-of-00007.safetensors",
266
+ "model.vision_backbone.image_pooling_2d.wk.bias": "model-00007-of-00007.safetensors",
267
+ "model.vision_backbone.image_pooling_2d.wk.weight": "model-00007-of-00007.safetensors",
268
+ "model.vision_backbone.image_pooling_2d.wo.bias": "model-00007-of-00007.safetensors",
269
+ "model.vision_backbone.image_pooling_2d.wo.weight": "model-00007-of-00007.safetensors",
270
+ "model.vision_backbone.image_pooling_2d.wq.bias": "model-00007-of-00007.safetensors",
271
+ "model.vision_backbone.image_pooling_2d.wq.weight": "model-00007-of-00007.safetensors",
272
+ "model.vision_backbone.image_pooling_2d.wv.bias": "model-00007-of-00007.safetensors",
273
+ "model.vision_backbone.image_pooling_2d.wv.weight": "model-00007-of-00007.safetensors",
274
+ "model.vision_backbone.image_projector.w1.weight": "model-00007-of-00007.safetensors",
275
+ "model.vision_backbone.image_projector.w2.weight": "model-00007-of-00007.safetensors",
276
+ "model.vision_backbone.image_projector.w3.weight": "model-00007-of-00007.safetensors",
277
+ "model.vision_backbone.image_vit.class_embedding": "model-00006-of-00007.safetensors",
278
+ "model.vision_backbone.image_vit.conv1.weight": "model-00006-of-00007.safetensors",
279
+ "model.vision_backbone.image_vit.patch_embedding.weight": "model-00006-of-00007.safetensors",
280
+ "model.vision_backbone.image_vit.positional_embedding": "model-00006-of-00007.safetensors",
281
+ "model.vision_backbone.image_vit.transformer.resblocks.0.attn.in_proj_bias": "model-00006-of-00007.safetensors",
282
+ "model.vision_backbone.image_vit.transformer.resblocks.0.attn.in_proj_weight": "model-00006-of-00007.safetensors",
283
+ "model.vision_backbone.image_vit.transformer.resblocks.0.attn.out_proj.bias": "model-00006-of-00007.safetensors",
284
+ "model.vision_backbone.image_vit.transformer.resblocks.0.attn.out_proj.weight": "model-00006-of-00007.safetensors",
285
+ "model.vision_backbone.image_vit.transformer.resblocks.0.ln_1.bias": "model-00006-of-00007.safetensors",
286
+ "model.vision_backbone.image_vit.transformer.resblocks.0.ln_1.weight": "model-00006-of-00007.safetensors",
287
+ "model.vision_backbone.image_vit.transformer.resblocks.0.ln_2.bias": "model-00006-of-00007.safetensors",
288
+ "model.vision_backbone.image_vit.transformer.resblocks.0.ln_2.weight": "model-00006-of-00007.safetensors",
289
+ "model.vision_backbone.image_vit.transformer.resblocks.0.mlp.c_fc.bias": "model-00006-of-00007.safetensors",
290
+ "model.vision_backbone.image_vit.transformer.resblocks.0.mlp.c_fc.weight": "model-00006-of-00007.safetensors",
291
+ "model.vision_backbone.image_vit.transformer.resblocks.0.mlp.c_proj.bias": "model-00006-of-00007.safetensors",
292
+ "model.vision_backbone.image_vit.transformer.resblocks.0.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
293
+ "model.vision_backbone.image_vit.transformer.resblocks.1.attn.in_proj_bias": "model-00006-of-00007.safetensors",
294
+ "model.vision_backbone.image_vit.transformer.resblocks.1.attn.in_proj_weight": "model-00006-of-00007.safetensors",
295
+ "model.vision_backbone.image_vit.transformer.resblocks.1.attn.out_proj.bias": "model-00006-of-00007.safetensors",
296
+ "model.vision_backbone.image_vit.transformer.resblocks.1.attn.out_proj.weight": "model-00006-of-00007.safetensors",
297
+ "model.vision_backbone.image_vit.transformer.resblocks.1.ln_1.bias": "model-00006-of-00007.safetensors",
298
+ "model.vision_backbone.image_vit.transformer.resblocks.1.ln_1.weight": "model-00006-of-00007.safetensors",
299
+ "model.vision_backbone.image_vit.transformer.resblocks.1.ln_2.bias": "model-00006-of-00007.safetensors",
300
+ "model.vision_backbone.image_vit.transformer.resblocks.1.ln_2.weight": "model-00006-of-00007.safetensors",
301
+ "model.vision_backbone.image_vit.transformer.resblocks.1.mlp.c_fc.bias": "model-00006-of-00007.safetensors",
302
+ "model.vision_backbone.image_vit.transformer.resblocks.1.mlp.c_fc.weight": "model-00006-of-00007.safetensors",
303
+ "model.vision_backbone.image_vit.transformer.resblocks.1.mlp.c_proj.bias": "model-00006-of-00007.safetensors",
304
+ "model.vision_backbone.image_vit.transformer.resblocks.1.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
305
+ "model.vision_backbone.image_vit.transformer.resblocks.10.attn.in_proj_bias": "model-00007-of-00007.safetensors",
306
+ "model.vision_backbone.image_vit.transformer.resblocks.10.attn.in_proj_weight": "model-00007-of-00007.safetensors",
307
+ "model.vision_backbone.image_vit.transformer.resblocks.10.attn.out_proj.bias": "model-00007-of-00007.safetensors",
308
+ "model.vision_backbone.image_vit.transformer.resblocks.10.attn.out_proj.weight": "model-00007-of-00007.safetensors",
309
+ "model.vision_backbone.image_vit.transformer.resblocks.10.ln_1.bias": "model-00007-of-00007.safetensors",
310
+ "model.vision_backbone.image_vit.transformer.resblocks.10.ln_1.weight": "model-00007-of-00007.safetensors",
311
+ "model.vision_backbone.image_vit.transformer.resblocks.10.ln_2.bias": "model-00007-of-00007.safetensors",
312
+ "model.vision_backbone.image_vit.transformer.resblocks.10.ln_2.weight": "model-00007-of-00007.safetensors",
313
+ "model.vision_backbone.image_vit.transformer.resblocks.10.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
314
+ "model.vision_backbone.image_vit.transformer.resblocks.10.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
315
+ "model.vision_backbone.image_vit.transformer.resblocks.10.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
316
+ "model.vision_backbone.image_vit.transformer.resblocks.10.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
317
+ "model.vision_backbone.image_vit.transformer.resblocks.11.attn.in_proj_bias": "model-00007-of-00007.safetensors",
318
+ "model.vision_backbone.image_vit.transformer.resblocks.11.attn.in_proj_weight": "model-00007-of-00007.safetensors",
319
+ "model.vision_backbone.image_vit.transformer.resblocks.11.attn.out_proj.bias": "model-00007-of-00007.safetensors",
320
+ "model.vision_backbone.image_vit.transformer.resblocks.11.attn.out_proj.weight": "model-00007-of-00007.safetensors",
321
+ "model.vision_backbone.image_vit.transformer.resblocks.11.ln_1.bias": "model-00007-of-00007.safetensors",
322
+ "model.vision_backbone.image_vit.transformer.resblocks.11.ln_1.weight": "model-00007-of-00007.safetensors",
323
+ "model.vision_backbone.image_vit.transformer.resblocks.11.ln_2.bias": "model-00007-of-00007.safetensors",
324
+ "model.vision_backbone.image_vit.transformer.resblocks.11.ln_2.weight": "model-00007-of-00007.safetensors",
325
+ "model.vision_backbone.image_vit.transformer.resblocks.11.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
326
+ "model.vision_backbone.image_vit.transformer.resblocks.11.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
327
+ "model.vision_backbone.image_vit.transformer.resblocks.11.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
328
+ "model.vision_backbone.image_vit.transformer.resblocks.11.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
329
+ "model.vision_backbone.image_vit.transformer.resblocks.12.attn.in_proj_bias": "model-00007-of-00007.safetensors",
330
+ "model.vision_backbone.image_vit.transformer.resblocks.12.attn.in_proj_weight": "model-00007-of-00007.safetensors",
331
+ "model.vision_backbone.image_vit.transformer.resblocks.12.attn.out_proj.bias": "model-00007-of-00007.safetensors",
332
+ "model.vision_backbone.image_vit.transformer.resblocks.12.attn.out_proj.weight": "model-00007-of-00007.safetensors",
333
+ "model.vision_backbone.image_vit.transformer.resblocks.12.ln_1.bias": "model-00007-of-00007.safetensors",
334
+ "model.vision_backbone.image_vit.transformer.resblocks.12.ln_1.weight": "model-00007-of-00007.safetensors",
335
+ "model.vision_backbone.image_vit.transformer.resblocks.12.ln_2.bias": "model-00007-of-00007.safetensors",
336
+ "model.vision_backbone.image_vit.transformer.resblocks.12.ln_2.weight": "model-00007-of-00007.safetensors",
337
+ "model.vision_backbone.image_vit.transformer.resblocks.12.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
338
+ "model.vision_backbone.image_vit.transformer.resblocks.12.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
339
+ "model.vision_backbone.image_vit.transformer.resblocks.12.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
340
+ "model.vision_backbone.image_vit.transformer.resblocks.12.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
341
+ "model.vision_backbone.image_vit.transformer.resblocks.13.attn.in_proj_bias": "model-00007-of-00007.safetensors",
342
+ "model.vision_backbone.image_vit.transformer.resblocks.13.attn.in_proj_weight": "model-00007-of-00007.safetensors",
343
+ "model.vision_backbone.image_vit.transformer.resblocks.13.attn.out_proj.bias": "model-00007-of-00007.safetensors",
344
+ "model.vision_backbone.image_vit.transformer.resblocks.13.attn.out_proj.weight": "model-00007-of-00007.safetensors",
345
+ "model.vision_backbone.image_vit.transformer.resblocks.13.ln_1.bias": "model-00007-of-00007.safetensors",
346
+ "model.vision_backbone.image_vit.transformer.resblocks.13.ln_1.weight": "model-00007-of-00007.safetensors",
347
+ "model.vision_backbone.image_vit.transformer.resblocks.13.ln_2.bias": "model-00007-of-00007.safetensors",
348
+ "model.vision_backbone.image_vit.transformer.resblocks.13.ln_2.weight": "model-00007-of-00007.safetensors",
349
+ "model.vision_backbone.image_vit.transformer.resblocks.13.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
350
+ "model.vision_backbone.image_vit.transformer.resblocks.13.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
351
+ "model.vision_backbone.image_vit.transformer.resblocks.13.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
352
+ "model.vision_backbone.image_vit.transformer.resblocks.13.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
353
+ "model.vision_backbone.image_vit.transformer.resblocks.14.attn.in_proj_bias": "model-00007-of-00007.safetensors",
354
+ "model.vision_backbone.image_vit.transformer.resblocks.14.attn.in_proj_weight": "model-00007-of-00007.safetensors",
355
+ "model.vision_backbone.image_vit.transformer.resblocks.14.attn.out_proj.bias": "model-00007-of-00007.safetensors",
356
+ "model.vision_backbone.image_vit.transformer.resblocks.14.attn.out_proj.weight": "model-00007-of-00007.safetensors",
357
+ "model.vision_backbone.image_vit.transformer.resblocks.14.ln_1.bias": "model-00007-of-00007.safetensors",
358
+ "model.vision_backbone.image_vit.transformer.resblocks.14.ln_1.weight": "model-00007-of-00007.safetensors",
359
+ "model.vision_backbone.image_vit.transformer.resblocks.14.ln_2.bias": "model-00007-of-00007.safetensors",
360
+ "model.vision_backbone.image_vit.transformer.resblocks.14.ln_2.weight": "model-00007-of-00007.safetensors",
361
+ "model.vision_backbone.image_vit.transformer.resblocks.14.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
362
+ "model.vision_backbone.image_vit.transformer.resblocks.14.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
363
+ "model.vision_backbone.image_vit.transformer.resblocks.14.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
364
+ "model.vision_backbone.image_vit.transformer.resblocks.14.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
365
+ "model.vision_backbone.image_vit.transformer.resblocks.15.attn.in_proj_bias": "model-00007-of-00007.safetensors",
366
+ "model.vision_backbone.image_vit.transformer.resblocks.15.attn.in_proj_weight": "model-00007-of-00007.safetensors",
367
+ "model.vision_backbone.image_vit.transformer.resblocks.15.attn.out_proj.bias": "model-00007-of-00007.safetensors",
368
+ "model.vision_backbone.image_vit.transformer.resblocks.15.attn.out_proj.weight": "model-00007-of-00007.safetensors",
369
+ "model.vision_backbone.image_vit.transformer.resblocks.15.ln_1.bias": "model-00007-of-00007.safetensors",
370
+ "model.vision_backbone.image_vit.transformer.resblocks.15.ln_1.weight": "model-00007-of-00007.safetensors",
371
+ "model.vision_backbone.image_vit.transformer.resblocks.15.ln_2.bias": "model-00007-of-00007.safetensors",
372
+ "model.vision_backbone.image_vit.transformer.resblocks.15.ln_2.weight": "model-00007-of-00007.safetensors",
373
+ "model.vision_backbone.image_vit.transformer.resblocks.15.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
374
+ "model.vision_backbone.image_vit.transformer.resblocks.15.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
375
+ "model.vision_backbone.image_vit.transformer.resblocks.15.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
376
+ "model.vision_backbone.image_vit.transformer.resblocks.15.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
377
+ "model.vision_backbone.image_vit.transformer.resblocks.16.attn.in_proj_bias": "model-00007-of-00007.safetensors",
378
+ "model.vision_backbone.image_vit.transformer.resblocks.16.attn.in_proj_weight": "model-00007-of-00007.safetensors",
379
+ "model.vision_backbone.image_vit.transformer.resblocks.16.attn.out_proj.bias": "model-00007-of-00007.safetensors",
380
+ "model.vision_backbone.image_vit.transformer.resblocks.16.attn.out_proj.weight": "model-00007-of-00007.safetensors",
381
+ "model.vision_backbone.image_vit.transformer.resblocks.16.ln_1.bias": "model-00007-of-00007.safetensors",
382
+ "model.vision_backbone.image_vit.transformer.resblocks.16.ln_1.weight": "model-00007-of-00007.safetensors",
383
+ "model.vision_backbone.image_vit.transformer.resblocks.16.ln_2.bias": "model-00007-of-00007.safetensors",
384
+ "model.vision_backbone.image_vit.transformer.resblocks.16.ln_2.weight": "model-00007-of-00007.safetensors",
385
+ "model.vision_backbone.image_vit.transformer.resblocks.16.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
386
+ "model.vision_backbone.image_vit.transformer.resblocks.16.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
387
+ "model.vision_backbone.image_vit.transformer.resblocks.16.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
388
+ "model.vision_backbone.image_vit.transformer.resblocks.16.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
389
+ "model.vision_backbone.image_vit.transformer.resblocks.17.attn.in_proj_bias": "model-00007-of-00007.safetensors",
390
+ "model.vision_backbone.image_vit.transformer.resblocks.17.attn.in_proj_weight": "model-00007-of-00007.safetensors",
391
+ "model.vision_backbone.image_vit.transformer.resblocks.17.attn.out_proj.bias": "model-00007-of-00007.safetensors",
392
+ "model.vision_backbone.image_vit.transformer.resblocks.17.attn.out_proj.weight": "model-00007-of-00007.safetensors",
393
+ "model.vision_backbone.image_vit.transformer.resblocks.17.ln_1.bias": "model-00007-of-00007.safetensors",
394
+ "model.vision_backbone.image_vit.transformer.resblocks.17.ln_1.weight": "model-00007-of-00007.safetensors",
395
+ "model.vision_backbone.image_vit.transformer.resblocks.17.ln_2.bias": "model-00007-of-00007.safetensors",
396
+ "model.vision_backbone.image_vit.transformer.resblocks.17.ln_2.weight": "model-00007-of-00007.safetensors",
397
+ "model.vision_backbone.image_vit.transformer.resblocks.17.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
398
+ "model.vision_backbone.image_vit.transformer.resblocks.17.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
399
+ "model.vision_backbone.image_vit.transformer.resblocks.17.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
400
+ "model.vision_backbone.image_vit.transformer.resblocks.17.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
401
+ "model.vision_backbone.image_vit.transformer.resblocks.18.attn.in_proj_bias": "model-00007-of-00007.safetensors",
402
+ "model.vision_backbone.image_vit.transformer.resblocks.18.attn.in_proj_weight": "model-00007-of-00007.safetensors",
403
+ "model.vision_backbone.image_vit.transformer.resblocks.18.attn.out_proj.bias": "model-00007-of-00007.safetensors",
404
+ "model.vision_backbone.image_vit.transformer.resblocks.18.attn.out_proj.weight": "model-00007-of-00007.safetensors",
405
+ "model.vision_backbone.image_vit.transformer.resblocks.18.ln_1.bias": "model-00007-of-00007.safetensors",
406
+ "model.vision_backbone.image_vit.transformer.resblocks.18.ln_1.weight": "model-00007-of-00007.safetensors",
407
+ "model.vision_backbone.image_vit.transformer.resblocks.18.ln_2.bias": "model-00007-of-00007.safetensors",
408
+ "model.vision_backbone.image_vit.transformer.resblocks.18.ln_2.weight": "model-00007-of-00007.safetensors",
409
+ "model.vision_backbone.image_vit.transformer.resblocks.18.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
410
+ "model.vision_backbone.image_vit.transformer.resblocks.18.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
411
+ "model.vision_backbone.image_vit.transformer.resblocks.18.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
412
+ "model.vision_backbone.image_vit.transformer.resblocks.18.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
413
+ "model.vision_backbone.image_vit.transformer.resblocks.19.attn.in_proj_bias": "model-00007-of-00007.safetensors",
414
+ "model.vision_backbone.image_vit.transformer.resblocks.19.attn.in_proj_weight": "model-00007-of-00007.safetensors",
415
+ "model.vision_backbone.image_vit.transformer.resblocks.19.attn.out_proj.bias": "model-00007-of-00007.safetensors",
416
+ "model.vision_backbone.image_vit.transformer.resblocks.19.attn.out_proj.weight": "model-00007-of-00007.safetensors",
417
+ "model.vision_backbone.image_vit.transformer.resblocks.19.ln_1.bias": "model-00007-of-00007.safetensors",
418
+ "model.vision_backbone.image_vit.transformer.resblocks.19.ln_1.weight": "model-00007-of-00007.safetensors",
419
+ "model.vision_backbone.image_vit.transformer.resblocks.19.ln_2.bias": "model-00007-of-00007.safetensors",
420
+ "model.vision_backbone.image_vit.transformer.resblocks.19.ln_2.weight": "model-00007-of-00007.safetensors",
421
+ "model.vision_backbone.image_vit.transformer.resblocks.19.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
422
+ "model.vision_backbone.image_vit.transformer.resblocks.19.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
423
+ "model.vision_backbone.image_vit.transformer.resblocks.19.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
424
+ "model.vision_backbone.image_vit.transformer.resblocks.19.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
425
+ "model.vision_backbone.image_vit.transformer.resblocks.2.attn.in_proj_bias": "model-00006-of-00007.safetensors",
426
+ "model.vision_backbone.image_vit.transformer.resblocks.2.attn.in_proj_weight": "model-00006-of-00007.safetensors",
427
+ "model.vision_backbone.image_vit.transformer.resblocks.2.attn.out_proj.bias": "model-00006-of-00007.safetensors",
428
+ "model.vision_backbone.image_vit.transformer.resblocks.2.attn.out_proj.weight": "model-00006-of-00007.safetensors",
429
+ "model.vision_backbone.image_vit.transformer.resblocks.2.ln_1.bias": "model-00006-of-00007.safetensors",
430
+ "model.vision_backbone.image_vit.transformer.resblocks.2.ln_1.weight": "model-00006-of-00007.safetensors",
431
+ "model.vision_backbone.image_vit.transformer.resblocks.2.ln_2.bias": "model-00006-of-00007.safetensors",
432
+ "model.vision_backbone.image_vit.transformer.resblocks.2.ln_2.weight": "model-00006-of-00007.safetensors",
433
+ "model.vision_backbone.image_vit.transformer.resblocks.2.mlp.c_fc.bias": "model-00006-of-00007.safetensors",
434
+ "model.vision_backbone.image_vit.transformer.resblocks.2.mlp.c_fc.weight": "model-00006-of-00007.safetensors",
435
+ "model.vision_backbone.image_vit.transformer.resblocks.2.mlp.c_proj.bias": "model-00006-of-00007.safetensors",
436
+ "model.vision_backbone.image_vit.transformer.resblocks.2.mlp.c_proj.weight": "model-00006-of-00007.safetensors",
437
+ "model.vision_backbone.image_vit.transformer.resblocks.20.attn.in_proj_bias": "model-00007-of-00007.safetensors",
438
+ "model.vision_backbone.image_vit.transformer.resblocks.20.attn.in_proj_weight": "model-00007-of-00007.safetensors",
439
+ "model.vision_backbone.image_vit.transformer.resblocks.20.attn.out_proj.bias": "model-00007-of-00007.safetensors",
440
+ "model.vision_backbone.image_vit.transformer.resblocks.20.attn.out_proj.weight": "model-00007-of-00007.safetensors",
441
+ "model.vision_backbone.image_vit.transformer.resblocks.20.ln_1.bias": "model-00007-of-00007.safetensors",
442
+ "model.vision_backbone.image_vit.transformer.resblocks.20.ln_1.weight": "model-00007-of-00007.safetensors",
443
+ "model.vision_backbone.image_vit.transformer.resblocks.20.ln_2.bias": "model-00007-of-00007.safetensors",
444
+ "model.vision_backbone.image_vit.transformer.resblocks.20.ln_2.weight": "model-00007-of-00007.safetensors",
445
+ "model.vision_backbone.image_vit.transformer.resblocks.20.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
446
+ "model.vision_backbone.image_vit.transformer.resblocks.20.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
447
+ "model.vision_backbone.image_vit.transformer.resblocks.20.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
448
+ "model.vision_backbone.image_vit.transformer.resblocks.20.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
449
+ "model.vision_backbone.image_vit.transformer.resblocks.21.attn.in_proj_bias": "model-00007-of-00007.safetensors",
450
+ "model.vision_backbone.image_vit.transformer.resblocks.21.attn.in_proj_weight": "model-00007-of-00007.safetensors",
451
+ "model.vision_backbone.image_vit.transformer.resblocks.21.attn.out_proj.bias": "model-00007-of-00007.safetensors",
452
+ "model.vision_backbone.image_vit.transformer.resblocks.21.attn.out_proj.weight": "model-00007-of-00007.safetensors",
453
+ "model.vision_backbone.image_vit.transformer.resblocks.21.ln_1.bias": "model-00007-of-00007.safetensors",
454
+ "model.vision_backbone.image_vit.transformer.resblocks.21.ln_1.weight": "model-00007-of-00007.safetensors",
455
+ "model.vision_backbone.image_vit.transformer.resblocks.21.ln_2.bias": "model-00007-of-00007.safetensors",
456
+ "model.vision_backbone.image_vit.transformer.resblocks.21.ln_2.weight": "model-00007-of-00007.safetensors",
457
+ "model.vision_backbone.image_vit.transformer.resblocks.21.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
458
+ "model.vision_backbone.image_vit.transformer.resblocks.21.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
459
+ "model.vision_backbone.image_vit.transformer.resblocks.21.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
460
+ "model.vision_backbone.image_vit.transformer.resblocks.21.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
461
+ "model.vision_backbone.image_vit.transformer.resblocks.22.attn.in_proj_bias": "model-00007-of-00007.safetensors",
462
+ "model.vision_backbone.image_vit.transformer.resblocks.22.attn.in_proj_weight": "model-00007-of-00007.safetensors",
463
+ "model.vision_backbone.image_vit.transformer.resblocks.22.attn.out_proj.bias": "model-00007-of-00007.safetensors",
464
+ "model.vision_backbone.image_vit.transformer.resblocks.22.attn.out_proj.weight": "model-00007-of-00007.safetensors",
465
+ "model.vision_backbone.image_vit.transformer.resblocks.22.ln_1.bias": "model-00007-of-00007.safetensors",
466
+ "model.vision_backbone.image_vit.transformer.resblocks.22.ln_1.weight": "model-00007-of-00007.safetensors",
467
+ "model.vision_backbone.image_vit.transformer.resblocks.22.ln_2.bias": "model-00007-of-00007.safetensors",
468
+ "model.vision_backbone.image_vit.transformer.resblocks.22.ln_2.weight": "model-00007-of-00007.safetensors",
469
+ "model.vision_backbone.image_vit.transformer.resblocks.22.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
470
+ "model.vision_backbone.image_vit.transformer.resblocks.22.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
471
+ "model.vision_backbone.image_vit.transformer.resblocks.22.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
472
+ "model.vision_backbone.image_vit.transformer.resblocks.22.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
473
+ "model.vision_backbone.image_vit.transformer.resblocks.23.attn.in_proj_bias": "model-00007-of-00007.safetensors",
474
+ "model.vision_backbone.image_vit.transformer.resblocks.23.attn.in_proj_weight": "model-00007-of-00007.safetensors",
475
+ "model.vision_backbone.image_vit.transformer.resblocks.23.attn.out_proj.bias": "model-00007-of-00007.safetensors",
476
+ "model.vision_backbone.image_vit.transformer.resblocks.23.attn.out_proj.weight": "model-00007-of-00007.safetensors",
477
+ "model.vision_backbone.image_vit.transformer.resblocks.23.ln_1.bias": "model-00007-of-00007.safetensors",
478
+ "model.vision_backbone.image_vit.transformer.resblocks.23.ln_1.weight": "model-00007-of-00007.safetensors",
479
+ "model.vision_backbone.image_vit.transformer.resblocks.23.ln_2.bias": "model-00007-of-00007.safetensors",
480
+ "model.vision_backbone.image_vit.transformer.resblocks.23.ln_2.weight": "model-00007-of-00007.safetensors",
481
+ "model.vision_backbone.image_vit.transformer.resblocks.23.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
482
+ "model.vision_backbone.image_vit.transformer.resblocks.23.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
483
+ "model.vision_backbone.image_vit.transformer.resblocks.23.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
484
+ "model.vision_backbone.image_vit.transformer.resblocks.23.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
485
+ "model.vision_backbone.image_vit.transformer.resblocks.3.attn.in_proj_bias": "model-00006-of-00007.safetensors",
486
+ "model.vision_backbone.image_vit.transformer.resblocks.3.attn.in_proj_weight": "model-00006-of-00007.safetensors",
487
+ "model.vision_backbone.image_vit.transformer.resblocks.3.attn.out_proj.bias": "model-00006-of-00007.safetensors",
488
+ "model.vision_backbone.image_vit.transformer.resblocks.3.attn.out_proj.weight": "model-00006-of-00007.safetensors",
489
+ "model.vision_backbone.image_vit.transformer.resblocks.3.ln_1.bias": "model-00006-of-00007.safetensors",
490
+ "model.vision_backbone.image_vit.transformer.resblocks.3.ln_1.weight": "model-00006-of-00007.safetensors",
491
+ "model.vision_backbone.image_vit.transformer.resblocks.3.ln_2.bias": "model-00006-of-00007.safetensors",
492
+ "model.vision_backbone.image_vit.transformer.resblocks.3.ln_2.weight": "model-00006-of-00007.safetensors",
493
+ "model.vision_backbone.image_vit.transformer.resblocks.3.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
494
+ "model.vision_backbone.image_vit.transformer.resblocks.3.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
495
+ "model.vision_backbone.image_vit.transformer.resblocks.3.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
496
+ "model.vision_backbone.image_vit.transformer.resblocks.3.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
497
+ "model.vision_backbone.image_vit.transformer.resblocks.4.attn.in_proj_bias": "model-00007-of-00007.safetensors",
498
+ "model.vision_backbone.image_vit.transformer.resblocks.4.attn.in_proj_weight": "model-00007-of-00007.safetensors",
499
+ "model.vision_backbone.image_vit.transformer.resblocks.4.attn.out_proj.bias": "model-00007-of-00007.safetensors",
500
+ "model.vision_backbone.image_vit.transformer.resblocks.4.attn.out_proj.weight": "model-00007-of-00007.safetensors",
501
+ "model.vision_backbone.image_vit.transformer.resblocks.4.ln_1.bias": "model-00007-of-00007.safetensors",
502
+ "model.vision_backbone.image_vit.transformer.resblocks.4.ln_1.weight": "model-00007-of-00007.safetensors",
503
+ "model.vision_backbone.image_vit.transformer.resblocks.4.ln_2.bias": "model-00007-of-00007.safetensors",
504
+ "model.vision_backbone.image_vit.transformer.resblocks.4.ln_2.weight": "model-00007-of-00007.safetensors",
505
+ "model.vision_backbone.image_vit.transformer.resblocks.4.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
506
+ "model.vision_backbone.image_vit.transformer.resblocks.4.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
507
+ "model.vision_backbone.image_vit.transformer.resblocks.4.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
508
+ "model.vision_backbone.image_vit.transformer.resblocks.4.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
509
+ "model.vision_backbone.image_vit.transformer.resblocks.5.attn.in_proj_bias": "model-00007-of-00007.safetensors",
510
+ "model.vision_backbone.image_vit.transformer.resblocks.5.attn.in_proj_weight": "model-00007-of-00007.safetensors",
511
+ "model.vision_backbone.image_vit.transformer.resblocks.5.attn.out_proj.bias": "model-00007-of-00007.safetensors",
512
+ "model.vision_backbone.image_vit.transformer.resblocks.5.attn.out_proj.weight": "model-00007-of-00007.safetensors",
513
+ "model.vision_backbone.image_vit.transformer.resblocks.5.ln_1.bias": "model-00007-of-00007.safetensors",
514
+ "model.vision_backbone.image_vit.transformer.resblocks.5.ln_1.weight": "model-00007-of-00007.safetensors",
515
+ "model.vision_backbone.image_vit.transformer.resblocks.5.ln_2.bias": "model-00007-of-00007.safetensors",
516
+ "model.vision_backbone.image_vit.transformer.resblocks.5.ln_2.weight": "model-00007-of-00007.safetensors",
517
+ "model.vision_backbone.image_vit.transformer.resblocks.5.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
518
+ "model.vision_backbone.image_vit.transformer.resblocks.5.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
519
+ "model.vision_backbone.image_vit.transformer.resblocks.5.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
520
+ "model.vision_backbone.image_vit.transformer.resblocks.5.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
521
+ "model.vision_backbone.image_vit.transformer.resblocks.6.attn.in_proj_bias": "model-00007-of-00007.safetensors",
522
+ "model.vision_backbone.image_vit.transformer.resblocks.6.attn.in_proj_weight": "model-00007-of-00007.safetensors",
523
+ "model.vision_backbone.image_vit.transformer.resblocks.6.attn.out_proj.bias": "model-00007-of-00007.safetensors",
524
+ "model.vision_backbone.image_vit.transformer.resblocks.6.attn.out_proj.weight": "model-00007-of-00007.safetensors",
525
+ "model.vision_backbone.image_vit.transformer.resblocks.6.ln_1.bias": "model-00007-of-00007.safetensors",
526
+ "model.vision_backbone.image_vit.transformer.resblocks.6.ln_1.weight": "model-00007-of-00007.safetensors",
527
+ "model.vision_backbone.image_vit.transformer.resblocks.6.ln_2.bias": "model-00007-of-00007.safetensors",
528
+ "model.vision_backbone.image_vit.transformer.resblocks.6.ln_2.weight": "model-00007-of-00007.safetensors",
529
+ "model.vision_backbone.image_vit.transformer.resblocks.6.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
530
+ "model.vision_backbone.image_vit.transformer.resblocks.6.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
531
+ "model.vision_backbone.image_vit.transformer.resblocks.6.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
532
+ "model.vision_backbone.image_vit.transformer.resblocks.6.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
533
+ "model.vision_backbone.image_vit.transformer.resblocks.7.attn.in_proj_bias": "model-00007-of-00007.safetensors",
534
+ "model.vision_backbone.image_vit.transformer.resblocks.7.attn.in_proj_weight": "model-00007-of-00007.safetensors",
535
+ "model.vision_backbone.image_vit.transformer.resblocks.7.attn.out_proj.bias": "model-00007-of-00007.safetensors",
536
+ "model.vision_backbone.image_vit.transformer.resblocks.7.attn.out_proj.weight": "model-00007-of-00007.safetensors",
537
+ "model.vision_backbone.image_vit.transformer.resblocks.7.ln_1.bias": "model-00007-of-00007.safetensors",
538
+ "model.vision_backbone.image_vit.transformer.resblocks.7.ln_1.weight": "model-00007-of-00007.safetensors",
539
+ "model.vision_backbone.image_vit.transformer.resblocks.7.ln_2.bias": "model-00007-of-00007.safetensors",
540
+ "model.vision_backbone.image_vit.transformer.resblocks.7.ln_2.weight": "model-00007-of-00007.safetensors",
541
+ "model.vision_backbone.image_vit.transformer.resblocks.7.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
542
+ "model.vision_backbone.image_vit.transformer.resblocks.7.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
543
+ "model.vision_backbone.image_vit.transformer.resblocks.7.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
544
+ "model.vision_backbone.image_vit.transformer.resblocks.7.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
545
+ "model.vision_backbone.image_vit.transformer.resblocks.8.attn.in_proj_bias": "model-00007-of-00007.safetensors",
546
+ "model.vision_backbone.image_vit.transformer.resblocks.8.attn.in_proj_weight": "model-00007-of-00007.safetensors",
547
+ "model.vision_backbone.image_vit.transformer.resblocks.8.attn.out_proj.bias": "model-00007-of-00007.safetensors",
548
+ "model.vision_backbone.image_vit.transformer.resblocks.8.attn.out_proj.weight": "model-00007-of-00007.safetensors",
549
+ "model.vision_backbone.image_vit.transformer.resblocks.8.ln_1.bias": "model-00007-of-00007.safetensors",
550
+ "model.vision_backbone.image_vit.transformer.resblocks.8.ln_1.weight": "model-00007-of-00007.safetensors",
551
+ "model.vision_backbone.image_vit.transformer.resblocks.8.ln_2.bias": "model-00007-of-00007.safetensors",
552
+ "model.vision_backbone.image_vit.transformer.resblocks.8.ln_2.weight": "model-00007-of-00007.safetensors",
553
+ "model.vision_backbone.image_vit.transformer.resblocks.8.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
554
+ "model.vision_backbone.image_vit.transformer.resblocks.8.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
555
+ "model.vision_backbone.image_vit.transformer.resblocks.8.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
556
+ "model.vision_backbone.image_vit.transformer.resblocks.8.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
557
+ "model.vision_backbone.image_vit.transformer.resblocks.9.attn.in_proj_bias": "model-00007-of-00007.safetensors",
558
+ "model.vision_backbone.image_vit.transformer.resblocks.9.attn.in_proj_weight": "model-00007-of-00007.safetensors",
559
+ "model.vision_backbone.image_vit.transformer.resblocks.9.attn.out_proj.bias": "model-00007-of-00007.safetensors",
560
+ "model.vision_backbone.image_vit.transformer.resblocks.9.attn.out_proj.weight": "model-00007-of-00007.safetensors",
561
+ "model.vision_backbone.image_vit.transformer.resblocks.9.ln_1.bias": "model-00007-of-00007.safetensors",
562
+ "model.vision_backbone.image_vit.transformer.resblocks.9.ln_1.weight": "model-00007-of-00007.safetensors",
563
+ "model.vision_backbone.image_vit.transformer.resblocks.9.ln_2.bias": "model-00007-of-00007.safetensors",
564
+ "model.vision_backbone.image_vit.transformer.resblocks.9.ln_2.weight": "model-00007-of-00007.safetensors",
565
+ "model.vision_backbone.image_vit.transformer.resblocks.9.mlp.c_fc.bias": "model-00007-of-00007.safetensors",
566
+ "model.vision_backbone.image_vit.transformer.resblocks.9.mlp.c_fc.weight": "model-00007-of-00007.safetensors",
567
+ "model.vision_backbone.image_vit.transformer.resblocks.9.mlp.c_proj.bias": "model-00007-of-00007.safetensors",
568
+ "model.vision_backbone.image_vit.transformer.resblocks.9.mlp.c_proj.weight": "model-00007-of-00007.safetensors",
569
+ "model.vision_backbone.pad_embed": "model-00006-of-00007.safetensors"
570
+ }
571
+ }
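For reference, this map is what `transformers`/`safetensors` consult to locate each tensor across the seven shards. A minimal lookup sketch, assuming the standard sharded-checkpoint layout (index file named `model.safetensors.index.json` with a top-level `weight_map`) and that the shards have been downloaded locally:

```python
import json
from safetensors import safe_open

# Assumption: standard Hugging Face sharded-checkpoint index with a "weight_map" key.
with open("model.safetensors.index.json") as f:
    index = json.load(f)

name = "model.transformer.ff_out.weight"
shard = index["weight_map"][name]          # per the map above: model-00006-of-00007.safetensors
with safe_open(shard, framework="pt") as fp:
    tensor = fp.get_tensor(name)           # loads only this tensor from that shard
print(shard, tuple(tensor.shape))
```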
modeling_patram.py ADDED
The diff for this file is too large to render. See raw diff
 
preprocessing_patram.py ADDED
@@ -0,0 +1,192 @@
+ """
+ Processor class for Patram.
+ """
+
+ from typing import Optional
+
+ import PIL
+ from PIL import ImageOps
+ from PIL.Image import Image
+
+ try:
+     from typing import Unpack
+ except ImportError:
+     from typing_extensions import Unpack
+
+ import numpy as np
+ import torch
+
+ from transformers.image_utils import ImageInput
+ from transformers.processing_utils import (
+     TextKwargs,
+     ProcessingKwargs,
+     ProcessorMixin,
+ )
+
+ from transformers.tokenization_utils_base import TextInput, PreTokenizedInput
+ from transformers.utils import logging
+
+ from transformers import AutoTokenizer
+ from .image_preprocessing_patram import PatramImagesKwargs, PatramImageProcessor
+
+
+ logger = logging.get_logger(__name__)
+
+
+ DEFAULT_IMAGE_PATCH_TOKEN = "<im_patch>"
+ DEFAULT_IM_START_TOKEN = "<im_start>"
+ DEFAULT_IM_END_TOKEN = "<im_end>"
+ DEFAULT_IM_COL_TOKEN = "<im_col>"
+ IMAGE_PROMPT = "<|image|>"
+
+ EXTRA_TOKENS = (DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN, DEFAULT_IMAGE_PATCH_TOKEN, DEFAULT_IM_COL_TOKEN, IMAGE_PROMPT)
+
+
+ def get_special_token_ids(tokenizer):
+     ids = tokenizer.encode("".join(EXTRA_TOKENS), add_special_tokens=False)
+     assert len(ids) == len(EXTRA_TOKENS)
+     return {k: i for k, i in zip(EXTRA_TOKENS, ids)}
+
+
+ class PatramTextKwargs(TextKwargs, total=False):
+     style: Optional[str]
+     system_prompt: Optional[str]
+     message_format: Optional[str]
+     always_start_with_space: Optional[bool]
+     sequence_length: Optional[int]
+
+
+ class PatramProcessorKwargs(ProcessingKwargs, total=False):
+     text_kwargs: PatramTextKwargs
+     images_kwargs: PatramImagesKwargs
+     _defaults = {
+         "images_kwargs": {
+             "max_crops": 12,
+             "overlap_margins": [4, 4],
+             "base_image_input_size": [336, 336],
+             "image_token_length_w": 12,
+             "image_token_length_h": 12,
+             "image_patch_size": 14,
+             "image_padding_mask": True,
+         },
+         "text_kwargs": {
+             "style": "long_caption",
+             "system_prompt": "none",
+             "message_format": "role",
+             "always_start_with_space": True,
+             "sequence_length": 1536,
+             "padding": False,
+         },
+     }
+
+
+ class PatramProcessor(ProcessorMixin):
+     attributes = ["image_processor", "tokenizer"]
+     image_processor_class = "AutoImageProcessor"
+     tokenizer_class = ("GPT2Tokenizer", "GPT2TokenizerFast")
+
+     def __init__(self, image_processor: PatramImageProcessor = None, tokenizer: AutoTokenizer = None, **kwargs):
+         # self.image_processor = image_processor
+         # self.tokenizer = tokenizer
+         super().__init__(image_processor, tokenizer)
+         self._special_tokens = None
+
+     @property
+     def special_token_ids(self):
+         if self._special_tokens is None:
+             self._special_tokens = get_special_token_ids(self.tokenizer)
+         return self._special_tokens
+
+     def get_tokens_input(self, prompt, message_format, always_start_with_space):
+         if message_format == "none" or message_format is None:
+             pass
+         elif message_format == "role":
+             prompt = "User: " + prompt + " Assistant:"
+         else:
+             raise NotImplementedError(f"Message format {message_format} not implemented")
+
+         if always_start_with_space:
+             prompt = " " + prompt
+
+         tokens = self.tokenizer.encode(prompt, add_special_tokens=False)
+
+         return tokens
+
+     def process(
+         self,
+         text: TextInput = None,
+         images: ImageInput = None,
+         *,
+         tokens: Optional[PreTokenizedInput] = None,
+         **kwargs: Unpack[PatramProcessorKwargs],
+     ):
+         output_kwargs = self._merge_kwargs(
+             PatramProcessorKwargs,
+             tokenizer_init_kwargs=self.tokenizer.init_kwargs,
+             **kwargs,
+         )
+
+         if tokens is None:
+             tokens = self.get_tokens_input(
+                 text,
+                 output_kwargs["text_kwargs"]["message_format"],
+                 output_kwargs["text_kwargs"]["always_start_with_space"],
+             )
+
+         image_token_id = self.special_token_ids[IMAGE_PROMPT]
+
+         if images is not None:
+             if not isinstance(images, (list, tuple)):
+                 images = [images]
+             image_arrays = []
+             for image in images:
+                 if isinstance(image, Image):
+                     image = image.convert("RGB")
+                     # Handle images with EXIF orientation tags, which PIL will ignore by default
+                     # https://github.com/python-pillow/Pillow/issues/4703
+                     image = ImageOps.exif_transpose(image)
+                     image_arrays.append(np.array(image))
+                 else:
+                     assert len(image.shape) == 3 and image.shape[-1] == 3
+                     image_arrays.append(image.astype(np.uint8))
+             images = image_arrays
+             # For now only support inserting images at the start
+             image_idx = [-1]*len(images)
+         else:
+             image_idx = None
+
+         sequence_length = output_kwargs["text_kwargs"]["sequence_length"]
+
+         image_patch_token_id = self.special_token_ids[DEFAULT_IMAGE_PATCH_TOKEN]
+         image_col_token_id = self.special_token_ids[DEFAULT_IM_COL_TOKEN]
+         image_start_token_id = self.special_token_ids[DEFAULT_IM_START_TOKEN]
+         image_end_token_id = self.special_token_ids[DEFAULT_IM_END_TOKEN]
+         out = self.image_processor.multimodal_preprocess(
+             images=images,
+             image_idx=image_idx,
+             tokens=np.asarray(tokens).astype(np.int32),
+             sequence_length=sequence_length,
+             image_patch_token_id=image_patch_token_id,
+             image_col_token_id=image_col_token_id,
+             image_start_token_id=image_start_token_id,
+             image_end_token_id=image_end_token_id,
+             **output_kwargs["images_kwargs"]
+         )
+
+         # Prepend BOS
+         # qwen2 and olmo do not have a BOS, and instead use EOS as a generic separator token.
+         bos = self.tokenizer.bos_token_id or self.tokenizer.eos_token_id
+         decoder_input_tokens = np.pad(out["input_ids"], [[1, 0]], constant_values=bos)
+         out["input_ids"] = decoder_input_tokens
+         if "image_input_idx" in out:
+             # Shift patch mapping up by one since we added BOS
+             image_input_idx = out["image_input_idx"]
+             out["image_input_idx"] = np.where(image_input_idx < 0, image_input_idx, image_input_idx + 1)
+
+         for k, v in out.items():
+             out[k] = torch.from_numpy(v)
+
+         return out
+
+
+ PatramProcessor.register_for_auto_class()
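The tail of `process` above (prepend BOS, then shift `image_input_idx`) is easiest to follow on toy arrays. A minimal sketch with made-up data: the special-token ids (100278 `<im_start>`, 100280 `<im_patch>`, 100279 `<im_end>`) and the `<|endoftext|>` id 100257 reused as BOS come from the tokenizer config added below, and the trailing 345 is an arbitrary text token:

```python
import numpy as np

bos = 100257  # <|endoftext|>; OLMo has no dedicated BOS, so EOS is reused

out = {
    "input_ids": np.array([100278, 100280, 100280, 100279, 345], dtype=np.int32),
    "image_input_idx": np.array([[1, 2, -1]], dtype=np.int32),  # -1 = unmapped patch slot
}

# Prepend BOS exactly as the processor does
out["input_ids"] = np.pad(out["input_ids"], [[1, 0]], constant_values=bos)
# Shift the patch -> token-position mapping by one to account for the new BOS token
idx = out["image_input_idx"]
out["image_input_idx"] = np.where(idx < 0, idx, idx + 1)

print(out["input_ids"])        # [100257 100278 100280 100280 100279 345]
print(out["image_input_idx"])  # [[ 2  3 -1]]
```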
preprocessor_config.json ADDED
@@ -0,0 +1,32 @@
1
+ {
2
+ "auto_map": {
3
+ "AutoImageProcessor": "image_preprocessing_patram.PatramImageProcessor",
4
+ "AutoProcessor": "preprocessing_patram.PatramProcessor"
5
+ },
6
+ "base_image_input_size": [
7
+ 336,
8
+ 336
9
+ ],
10
+ "do_normalize": true,
11
+ "image_mean": [
12
+ 0.48145466,
13
+ 0.4578275,
14
+ 0.40821073
15
+ ],
16
+ "image_padding_mask": true,
17
+ "image_patch_size": 14,
18
+ "image_processor_type": "PatramImageProcessor",
19
+ "image_std": [
20
+ 0.26862954,
21
+ 0.26130258,
22
+ 0.27577711
23
+ ],
24
+ "image_token_length_h": 12,
25
+ "image_token_length_w": 12,
26
+ "max_crops": 12,
27
+ "overlap_margins": [
28
+ 4,
29
+ 4
30
+ ],
31
+ "processor_class": "PatramProcessor"
32
+ }
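The values in this config pin down the per-crop vision-token budget. A back-of-the-envelope check (sketch only; the step from the 24×24 ViT patch grid to the 12×12 token grid is assumed to be 2×2 pooling, suggested by the `image_pooling_2d` weights in the checkpoint index):

```python
base_size = 336          # base_image_input_size
patch_size = 14          # image_patch_size
token_w = token_h = 12   # image_token_length_w / image_token_length_h
max_crops = 12

patches_per_side = base_size // patch_size   # 24 ViT patches per side
tokens_per_crop = token_w * token_h          # 144 <im_patch> tokens per crop
print(patches_per_side, tokens_per_crop, tokens_per_crop * max_crops)  # 24 144 1728
```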
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<im_start>",
4
+ "<im_end>",
5
+ "<im_patch>",
6
+ "<im_col>",
7
+ "<|image|>"
8
+ ],
9
+ "bos_token": {
10
+ "content": "<|endoftext|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "eos_token": {
17
+ "content": "<|endoftext|>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "pad_token": {
24
+ "content": "<|pad|>",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "unk_token": {
31
+ "content": "<|endoftext|>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ }
37
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,240 @@
1
+ {
2
+ "add_prefix_space": false,
3
+ "added_tokens_decoder": {
4
+ "100256": {
5
+ "content": "<|extra_id_0|>",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": false
11
+ },
12
+ "100257": {
13
+ "content": "<|endoftext|>",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "100258": {
21
+ "content": "<|fim_prefix|>",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "100259": {
29
+ "content": "<|fim_middle|>",
30
+ "lstrip": false,
31
+ "normalized": false,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "100260": {
37
+ "content": "<|fim_suffix|>",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "100261": {
45
+ "content": "|||PHONE_NUMBER|||",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": false
51
+ },
52
+ "100262": {
53
+ "content": "|||EMAIL_ADDRESS|||",
54
+ "lstrip": false,
55
+ "normalized": false,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": false
59
+ },
60
+ "100263": {
61
+ "content": "|||IP_ADDRESS|||",
62
+ "lstrip": false,
63
+ "normalized": false,
64
+ "rstrip": false,
65
+ "single_word": false,
66
+ "special": false
67
+ },
68
+ "100264": {
69
+ "content": "<|im_start|>",
70
+ "lstrip": false,
71
+ "normalized": false,
72
+ "rstrip": false,
73
+ "single_word": false,
74
+ "special": true
75
+ },
76
+ "100265": {
77
+ "content": "<|im_end|>",
78
+ "lstrip": false,
79
+ "normalized": false,
80
+ "rstrip": false,
81
+ "single_word": false,
82
+ "special": true
83
+ },
84
+ "100266": {
85
+ "content": "<|extra_id_1|>",
86
+ "lstrip": false,
87
+ "normalized": false,
88
+ "rstrip": false,
89
+ "single_word": false,
90
+ "special": false
91
+ },
92
+ "100267": {
93
+ "content": "<|extra_id_2|>",
94
+ "lstrip": false,
95
+ "normalized": false,
96
+ "rstrip": false,
97
+ "single_word": false,
98
+ "special": false
99
+ },
100
+ "100268": {
101
+ "content": "<|extra_id_3|>",
102
+ "lstrip": false,
103
+ "normalized": false,
104
+ "rstrip": false,
105
+ "single_word": false,
106
+ "special": false
107
+ },
108
+ "100269": {
109
+ "content": "<|extra_id_4|>",
110
+ "lstrip": false,
111
+ "normalized": false,
112
+ "rstrip": false,
113
+ "single_word": false,
114
+ "special": false
115
+ },
116
+ "100270": {
117
+ "content": "<|extra_id_5|>",
118
+ "lstrip": false,
119
+ "normalized": false,
120
+ "rstrip": false,
121
+ "single_word": false,
122
+ "special": false
123
+ },
124
+ "100271": {
125
+ "content": "<|extra_id_6|>",
126
+ "lstrip": false,
127
+ "normalized": false,
128
+ "rstrip": false,
129
+ "single_word": false,
130
+ "special": false
131
+ },
132
+ "100272": {
133
+ "content": "<|extra_id_7|>",
134
+ "lstrip": false,
135
+ "normalized": false,
136
+ "rstrip": false,
137
+ "single_word": false,
138
+ "special": false
139
+ },
140
+ "100273": {
141
+ "content": "<|extra_id_8|>",
142
+ "lstrip": false,
143
+ "normalized": false,
144
+ "rstrip": false,
145
+ "single_word": false,
146
+ "special": false
147
+ },
148
+ "100274": {
149
+ "content": "<|extra_id_9|>",
150
+ "lstrip": false,
151
+ "normalized": false,
152
+ "rstrip": false,
153
+ "single_word": false,
154
+ "special": false
155
+ },
156
+ "100275": {
157
+ "content": "<|extra_id_10|>",
158
+ "lstrip": false,
159
+ "normalized": false,
160
+ "rstrip": false,
161
+ "single_word": false,
162
+ "special": false
163
+ },
164
+ "100276": {
165
+ "content": "<|endofprompt|>",
166
+ "lstrip": false,
167
+ "normalized": false,
168
+ "rstrip": false,
169
+ "single_word": false,
170
+ "special": true
171
+ },
172
+ "100277": {
173
+ "content": "<|pad|>",
174
+ "lstrip": false,
175
+ "normalized": false,
176
+ "rstrip": false,
177
+ "single_word": false,
178
+ "special": true
179
+ },
180
+ "100278": {
181
+ "content": "<im_start>",
182
+ "lstrip": false,
183
+ "normalized": false,
184
+ "rstrip": false,
185
+ "single_word": false,
186
+ "special": true
187
+ },
188
+ "100279": {
189
+ "content": "<im_end>",
190
+ "lstrip": false,
191
+ "normalized": false,
192
+ "rstrip": false,
193
+ "single_word": false,
194
+ "special": true
195
+ },
196
+ "100280": {
197
+ "content": "<im_patch>",
198
+ "lstrip": false,
199
+ "normalized": false,
200
+ "rstrip": false,
201
+ "single_word": false,
202
+ "special": true
203
+ },
204
+ "100281": {
205
+ "content": "<im_col>",
206
+ "lstrip": false,
207
+ "normalized": false,
208
+ "rstrip": false,
209
+ "single_word": false,
210
+ "special": true
211
+ },
212
+ "100282": {
213
+ "content": "<|image|>",
214
+ "lstrip": false,
215
+ "normalized": false,
216
+ "rstrip": false,
217
+ "single_word": false,
218
+ "special": true
219
+ }
220
+ },
221
+ "additional_special_tokens": [
222
+ "<im_start>",
223
+ "<im_end>",
224
+ "<im_patch>",
225
+ "<im_col>",
226
+ "<|image|>"
227
+ ],
228
+ "auto_map": {
229
+ "AutoProcessor": "preprocessing_patram.PatramProcessor"
230
+ },
231
+ "bos_token": "<|endoftext|>",
232
+ "clean_up_tokenization_spaces": false,
233
+ "eos_token": "<|endoftext|>",
234
+ "extra_special_tokens": {},
235
+ "model_max_length": 8192,
236
+ "pad_token": "<|pad|>",
237
+ "processor_class": "PatramProcessor",
238
+ "tokenizer_class": "GPT2Tokenizer",
239
+ "unk_token": "<|endoftext|>"
240
+ }
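A quick sanity check that the tokenizer resolves the image special tokens to the ids listed in `added_tokens_decoder` above (a sketch; assumes the files from this commit are available under the repo id used in the model card):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bharatgenai/patram-7b-instruct")
for t in ("<im_start>", "<im_end>", "<im_patch>", "<im_col>", "<|image|>"):
    print(t, tok.convert_tokens_to_ids(t))
# expected per the config above: 100278 100279 100280 100281 100282
print(tok.bos_token, tok.eos_token, tok.pad_token)  # <|endoftext|> <|endoftext|> <|pad|>
```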
vocab.json ADDED
The diff for this file is too large to render. See raw diff