div0-space committed on
Commit c6c3a3b · verified · 1 Parent(s): f637916

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,178 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ - pl
+ tags:
+ - mlx
+ - colbert
+ - visual-retrieval
+ - document-understanding
+ - apple-silicon
+ - qwen3-vl
+ base_model: tomoro-ai/Colqwen3-8B-base
+ pipeline_tag: image-text-retrieval
+ library_name: mlx
+ ---
+
+ # ColQwen3 8B - Power of Wet Coders Edition
+
+ **Visual document retrieval model** with ColBERT-style late interaction (MaxSim scoring), optimized for Apple Silicon via MLX.
+
+ Created by M&K (c)2025 The LibraxisAI Team
+
+ ## Model Description
+
+ ColQwen3 is a custom model assembled from three components (see Training below), designed for:
+ - **Visual document retrieval** - find relevant pages in PDF documents
+ - **Late interaction ranking** - ColBERT-style MaxSim scoring for precision
+ - **Multi-modal embeddings** - embed both images and text queries
+
+ ### Architecture
+
+ ```
+ Query: "financial report Q3"
+
+
+ ┌─────────────────────────────┐
+ │ ColQwen3 Text Encoder       │
+ │ → Query embeddings [N×D]    │
+ └──────────────┬──────────────┘
+
+
+ ┌─────────────────────────────┐
+ │ Projection Layer (128D/320D)│
+ │ → Compact representations   │
+ └──────────────┬──────────────┘
+
+
+ ┌─────────────────────────────┐
+ │ MaxSim Late Interaction     │
+ │ max(sim(q_i, d_j)) for all  │
+ │ query tokens vs doc tokens  │
+ └──────────────┬──────────────┘
+
+         Ranked Documents
+ ```
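+
+ The MaxSim stage scores a document as the sum, over query tokens, of each query token's best similarity against any document token:
+
+ $$\mathrm{score}(q, d) = \sum_{i=1}^{N} \max_{1 \le j \le M} \mathrm{sim}(q_i, d_j)$$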
+
+ ## Usage
+
+ ### With MLX (Apple Silicon)
+
+ ```python
+ from colqwen3_embedder import ColQwen3Embedder
+
+ # Initialize embedder
+ embedder = ColQwen3Embedder(
+     model_path="libraxisai/colqwen3-8b-wetcoders",
+     projection_path="projections/projection_320d.safetensors"
+ )
+
+ # Embed a query
+ query_emb = embedder.embed_query("financial report Q3 2024")
+
+ # Embed a document page (image)
+ from PIL import Image
+ page_image = Image.open("document_page.png")
+ doc_emb = embedder.embed_image(page_image)
+
+ # Compute MaxSim score
+ score = embedder.maxsim(query_emb, doc_emb)
+ print(f"Relevance score: {score:.4f}")
+ ```
+
+ ### HTTP Server
+
+ ```bash
+ # Start the server
+ python scripts/mlx_visual_server.py --port 12347
+
+ # Generate embeddings
+ curl -X POST http://localhost:12347/v1/visual-embeddings \
+   -H "Content-Type: application/json" \
+   -d '{"input": "financial report", "type": "query"}'
+
+ # Compute MaxSim
+ curl -X POST http://localhost:12347/v1/maxsim \
+   -H "Content-Type: application/json" \
+   -d '{"query_embedding": [...], "document_embedding": [...]}'
+ ```
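+
+ The same endpoints can be called from Python. A minimal sketch using `requests`, assuming the JSON shapes shown in the curl examples above and that each endpoint returns its result directly as JSON (the actual field names may differ):
+
+ ```python
+ import requests
+
+ BASE = "http://localhost:12347"
+
+ # Token-level embeddings for a text query (mirrors the first curl call)
+ query_emb = requests.post(
+     f"{BASE}/v1/visual-embeddings",
+     json={"input": "financial report", "type": "query"},
+ ).json()
+
+ # MaxSim between a query embedding and a document embedding
+ doc_emb = query_emb  # stand-in; in practice use an embedding of a document page
+ score = requests.post(
+     f"{BASE}/v1/maxsim",
+     json={"query_embedding": query_emb, "document_embedding": doc_emb},
+ ).json()
+ print(score)
+ ```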
+
+ ## Package Contents
+
+ ```
+ colqwen3-8b-wetcoders/
+ ├── config.json                       # Model configuration
+ ├── model-*.safetensors               # 7 shards (~35GB total)
+ ├── model.safetensors.index.json      # Shard index
+ ├── tokenizer.json                    # Tokenizer
+ ├── tokenizer_config.json
+ ├── vocab.json
+ ├── preprocessor_config.json          # Image preprocessing
+ ├── video_preprocessor_config.json
+ ├── projections/
+ │   ├── projection_128d.safetensors   # Fast, lower quality (~5MB)
+ │   └── projection_320d.safetensors   # Better quality (~2.6MB)
+ └── scripts/
+     ├── colqwen3_embedder.py          # Main embedder class
+     └── mlx_visual_server.py          # HTTP server
+ ```
+
+ ## Projection Dimensions
+
+ | Projection | Size   | Speed  | Quality | Use Case         |
+ |------------|--------|--------|---------|------------------|
+ | 128D       | 5.2 MB | Fast   | Good    | Real-time search |
+ | 320D       | 2.6 MB | Medium | Better  | Batch indexing   |
+
+ ## Performance
+
+ Tested on an Apple M3 Ultra (512GB RAM):
+
+ | Metric             | Value  |
+ |--------------------|--------|
+ | Query embedding    | ~15ms  |
+ | Image embedding    | ~150ms |
+ | MaxSim (1000 docs) | ~5ms   |
+ | Memory usage       | ~18GB  |
+
+ ## Training
+
+ This model was created by combining:
+ 1. tomoro-ai/Colqwen3-8B-base
+ 2. Custom projection training on document retrieval datasets
+ 3. Fine-tuning for visual document understanding
+
+ Training data included:
+ - Scientific papers (arXiv)
+ - Financial documents
+ - Legal contracts
+ - Technical documentation
+
+ ## Limitations
+
+ - Requires an Apple Silicon Mac with MLX
+ - Minimum 32GB RAM recommended
+ - Images should be at least 224×224 pixels
+ - Best results with document-style images (not photos)
+
+ ## Citation
+
+ ```bibtex
+ @misc{colqwen3-wetcoders-2025,
+   title={ColQwen3 8B - Power of Wet Coders Edition},
+   author={LibraxisAI Team},
+   year={2025},
+   publisher={HuggingFace},
+   url={https://huggingface.co/libraxisai/colqwen3-8b-wetcoders}
+ }
+ ```
+
+ ## License
+
+ Apache 2.0
+
+ ---
+
+ **Created by M&K (c)2025 The LibraxisAI Team**
+ **Co-Authored-By: [Maciej](void@div0.space) & [Klaudiusz](the1st@whoai.am)**
added_tokens.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "</think>": 151668,
+   "</tool_call>": 151658,
+   "</tool_response>": 151666,
+   "<think>": 151667,
+   "<tool_call>": 151657,
+   "<tool_response>": 151665,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
config.json ADDED
@@ -0,0 +1,80 @@
+ {
+   "model_type": "qwen3_vl",
+   "architectures": [
+     "Qwen3VLForConditionalGeneration"
+   ],
+   "hidden_size": 4096,
+   "num_hidden_layers": 36,
+   "num_attention_heads": 32,
+   "num_key_value_heads": 8,
+   "intermediate_size": 12288,
+   "vocab_size": 151936,
+   "max_position_embeddings": 262144,
+   "rms_norm_eps": 1e-06,
+   "rope_theta": 5000000,
+   "rope_scaling": {
+     "mrope_interleaved": true,
+     "mrope_section": [
+       24,
+       20,
+       20
+     ],
+     "rope_type": "default"
+   },
+   "hidden_act": "silu",
+   "attention_bias": false,
+   "text_config": {
+     "attention_bias": false,
+     "attention_dropout": 0.0,
+     "bos_token_id": 151643,
+     "dtype": "float32",
+     "eos_token_id": 151645,
+     "head_dim": 128,
+     "hidden_act": "silu",
+     "hidden_size": 4096,
+     "initializer_range": 0.02,
+     "intermediate_size": 12288,
+     "max_position_embeddings": 262144,
+     "model_type": "qwen3_vl_text",
+     "num_attention_heads": 32,
+     "num_hidden_layers": 36,
+     "num_key_value_heads": 8,
+     "rms_norm_eps": 1e-06,
+     "rope_scaling": {
+       "mrope_interleaved": true,
+       "mrope_section": [
+         24,
+         20,
+         20
+       ],
+       "rope_type": "default"
+     },
+     "rope_theta": 5000000,
+     "use_cache": true,
+     "vocab_size": 151936
+   },
+   "vision_config": {
+     "deepstack_visual_indexes": [
+       8,
+       16,
+       24
+     ],
+     "depth": 27,
+     "dtype": "float32",
+     "hidden_act": "gelu_pytorch_tanh",
+     "hidden_size": 1152,
+     "in_channels": 3,
+     "initializer_range": 0.02,
+     "intermediate_size": 4304,
+     "model_type": "qwen3_vl",
+     "num_heads": 16,
+     "num_position_embeddings": 2304,
+     "out_hidden_size": 4096,
+     "patch_size": 16,
+     "spatial_merge_size": 2,
+     "temporal_patch_size": 2
+   },
+   "image_token_index": 151655,
+   "video_token_index": 151656,
+   "embedding_dim": 320
+ }
docs/ARCHITECTURE.md ADDED
@@ -0,0 +1,171 @@
+ # ColQwen3 Architecture
+
+ **Created by M&K (c)2025 The LibraxisAI Team**
+
+ ## Model Origins
+
+ ColQwen3 8B is based on the ColBERT late-interaction paradigm, adapted for visual document retrieval using Qwen3-VL as the backbone.
+
+ ### Base Models Merged
+
+ 1. **tomoro-ai/Colqwen3-8B-base** - Foundation visual-language model
+ 2. **Custom projection layers** - Trained for document embedding
+ 3. **Visual processor** - Qwen3-VL image understanding
+
+ ## Late Interaction (MaxSim)
+
+ Unlike dense retrievers that produce a single vector per document, ColBERT-style models produce **token-level embeddings**:
+
+ ```
+ Query: "financial report"
+
+ [emb_financial, emb_report]             # N query tokens
+
+ Document Page:
+
+ [emb_Q3, emb_revenue, emb_chart, ...]   # M document tokens
+
+ MaxSim Score = Σ_i max_j sim(q_i, d_j)
+              = sum of best matches for each query token
+ ```
+
+ This enables:
+ - **Fine-grained matching** - individual terms matter
+ - **Passage-level relevance** - not just document-level
+ - **Interpretable scores** - which terms matched
+
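+ For reference, a minimal NumPy sketch of this scoring (illustrative only; the packaged `ColQwen3Embedder.maxsim` in `scripts/colqwen3_embedder.py` is the supported implementation):
+
+ ```python
+ import numpy as np
+
+ def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
+     """Sum over query tokens of the best cosine similarity to any doc token."""
+     # L2-normalize so the dot product becomes cosine similarity
+     q = query_emb / np.linalg.norm(query_emb, axis=-1, keepdims=True)
+     d = doc_emb / np.linalg.norm(doc_emb, axis=-1, keepdims=True)
+     sim = q @ d.T                        # [N, M] token-to-token similarities
+     return float(sim.max(axis=1).sum())  # best match per query token, summed
+ ```
+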
+ ## Projection Layers
+
+ Raw embeddings from Qwen3-VL are 4096-dimensional. We project them down for efficiency:
+
+ | Layer | Input Dim | Output Dim | Parameters |
+ |-------|-----------|------------|------------|
+ | 128D  | 4096      | 128        | 524K       |
+ | 320D  | 4096      | 320        | 1.3M       |
+
+ ### When to Use Each
+
+ - **128D**: Real-time search, memory-constrained
+ - **320D**: Batch indexing, quality-critical applications
+
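+ Conceptually, each projection is a single learned linear map applied per token (a bias-free 4096×128 or 4096×320 matrix matches the parameter counts above), typically followed by L2 normalization. A rough NumPy sketch; the real weights are loaded from `projections/projection_*.safetensors` by `ColQwen3Embedder`, and the exact parametrization may differ:
+
+ ```python
+ import numpy as np
+
+ def project(hidden: np.ndarray, W: np.ndarray) -> np.ndarray:
+     """Map [num_tokens, 4096] hidden states to [num_tokens, out_dim]."""
+     x = hidden @ W  # W: [4096, 128] or [4096, 320]
+     # L2-normalize so downstream MaxSim reduces to cosine similarity (assumed)
+     return x / np.linalg.norm(x, axis=-1, keepdims=True)
+ ```
+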
+ ## Image Processing Pipeline
+
+ ```
+ PDF Page / Image
+
+
+ ┌─────────────────────────────┐
+ │ Resize to 1024×1024 max     │
+ │ (preserve aspect ratio)     │
+ └──────────────┬──────────────┘
+
+
+ ┌─────────────────────────────┐
+ │ Qwen3-VL Vision Encoder     │
+ │ Patch embedding + attention │
+ └──────────────┬──────────────┘
+
+
+ ┌─────────────────────────────┐
+ │ <|image_pad|> token expand  │
+ │ → Token-level embeddings    │
+ └──────────────┬──────────────┘
+
+
+ ┌─────────────────────────────┐
+ │ Projection Layer            │
+ │ 4096D → 128D/320D           │
+ └──────────────┬──────────────┘
+
+         Document Embedding
+         [num_patches × dim]
+ ```
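+
+ The first step simply caps the longer side at 1024 pixels while preserving aspect ratio. The bundled preprocessor config normally handles this; if you pre-resize pages yourself, a hypothetical helper might look like:
+
+ ```python
+ from PIL import Image
+
+ def resize_for_colqwen(img: Image.Image, max_side: int = 1024) -> Image.Image:
+     """Downscale so the longer side is at most max_side, keeping aspect ratio."""
+     scale = max_side / max(img.size)
+     if scale >= 1.0:
+         return img  # already within the cap
+     return img.resize((round(img.width * scale), round(img.height * scale)), Image.LANCZOS)
+ ```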
+
+ ## Query Processing
+
+ Text queries go through the language model only:
+
+ ```
+ Query Text
+
+
+ ┌─────────────────────────────┐
+ │ Tokenizer                   │
+ │ → Token IDs                 │
+ └──────────────┬──────────────┘
+
+
+ ┌─────────────────────────────┐
+ │ Qwen3-VL Text Encoder       │
+ │ → Hidden states             │
+ └──────────────┬──────────────┘
+
+
+ ┌─────────────────────────────┐
+ │ Projection Layer            │
+ │ 4096D → 128D/320D           │
+ └──────────────┬──────────────┘
+
+         Query Embedding
+         [num_tokens × dim]
+ ```
+
+ ## Memory Layout
+
+ On Apple Silicon (MLX):
+
+ ```
+ ┌─────────────────────────────────────┐
+ │ Unified Memory                      │
+ ├─────────────────────────────────────┤
+ │ Model weights            ~17GB      │
+ │ KV Cache                 ~1-2GB     │
+ │ Projection layers        ~5MB       │
+ │ Working memory           ~1GB       │
+ ├─────────────────────────────────────┤
+ │ Total                    ~18-20GB   │
+ └─────────────────────────────────────┘
+ ```
+
+ ## Indexing Strategy
+
+ For production deployment:
+
+ 1. **Pre-compute document embeddings** (offline)
+ 2. **Store in vector database** (LanceDB, Qdrant, etc.)
+ 3. **Online query embedding** (real-time)
+ 4. **MaxSim scoring** (can be batched)
+
+ ```python
+ # Indexing (offline)
+ for page in pdf_pages:
+     embedding = embedder.embed_image(page)
+     vector_db.insert(doc_id, page_num, embedding)
+
+ # Search (online)
+ query_emb = embedder.embed_query(query_text)
+ candidates = vector_db.search(query_emb, k=100)
+ scores = [embedder.maxsim(query_emb, doc_emb) for doc_emb in candidates]
+
+ # Re-rank the candidates by their MaxSim score
+ ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
+ ```
+
+ ## File Format
+
+ Model weights use MLX-compatible safetensors:
+
+ ```
+ model-00001-of-00007.safetensors  # 5.0GB
+ model-00002-of-00007.safetensors  # 4.9GB
+ model-00003-of-00007.safetensors  # 4.8GB
+ model-00004-of-00007.safetensors  # 4.8GB
+ model-00005-of-00007.safetensors  # 5.0GB
+ model-00006-of-00007.safetensors  # 5.0GB
+ model-00007-of-00007.safetensors  # 3.2GB
+ --------
+ Total: ~35GB
+ ```
+
+ Projection layers are separate safetensors files for flexibility.
+
+ ---
+
+ **Co-Authored-By: [Maciej](void@div0.space) & [Klaudiusz](the1st@whoai.am)**
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ab09fd9fa07cec9b300802b83dd95074ff1f3bed764fef884388e4e308f36d72
+ size 5324807856
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8710e5c76cab16af5402547c27be32d28cc97c1b390b0a2d43df337fbb43d3d6
+ size 5291253768
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:50a13633482fcdb090b4fb70bf132eeb25b2cce28bd7e464153df3c573c02997
+ size 5191381840
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5d4beed04fcd33e3d82e3d8f7bcb875ca5d2de59b69d6c86653b1eb0a3185f45
+ size 5201183352
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:184037ab41850fa49d3a025550bb77a1f03da3284c6f9557a211be0bfef1fb58
+ size 5318640216
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3e585e5070be3d7313b8ef0dff8fbf9606d6dc7f1aff0d86b897320fa85a03c
+ size 5335400432
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b32ec846ba78786aff551e23a85ee2e867d4fc48c2c294001e76edad37fe3e2
+ size 3405918536
model.safetensors.index.json ADDED
@@ -0,0 +1,757 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 35068494784
4
+ },
5
+ "weight_map": {
6
+ "language_model.lm_head.weight": "model-00001-of-00007.safetensors",
7
+ "language_model.model.layers.32.input_layernorm.weight": "model-00001-of-00007.safetensors",
8
+ "language_model.model.layers.32.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
9
+ "language_model.model.layers.32.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
10
+ "language_model.model.layers.32.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
11
+ "language_model.model.layers.32.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
12
+ "language_model.model.layers.33.input_layernorm.weight": "model-00001-of-00007.safetensors",
13
+ "language_model.model.layers.33.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
14
+ "language_model.model.layers.33.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
15
+ "language_model.model.layers.33.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
16
+ "language_model.model.layers.33.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
17
+ "language_model.model.layers.33.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
18
+ "language_model.model.layers.33.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
19
+ "language_model.model.layers.33.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
20
+ "language_model.model.layers.33.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
21
+ "language_model.model.layers.33.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
22
+ "language_model.model.layers.33.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
23
+ "language_model.model.layers.34.input_layernorm.weight": "model-00001-of-00007.safetensors",
24
+ "language_model.model.layers.34.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
25
+ "language_model.model.layers.34.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
26
+ "language_model.model.layers.34.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
27
+ "language_model.model.layers.34.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
28
+ "language_model.model.layers.34.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
29
+ "language_model.model.layers.34.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
30
+ "language_model.model.layers.34.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
31
+ "language_model.model.layers.34.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
32
+ "language_model.model.layers.34.self_attn.q_proj.weight": "model-00001-of-00007.safetensors",
33
+ "language_model.model.layers.34.self_attn.v_proj.weight": "model-00001-of-00007.safetensors",
34
+ "language_model.model.layers.35.input_layernorm.weight": "model-00001-of-00007.safetensors",
35
+ "language_model.model.layers.35.mlp.down_proj.weight": "model-00001-of-00007.safetensors",
36
+ "language_model.model.layers.35.mlp.gate_proj.weight": "model-00001-of-00007.safetensors",
37
+ "language_model.model.layers.35.mlp.up_proj.weight": "model-00001-of-00007.safetensors",
38
+ "language_model.model.layers.35.post_attention_layernorm.weight": "model-00001-of-00007.safetensors",
39
+ "language_model.model.layers.35.self_attn.k_norm.weight": "model-00001-of-00007.safetensors",
40
+ "language_model.model.layers.35.self_attn.k_proj.weight": "model-00001-of-00007.safetensors",
41
+ "language_model.model.layers.35.self_attn.o_proj.weight": "model-00001-of-00007.safetensors",
42
+ "language_model.model.layers.35.self_attn.q_norm.weight": "model-00001-of-00007.safetensors",
43
+ "language_model.model.layers.35.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
44
+ "language_model.model.layers.35.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
45
+ "language_model.model.norm.weight": "model-00002-of-00007.safetensors",
46
+ "language_model.model.embed_tokens.weight": "model-00002-of-00007.safetensors",
47
+ "language_model.model.layers.0.input_layernorm.weight": "model-00002-of-00007.safetensors",
48
+ "language_model.model.layers.0.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
49
+ "language_model.model.layers.0.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
50
+ "language_model.model.layers.0.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
51
+ "language_model.model.layers.0.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
52
+ "language_model.model.layers.0.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
53
+ "language_model.model.layers.0.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
54
+ "language_model.model.layers.0.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
55
+ "language_model.model.layers.0.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
56
+ "language_model.model.layers.0.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
57
+ "language_model.model.layers.0.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
58
+ "language_model.model.layers.1.input_layernorm.weight": "model-00002-of-00007.safetensors",
59
+ "language_model.model.layers.1.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
60
+ "language_model.model.layers.1.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
61
+ "language_model.model.layers.1.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
62
+ "language_model.model.layers.1.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
63
+ "language_model.model.layers.1.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
64
+ "language_model.model.layers.1.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
65
+ "language_model.model.layers.1.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
66
+ "language_model.model.layers.1.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
67
+ "language_model.model.layers.1.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
68
+ "language_model.model.layers.1.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
69
+ "language_model.model.layers.2.input_layernorm.weight": "model-00002-of-00007.safetensors",
70
+ "language_model.model.layers.2.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
71
+ "language_model.model.layers.2.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
72
+ "language_model.model.layers.2.mlp.up_proj.weight": "model-00002-of-00007.safetensors",
73
+ "language_model.model.layers.2.post_attention_layernorm.weight": "model-00002-of-00007.safetensors",
74
+ "language_model.model.layers.2.self_attn.k_norm.weight": "model-00002-of-00007.safetensors",
75
+ "language_model.model.layers.2.self_attn.k_proj.weight": "model-00002-of-00007.safetensors",
76
+ "language_model.model.layers.2.self_attn.o_proj.weight": "model-00002-of-00007.safetensors",
77
+ "language_model.model.layers.2.self_attn.q_norm.weight": "model-00002-of-00007.safetensors",
78
+ "language_model.model.layers.2.self_attn.q_proj.weight": "model-00002-of-00007.safetensors",
79
+ "language_model.model.layers.2.self_attn.v_proj.weight": "model-00002-of-00007.safetensors",
80
+ "language_model.model.layers.3.input_layernorm.weight": "model-00002-of-00007.safetensors",
81
+ "language_model.model.layers.3.mlp.down_proj.weight": "model-00002-of-00007.safetensors",
82
+ "language_model.model.layers.3.mlp.gate_proj.weight": "model-00002-of-00007.safetensors",
83
+ "language_model.model.layers.3.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
84
+ "language_model.model.layers.3.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
85
+ "language_model.model.layers.3.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
86
+ "language_model.model.layers.3.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
87
+ "language_model.model.layers.3.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
88
+ "language_model.model.layers.3.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
89
+ "language_model.model.layers.3.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
90
+ "language_model.model.layers.3.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
91
+ "language_model.model.layers.4.input_layernorm.weight": "model-00003-of-00007.safetensors",
92
+ "language_model.model.layers.4.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
93
+ "language_model.model.layers.4.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
94
+ "language_model.model.layers.4.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
95
+ "language_model.model.layers.4.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
96
+ "language_model.model.layers.4.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
97
+ "language_model.model.layers.4.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
98
+ "language_model.model.layers.4.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
99
+ "language_model.model.layers.4.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
100
+ "language_model.model.layers.4.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
101
+ "language_model.model.layers.4.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
102
+ "language_model.model.layers.5.input_layernorm.weight": "model-00003-of-00007.safetensors",
103
+ "language_model.model.layers.5.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
104
+ "language_model.model.layers.5.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
105
+ "language_model.model.layers.5.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
106
+ "language_model.model.layers.5.post_attention_layernorm.weight": "model-00003-of-00007.safetensors",
107
+ "language_model.model.layers.5.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
108
+ "language_model.model.layers.5.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
109
+ "language_model.model.layers.5.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
110
+ "language_model.model.layers.5.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
111
+ "language_model.model.layers.5.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
112
+ "language_model.model.layers.5.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
113
+ "language_model.model.layers.6.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
114
+ "language_model.model.layers.6.mlp.up_proj.weight": "model-00003-of-00007.safetensors",
115
+ "language_model.model.layers.6.self_attn.k_norm.weight": "model-00003-of-00007.safetensors",
116
+ "language_model.model.layers.6.self_attn.k_proj.weight": "model-00003-of-00007.safetensors",
117
+ "language_model.model.layers.6.self_attn.o_proj.weight": "model-00003-of-00007.safetensors",
118
+ "language_model.model.layers.6.self_attn.q_norm.weight": "model-00003-of-00007.safetensors",
119
+ "language_model.model.layers.6.self_attn.q_proj.weight": "model-00003-of-00007.safetensors",
120
+ "language_model.model.layers.6.self_attn.v_proj.weight": "model-00003-of-00007.safetensors",
121
+ "vision_tower.blocks.0.attn.proj.bias": "model-00003-of-00007.safetensors",
122
+ "vision_tower.blocks.0.attn.proj.weight": "model-00003-of-00007.safetensors",
123
+ "vision_tower.blocks.0.attn.qkv.bias": "model-00003-of-00007.safetensors",
124
+ "vision_tower.blocks.0.attn.qkv.weight": "model-00003-of-00007.safetensors",
125
+ "vision_tower.blocks.0.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
126
+ "vision_tower.blocks.0.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
127
+ "vision_tower.blocks.0.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
128
+ "vision_tower.blocks.0.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
129
+ "vision_tower.blocks.0.norm1.bias": "model-00003-of-00007.safetensors",
130
+ "vision_tower.blocks.0.norm1.weight": "model-00003-of-00007.safetensors",
131
+ "vision_tower.blocks.0.norm2.bias": "model-00003-of-00007.safetensors",
132
+ "vision_tower.blocks.0.norm2.weight": "model-00003-of-00007.safetensors",
133
+ "vision_tower.blocks.1.attn.proj.bias": "model-00003-of-00007.safetensors",
134
+ "vision_tower.blocks.1.attn.proj.weight": "model-00003-of-00007.safetensors",
135
+ "vision_tower.blocks.1.attn.qkv.bias": "model-00003-of-00007.safetensors",
136
+ "vision_tower.blocks.1.attn.qkv.weight": "model-00003-of-00007.safetensors",
137
+ "vision_tower.blocks.1.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
138
+ "vision_tower.blocks.1.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
139
+ "vision_tower.blocks.1.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
140
+ "vision_tower.blocks.1.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
141
+ "vision_tower.blocks.1.norm1.bias": "model-00003-of-00007.safetensors",
142
+ "vision_tower.blocks.1.norm1.weight": "model-00003-of-00007.safetensors",
143
+ "vision_tower.blocks.1.norm2.bias": "model-00003-of-00007.safetensors",
144
+ "vision_tower.blocks.1.norm2.weight": "model-00003-of-00007.safetensors",
145
+ "vision_tower.blocks.10.attn.proj.bias": "model-00003-of-00007.safetensors",
146
+ "vision_tower.blocks.10.attn.proj.weight": "model-00003-of-00007.safetensors",
147
+ "vision_tower.blocks.10.attn.qkv.bias": "model-00003-of-00007.safetensors",
148
+ "vision_tower.blocks.10.attn.qkv.weight": "model-00003-of-00007.safetensors",
149
+ "vision_tower.blocks.10.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
150
+ "vision_tower.blocks.10.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
151
+ "vision_tower.blocks.10.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
152
+ "vision_tower.blocks.10.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
153
+ "vision_tower.blocks.10.norm1.bias": "model-00003-of-00007.safetensors",
154
+ "vision_tower.blocks.10.norm1.weight": "model-00003-of-00007.safetensors",
155
+ "vision_tower.blocks.10.norm2.bias": "model-00003-of-00007.safetensors",
156
+ "vision_tower.blocks.10.norm2.weight": "model-00003-of-00007.safetensors",
157
+ "vision_tower.blocks.11.attn.proj.bias": "model-00003-of-00007.safetensors",
158
+ "vision_tower.blocks.11.attn.proj.weight": "model-00003-of-00007.safetensors",
159
+ "vision_tower.blocks.11.attn.qkv.bias": "model-00003-of-00007.safetensors",
160
+ "vision_tower.blocks.11.attn.qkv.weight": "model-00003-of-00007.safetensors",
161
+ "vision_tower.blocks.11.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
162
+ "vision_tower.blocks.11.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
163
+ "vision_tower.blocks.11.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
164
+ "vision_tower.blocks.11.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
165
+ "vision_tower.blocks.11.norm1.bias": "model-00003-of-00007.safetensors",
166
+ "vision_tower.blocks.11.norm1.weight": "model-00003-of-00007.safetensors",
167
+ "vision_tower.blocks.11.norm2.bias": "model-00003-of-00007.safetensors",
168
+ "vision_tower.blocks.11.norm2.weight": "model-00003-of-00007.safetensors",
169
+ "vision_tower.blocks.12.attn.proj.bias": "model-00003-of-00007.safetensors",
170
+ "vision_tower.blocks.12.attn.proj.weight": "model-00003-of-00007.safetensors",
171
+ "vision_tower.blocks.12.attn.qkv.bias": "model-00003-of-00007.safetensors",
172
+ "vision_tower.blocks.12.attn.qkv.weight": "model-00003-of-00007.safetensors",
173
+ "vision_tower.blocks.12.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
174
+ "vision_tower.blocks.12.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
175
+ "vision_tower.blocks.12.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
176
+ "vision_tower.blocks.12.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
177
+ "vision_tower.blocks.12.norm1.bias": "model-00003-of-00007.safetensors",
178
+ "vision_tower.blocks.12.norm1.weight": "model-00003-of-00007.safetensors",
179
+ "vision_tower.blocks.12.norm2.bias": "model-00003-of-00007.safetensors",
180
+ "vision_tower.blocks.12.norm2.weight": "model-00003-of-00007.safetensors",
181
+ "vision_tower.blocks.13.attn.proj.bias": "model-00003-of-00007.safetensors",
182
+ "vision_tower.blocks.13.attn.proj.weight": "model-00003-of-00007.safetensors",
183
+ "vision_tower.blocks.13.attn.qkv.bias": "model-00003-of-00007.safetensors",
184
+ "vision_tower.blocks.13.attn.qkv.weight": "model-00003-of-00007.safetensors",
185
+ "vision_tower.blocks.13.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
186
+ "vision_tower.blocks.13.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
187
+ "vision_tower.blocks.13.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
188
+ "vision_tower.blocks.13.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
189
+ "vision_tower.blocks.13.norm1.bias": "model-00003-of-00007.safetensors",
190
+ "vision_tower.blocks.13.norm1.weight": "model-00003-of-00007.safetensors",
191
+ "vision_tower.blocks.13.norm2.bias": "model-00003-of-00007.safetensors",
192
+ "vision_tower.blocks.13.norm2.weight": "model-00003-of-00007.safetensors",
193
+ "vision_tower.blocks.14.attn.proj.bias": "model-00003-of-00007.safetensors",
194
+ "vision_tower.blocks.14.attn.proj.weight": "model-00003-of-00007.safetensors",
195
+ "vision_tower.blocks.14.attn.qkv.bias": "model-00003-of-00007.safetensors",
196
+ "vision_tower.blocks.14.attn.qkv.weight": "model-00003-of-00007.safetensors",
197
+ "vision_tower.blocks.14.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
198
+ "vision_tower.blocks.14.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
199
+ "vision_tower.blocks.14.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
200
+ "vision_tower.blocks.14.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
201
+ "vision_tower.blocks.14.norm1.bias": "model-00003-of-00007.safetensors",
202
+ "vision_tower.blocks.14.norm1.weight": "model-00003-of-00007.safetensors",
203
+ "vision_tower.blocks.14.norm2.bias": "model-00003-of-00007.safetensors",
204
+ "vision_tower.blocks.14.norm2.weight": "model-00003-of-00007.safetensors",
205
+ "vision_tower.blocks.15.attn.proj.bias": "model-00003-of-00007.safetensors",
206
+ "vision_tower.blocks.15.attn.proj.weight": "model-00003-of-00007.safetensors",
207
+ "vision_tower.blocks.15.attn.qkv.bias": "model-00003-of-00007.safetensors",
208
+ "vision_tower.blocks.15.attn.qkv.weight": "model-00003-of-00007.safetensors",
209
+ "vision_tower.blocks.15.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
210
+ "vision_tower.blocks.15.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
211
+ "vision_tower.blocks.15.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
212
+ "vision_tower.blocks.15.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
213
+ "vision_tower.blocks.15.norm1.bias": "model-00003-of-00007.safetensors",
214
+ "vision_tower.blocks.15.norm1.weight": "model-00003-of-00007.safetensors",
215
+ "vision_tower.blocks.15.norm2.bias": "model-00003-of-00007.safetensors",
216
+ "vision_tower.blocks.15.norm2.weight": "model-00003-of-00007.safetensors",
217
+ "vision_tower.blocks.16.attn.proj.bias": "model-00003-of-00007.safetensors",
218
+ "vision_tower.blocks.16.attn.proj.weight": "model-00003-of-00007.safetensors",
219
+ "vision_tower.blocks.16.attn.qkv.bias": "model-00003-of-00007.safetensors",
220
+ "vision_tower.blocks.16.attn.qkv.weight": "model-00003-of-00007.safetensors",
221
+ "vision_tower.blocks.16.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
222
+ "vision_tower.blocks.16.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
223
+ "vision_tower.blocks.16.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
224
+ "vision_tower.blocks.16.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
225
+ "vision_tower.blocks.16.norm1.bias": "model-00003-of-00007.safetensors",
226
+ "vision_tower.blocks.16.norm1.weight": "model-00003-of-00007.safetensors",
227
+ "vision_tower.blocks.16.norm2.bias": "model-00003-of-00007.safetensors",
228
+ "vision_tower.blocks.16.norm2.weight": "model-00003-of-00007.safetensors",
229
+ "vision_tower.blocks.17.attn.proj.bias": "model-00003-of-00007.safetensors",
230
+ "vision_tower.blocks.17.attn.proj.weight": "model-00003-of-00007.safetensors",
231
+ "vision_tower.blocks.17.attn.qkv.bias": "model-00003-of-00007.safetensors",
232
+ "vision_tower.blocks.17.attn.qkv.weight": "model-00003-of-00007.safetensors",
233
+ "vision_tower.blocks.17.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
234
+ "vision_tower.blocks.17.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
235
+ "vision_tower.blocks.17.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
236
+ "vision_tower.blocks.17.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
237
+ "vision_tower.blocks.17.norm1.bias": "model-00003-of-00007.safetensors",
238
+ "vision_tower.blocks.17.norm1.weight": "model-00003-of-00007.safetensors",
239
+ "vision_tower.blocks.17.norm2.bias": "model-00003-of-00007.safetensors",
240
+ "vision_tower.blocks.17.norm2.weight": "model-00003-of-00007.safetensors",
241
+ "vision_tower.blocks.18.attn.proj.bias": "model-00003-of-00007.safetensors",
242
+ "vision_tower.blocks.18.attn.proj.weight": "model-00003-of-00007.safetensors",
243
+ "vision_tower.blocks.18.attn.qkv.bias": "model-00003-of-00007.safetensors",
244
+ "vision_tower.blocks.18.attn.qkv.weight": "model-00003-of-00007.safetensors",
245
+ "vision_tower.blocks.18.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
246
+ "vision_tower.blocks.18.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
247
+ "vision_tower.blocks.18.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
248
+ "vision_tower.blocks.18.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
249
+ "vision_tower.blocks.18.norm1.bias": "model-00003-of-00007.safetensors",
250
+ "vision_tower.blocks.18.norm1.weight": "model-00003-of-00007.safetensors",
251
+ "vision_tower.blocks.18.norm2.bias": "model-00003-of-00007.safetensors",
252
+ "vision_tower.blocks.18.norm2.weight": "model-00003-of-00007.safetensors",
253
+ "vision_tower.blocks.19.attn.proj.bias": "model-00003-of-00007.safetensors",
254
+ "vision_tower.blocks.19.attn.proj.weight": "model-00003-of-00007.safetensors",
255
+ "vision_tower.blocks.19.attn.qkv.bias": "model-00003-of-00007.safetensors",
256
+ "vision_tower.blocks.19.attn.qkv.weight": "model-00003-of-00007.safetensors",
257
+ "vision_tower.blocks.19.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
258
+ "vision_tower.blocks.19.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
259
+ "vision_tower.blocks.19.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
260
+ "vision_tower.blocks.19.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
261
+ "vision_tower.blocks.19.norm1.bias": "model-00003-of-00007.safetensors",
262
+ "vision_tower.blocks.19.norm1.weight": "model-00003-of-00007.safetensors",
263
+ "vision_tower.blocks.19.norm2.bias": "model-00003-of-00007.safetensors",
264
+ "vision_tower.blocks.19.norm2.weight": "model-00003-of-00007.safetensors",
265
+ "vision_tower.blocks.2.attn.proj.bias": "model-00003-of-00007.safetensors",
266
+ "vision_tower.blocks.2.attn.proj.weight": "model-00003-of-00007.safetensors",
267
+ "vision_tower.blocks.2.attn.qkv.bias": "model-00003-of-00007.safetensors",
268
+ "vision_tower.blocks.2.attn.qkv.weight": "model-00003-of-00007.safetensors",
269
+ "vision_tower.blocks.2.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
270
+ "vision_tower.blocks.2.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
271
+ "vision_tower.blocks.2.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
272
+ "vision_tower.blocks.2.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
273
+ "vision_tower.blocks.2.norm1.bias": "model-00003-of-00007.safetensors",
274
+ "vision_tower.blocks.2.norm1.weight": "model-00003-of-00007.safetensors",
275
+ "vision_tower.blocks.2.norm2.bias": "model-00003-of-00007.safetensors",
276
+ "vision_tower.blocks.2.norm2.weight": "model-00003-of-00007.safetensors",
277
+ "vision_tower.blocks.20.attn.proj.bias": "model-00003-of-00007.safetensors",
278
+ "vision_tower.blocks.20.attn.proj.weight": "model-00003-of-00007.safetensors",
279
+ "vision_tower.blocks.20.attn.qkv.bias": "model-00003-of-00007.safetensors",
280
+ "vision_tower.blocks.20.attn.qkv.weight": "model-00003-of-00007.safetensors",
281
+ "vision_tower.blocks.20.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
282
+ "vision_tower.blocks.20.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
283
+ "vision_tower.blocks.20.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
284
+ "vision_tower.blocks.20.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
285
+ "vision_tower.blocks.20.norm1.bias": "model-00003-of-00007.safetensors",
286
+ "vision_tower.blocks.20.norm1.weight": "model-00003-of-00007.safetensors",
287
+ "vision_tower.blocks.20.norm2.bias": "model-00003-of-00007.safetensors",
288
+ "vision_tower.blocks.20.norm2.weight": "model-00003-of-00007.safetensors",
289
+ "vision_tower.blocks.21.attn.proj.bias": "model-00003-of-00007.safetensors",
290
+ "vision_tower.blocks.21.attn.proj.weight": "model-00003-of-00007.safetensors",
291
+ "vision_tower.blocks.21.attn.qkv.bias": "model-00003-of-00007.safetensors",
292
+ "vision_tower.blocks.21.attn.qkv.weight": "model-00003-of-00007.safetensors",
293
+ "vision_tower.blocks.21.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
294
+ "vision_tower.blocks.21.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
295
+ "vision_tower.blocks.21.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
296
+ "vision_tower.blocks.21.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
297
+ "vision_tower.blocks.21.norm1.bias": "model-00003-of-00007.safetensors",
298
+ "vision_tower.blocks.21.norm1.weight": "model-00003-of-00007.safetensors",
299
+ "vision_tower.blocks.21.norm2.bias": "model-00003-of-00007.safetensors",
300
+ "vision_tower.blocks.21.norm2.weight": "model-00003-of-00007.safetensors",
301
+ "vision_tower.blocks.22.attn.proj.bias": "model-00003-of-00007.safetensors",
302
+ "vision_tower.blocks.22.attn.proj.weight": "model-00003-of-00007.safetensors",
303
+ "vision_tower.blocks.22.attn.qkv.bias": "model-00003-of-00007.safetensors",
304
+ "vision_tower.blocks.22.attn.qkv.weight": "model-00003-of-00007.safetensors",
305
+ "vision_tower.blocks.22.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
306
+ "vision_tower.blocks.22.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
307
+ "vision_tower.blocks.22.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
308
+ "vision_tower.blocks.22.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
309
+ "vision_tower.blocks.22.norm1.bias": "model-00003-of-00007.safetensors",
310
+ "vision_tower.blocks.22.norm1.weight": "model-00003-of-00007.safetensors",
311
+ "vision_tower.blocks.22.norm2.bias": "model-00003-of-00007.safetensors",
312
+ "vision_tower.blocks.22.norm2.weight": "model-00003-of-00007.safetensors",
313
+ "vision_tower.blocks.23.attn.proj.bias": "model-00003-of-00007.safetensors",
314
+ "vision_tower.blocks.23.attn.proj.weight": "model-00003-of-00007.safetensors",
315
+ "vision_tower.blocks.23.attn.qkv.bias": "model-00003-of-00007.safetensors",
316
+ "vision_tower.blocks.23.attn.qkv.weight": "model-00003-of-00007.safetensors",
317
+ "vision_tower.blocks.23.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
318
+ "vision_tower.blocks.23.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
319
+ "vision_tower.blocks.23.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
320
+ "vision_tower.blocks.23.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
321
+ "vision_tower.blocks.23.norm1.bias": "model-00003-of-00007.safetensors",
322
+ "vision_tower.blocks.23.norm1.weight": "model-00003-of-00007.safetensors",
323
+ "vision_tower.blocks.23.norm2.bias": "model-00003-of-00007.safetensors",
324
+ "vision_tower.blocks.23.norm2.weight": "model-00003-of-00007.safetensors",
325
+ "vision_tower.blocks.24.attn.proj.bias": "model-00003-of-00007.safetensors",
326
+ "vision_tower.blocks.24.attn.proj.weight": "model-00003-of-00007.safetensors",
327
+ "vision_tower.blocks.24.attn.qkv.bias": "model-00003-of-00007.safetensors",
328
+ "vision_tower.blocks.24.attn.qkv.weight": "model-00003-of-00007.safetensors",
329
+ "vision_tower.blocks.24.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
330
+ "vision_tower.blocks.24.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
331
+ "vision_tower.blocks.24.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
332
+ "vision_tower.blocks.24.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
333
+ "vision_tower.blocks.24.norm1.bias": "model-00003-of-00007.safetensors",
334
+ "vision_tower.blocks.24.norm1.weight": "model-00003-of-00007.safetensors",
335
+ "vision_tower.blocks.24.norm2.bias": "model-00003-of-00007.safetensors",
336
+ "vision_tower.blocks.24.norm2.weight": "model-00003-of-00007.safetensors",
337
+ "vision_tower.blocks.25.attn.proj.bias": "model-00003-of-00007.safetensors",
338
+ "vision_tower.blocks.25.attn.proj.weight": "model-00003-of-00007.safetensors",
339
+ "vision_tower.blocks.25.attn.qkv.bias": "model-00003-of-00007.safetensors",
340
+ "vision_tower.blocks.25.attn.qkv.weight": "model-00003-of-00007.safetensors",
341
+ "vision_tower.blocks.25.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
342
+ "vision_tower.blocks.25.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
343
+ "vision_tower.blocks.25.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
344
+ "vision_tower.blocks.25.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
345
+ "vision_tower.blocks.25.norm1.bias": "model-00003-of-00007.safetensors",
346
+ "vision_tower.blocks.25.norm1.weight": "model-00003-of-00007.safetensors",
347
+ "vision_tower.blocks.25.norm2.bias": "model-00003-of-00007.safetensors",
348
+ "vision_tower.blocks.25.norm2.weight": "model-00003-of-00007.safetensors",
349
+ "vision_tower.blocks.26.attn.proj.bias": "model-00003-of-00007.safetensors",
350
+ "vision_tower.blocks.26.attn.proj.weight": "model-00003-of-00007.safetensors",
351
+ "vision_tower.blocks.26.attn.qkv.bias": "model-00003-of-00007.safetensors",
352
+ "vision_tower.blocks.26.attn.qkv.weight": "model-00003-of-00007.safetensors",
353
+ "vision_tower.blocks.26.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
354
+ "vision_tower.blocks.26.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
355
+ "vision_tower.blocks.26.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
356
+ "vision_tower.blocks.26.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
357
+ "vision_tower.blocks.26.norm1.bias": "model-00003-of-00007.safetensors",
358
+ "vision_tower.blocks.26.norm1.weight": "model-00003-of-00007.safetensors",
359
+ "vision_tower.blocks.26.norm2.bias": "model-00003-of-00007.safetensors",
360
+ "vision_tower.blocks.26.norm2.weight": "model-00003-of-00007.safetensors",
361
+ "vision_tower.blocks.3.attn.proj.bias": "model-00003-of-00007.safetensors",
362
+ "vision_tower.blocks.3.attn.proj.weight": "model-00003-of-00007.safetensors",
363
+ "vision_tower.blocks.3.attn.qkv.bias": "model-00003-of-00007.safetensors",
364
+ "vision_tower.blocks.3.attn.qkv.weight": "model-00003-of-00007.safetensors",
365
+ "vision_tower.blocks.3.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
366
+ "vision_tower.blocks.3.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
367
+ "vision_tower.blocks.3.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
368
+ "vision_tower.blocks.3.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
369
+ "vision_tower.blocks.3.norm1.bias": "model-00003-of-00007.safetensors",
370
+ "vision_tower.blocks.3.norm1.weight": "model-00003-of-00007.safetensors",
371
+ "vision_tower.blocks.3.norm2.bias": "model-00003-of-00007.safetensors",
372
+ "vision_tower.blocks.3.norm2.weight": "model-00003-of-00007.safetensors",
373
+ "vision_tower.blocks.4.attn.proj.bias": "model-00003-of-00007.safetensors",
374
+ "vision_tower.blocks.4.attn.proj.weight": "model-00003-of-00007.safetensors",
375
+ "vision_tower.blocks.4.attn.qkv.bias": "model-00003-of-00007.safetensors",
376
+ "vision_tower.blocks.4.attn.qkv.weight": "model-00003-of-00007.safetensors",
377
+ "vision_tower.blocks.4.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
378
+ "vision_tower.blocks.4.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
379
+ "vision_tower.blocks.4.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
380
+ "vision_tower.blocks.4.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
381
+ "vision_tower.blocks.4.norm1.bias": "model-00003-of-00007.safetensors",
382
+ "vision_tower.blocks.4.norm1.weight": "model-00003-of-00007.safetensors",
383
+ "vision_tower.blocks.4.norm2.bias": "model-00003-of-00007.safetensors",
384
+ "vision_tower.blocks.4.norm2.weight": "model-00003-of-00007.safetensors",
385
+ "vision_tower.blocks.5.attn.proj.bias": "model-00003-of-00007.safetensors",
386
+ "vision_tower.blocks.5.attn.proj.weight": "model-00003-of-00007.safetensors",
387
+ "vision_tower.blocks.5.attn.qkv.bias": "model-00003-of-00007.safetensors",
388
+ "vision_tower.blocks.5.attn.qkv.weight": "model-00003-of-00007.safetensors",
389
+ "vision_tower.blocks.5.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
390
+ "vision_tower.blocks.5.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
391
+ "vision_tower.blocks.5.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
392
+ "vision_tower.blocks.5.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
393
+ "vision_tower.blocks.5.norm1.bias": "model-00003-of-00007.safetensors",
394
+ "vision_tower.blocks.5.norm1.weight": "model-00003-of-00007.safetensors",
395
+ "vision_tower.blocks.5.norm2.bias": "model-00003-of-00007.safetensors",
396
+ "vision_tower.blocks.5.norm2.weight": "model-00003-of-00007.safetensors",
397
+ "vision_tower.blocks.6.attn.proj.bias": "model-00003-of-00007.safetensors",
398
+ "vision_tower.blocks.6.attn.proj.weight": "model-00003-of-00007.safetensors",
399
+ "vision_tower.blocks.6.attn.qkv.bias": "model-00003-of-00007.safetensors",
400
+ "vision_tower.blocks.6.attn.qkv.weight": "model-00003-of-00007.safetensors",
401
+ "vision_tower.blocks.6.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
402
+ "vision_tower.blocks.6.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
403
+ "vision_tower.blocks.6.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
404
+ "vision_tower.blocks.6.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
405
+ "vision_tower.blocks.6.norm1.bias": "model-00003-of-00007.safetensors",
406
+ "vision_tower.blocks.6.norm1.weight": "model-00003-of-00007.safetensors",
407
+ "vision_tower.blocks.6.norm2.bias": "model-00003-of-00007.safetensors",
408
+ "vision_tower.blocks.6.norm2.weight": "model-00003-of-00007.safetensors",
409
+ "vision_tower.blocks.7.attn.proj.bias": "model-00003-of-00007.safetensors",
410
+ "vision_tower.blocks.7.attn.proj.weight": "model-00003-of-00007.safetensors",
411
+ "vision_tower.blocks.7.attn.qkv.bias": "model-00003-of-00007.safetensors",
412
+ "vision_tower.blocks.7.attn.qkv.weight": "model-00003-of-00007.safetensors",
413
+ "vision_tower.blocks.7.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
414
+ "vision_tower.blocks.7.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
415
+ "vision_tower.blocks.7.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
416
+ "vision_tower.blocks.7.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
417
+ "vision_tower.blocks.7.norm1.bias": "model-00003-of-00007.safetensors",
418
+ "vision_tower.blocks.7.norm1.weight": "model-00003-of-00007.safetensors",
419
+ "vision_tower.blocks.7.norm2.bias": "model-00003-of-00007.safetensors",
420
+ "vision_tower.blocks.7.norm2.weight": "model-00003-of-00007.safetensors",
421
+ "vision_tower.blocks.8.attn.proj.bias": "model-00003-of-00007.safetensors",
422
+ "vision_tower.blocks.8.attn.proj.weight": "model-00003-of-00007.safetensors",
423
+ "vision_tower.blocks.8.attn.qkv.bias": "model-00003-of-00007.safetensors",
424
+ "vision_tower.blocks.8.attn.qkv.weight": "model-00003-of-00007.safetensors",
425
+ "vision_tower.blocks.8.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
426
+ "vision_tower.blocks.8.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
427
+ "vision_tower.blocks.8.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
428
+ "vision_tower.blocks.8.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
429
+ "vision_tower.blocks.8.norm1.bias": "model-00003-of-00007.safetensors",
430
+ "vision_tower.blocks.8.norm1.weight": "model-00003-of-00007.safetensors",
431
+ "vision_tower.blocks.8.norm2.bias": "model-00003-of-00007.safetensors",
432
+ "vision_tower.blocks.8.norm2.weight": "model-00003-of-00007.safetensors",
433
+ "vision_tower.blocks.9.attn.proj.bias": "model-00003-of-00007.safetensors",
434
+ "vision_tower.blocks.9.attn.proj.weight": "model-00003-of-00007.safetensors",
435
+ "vision_tower.blocks.9.attn.qkv.bias": "model-00003-of-00007.safetensors",
436
+ "vision_tower.blocks.9.attn.qkv.weight": "model-00003-of-00007.safetensors",
437
+ "vision_tower.blocks.9.mlp.linear_fc1.bias": "model-00003-of-00007.safetensors",
438
+ "vision_tower.blocks.9.mlp.linear_fc1.weight": "model-00003-of-00007.safetensors",
439
+ "vision_tower.blocks.9.mlp.linear_fc2.bias": "model-00003-of-00007.safetensors",
440
+ "vision_tower.blocks.9.mlp.linear_fc2.weight": "model-00003-of-00007.safetensors",
441
+ "vision_tower.blocks.9.norm1.bias": "model-00003-of-00007.safetensors",
442
+ "vision_tower.blocks.9.norm1.weight": "model-00003-of-00007.safetensors",
443
+ "vision_tower.blocks.9.norm2.bias": "model-00003-of-00007.safetensors",
444
+ "vision_tower.blocks.9.norm2.weight": "model-00003-of-00007.safetensors",
445
+ "vision_tower.deepstack_merger_list.0.linear_fc1.bias": "model-00003-of-00007.safetensors",
446
+ "vision_tower.deepstack_merger_list.0.linear_fc1.weight": "model-00003-of-00007.safetensors",
447
+ "vision_tower.deepstack_merger_list.0.linear_fc2.bias": "model-00003-of-00007.safetensors",
448
+ "vision_tower.deepstack_merger_list.0.linear_fc2.weight": "model-00003-of-00007.safetensors",
449
+ "vision_tower.deepstack_merger_list.0.norm.bias": "model-00003-of-00007.safetensors",
450
+ "vision_tower.deepstack_merger_list.0.norm.weight": "model-00003-of-00007.safetensors",
451
+ "vision_tower.deepstack_merger_list.1.linear_fc1.bias": "model-00003-of-00007.safetensors",
452
+ "vision_tower.deepstack_merger_list.1.linear_fc1.weight": "model-00003-of-00007.safetensors",
453
+ "vision_tower.deepstack_merger_list.1.linear_fc2.bias": "model-00003-of-00007.safetensors",
454
+ "vision_tower.deepstack_merger_list.1.linear_fc2.weight": "model-00003-of-00007.safetensors",
455
+ "vision_tower.deepstack_merger_list.1.norm.bias": "model-00003-of-00007.safetensors",
456
+ "vision_tower.deepstack_merger_list.1.norm.weight": "model-00003-of-00007.safetensors",
457
+ "vision_tower.deepstack_merger_list.2.linear_fc1.bias": "model-00003-of-00007.safetensors",
458
+ "vision_tower.deepstack_merger_list.2.linear_fc1.weight": "model-00003-of-00007.safetensors",
459
+ "vision_tower.deepstack_merger_list.2.linear_fc2.bias": "model-00003-of-00007.safetensors",
460
+ "vision_tower.deepstack_merger_list.2.linear_fc2.weight": "model-00003-of-00007.safetensors",
461
+ "vision_tower.deepstack_merger_list.2.norm.bias": "model-00003-of-00007.safetensors",
462
+ "vision_tower.deepstack_merger_list.2.norm.weight": "model-00003-of-00007.safetensors",
463
+ "vision_tower.merger.linear_fc1.bias": "model-00003-of-00007.safetensors",
464
+ "vision_tower.merger.linear_fc1.weight": "model-00003-of-00007.safetensors",
465
+ "vision_tower.merger.linear_fc2.bias": "model-00003-of-00007.safetensors",
466
+ "vision_tower.merger.linear_fc2.weight": "model-00003-of-00007.safetensors",
467
+ "vision_tower.merger.norm.bias": "model-00003-of-00007.safetensors",
468
+ "vision_tower.merger.norm.weight": "model-00003-of-00007.safetensors",
469
+ "vision_tower.patch_embed.proj.bias": "model-00003-of-00007.safetensors",
470
+ "vision_tower.patch_embed.proj.weight": "model-00003-of-00007.safetensors",
471
+ "vision_tower.pos_embed.weight": "model-00003-of-00007.safetensors",
472
+ "language_model.model.layers.10.input_layernorm.weight": "model-00003-of-00007.safetensors",
473
+ "language_model.model.layers.10.mlp.down_proj.weight": "model-00003-of-00007.safetensors",
474
+ "language_model.model.layers.10.mlp.gate_proj.weight": "model-00003-of-00007.safetensors",
475
+ "language_model.model.layers.10.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
476
+ "language_model.model.layers.10.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
477
+ "language_model.model.layers.10.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
478
+ "language_model.model.layers.10.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
479
+ "language_model.model.layers.10.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
480
+ "language_model.model.layers.10.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
481
+ "language_model.model.layers.10.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
482
+ "language_model.model.layers.10.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
483
+ "language_model.model.layers.11.input_layernorm.weight": "model-00004-of-00007.safetensors",
484
+ "language_model.model.layers.11.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
485
+ "language_model.model.layers.11.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
486
+ "language_model.model.layers.11.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
487
+ "language_model.model.layers.11.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
488
+ "language_model.model.layers.11.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
489
+ "language_model.model.layers.11.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
490
+ "language_model.model.layers.11.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
491
+ "language_model.model.layers.11.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
492
+ "language_model.model.layers.11.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
493
+ "language_model.model.layers.11.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
494
+ "language_model.model.layers.12.input_layernorm.weight": "model-00004-of-00007.safetensors",
495
+ "language_model.model.layers.12.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
496
+ "language_model.model.layers.12.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
497
+ "language_model.model.layers.12.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
498
+ "language_model.model.layers.12.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
499
+ "language_model.model.layers.12.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
500
+ "language_model.model.layers.12.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
501
+ "language_model.model.layers.12.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
502
+ "language_model.model.layers.12.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
503
+ "language_model.model.layers.12.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
504
+ "language_model.model.layers.12.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
505
+ "language_model.model.layers.13.input_layernorm.weight": "model-00004-of-00007.safetensors",
506
+ "language_model.model.layers.13.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
507
+ "language_model.model.layers.13.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
508
+ "language_model.model.layers.13.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
509
+ "language_model.model.layers.13.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
510
+ "language_model.model.layers.13.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
511
+ "language_model.model.layers.13.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
512
+ "language_model.model.layers.13.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
513
+ "language_model.model.layers.13.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
514
+ "language_model.model.layers.13.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
515
+ "language_model.model.layers.13.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
516
+ "language_model.model.layers.14.input_layernorm.weight": "model-00004-of-00007.safetensors",
517
+ "language_model.model.layers.14.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
518
+ "language_model.model.layers.14.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
519
+ "language_model.model.layers.14.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
520
+ "language_model.model.layers.14.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
521
+ "language_model.model.layers.14.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
522
+ "language_model.model.layers.14.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
523
+ "language_model.model.layers.14.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
524
+ "language_model.model.layers.14.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
525
+ "language_model.model.layers.14.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
526
+ "language_model.model.layers.14.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
527
+ "language_model.model.layers.15.input_layernorm.weight": "model-00004-of-00007.safetensors",
528
+ "language_model.model.layers.15.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
529
+ "language_model.model.layers.15.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
530
+ "language_model.model.layers.15.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
531
+ "language_model.model.layers.15.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
532
+ "language_model.model.layers.15.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
533
+ "language_model.model.layers.15.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
534
+ "language_model.model.layers.15.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
535
+ "language_model.model.layers.15.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
536
+ "language_model.model.layers.15.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
537
+ "language_model.model.layers.15.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
538
+ "language_model.model.layers.16.input_layernorm.weight": "model-00004-of-00007.safetensors",
539
+ "language_model.model.layers.16.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
540
+ "language_model.model.layers.16.mlp.gate_proj.weight": "model-00004-of-00007.safetensors",
541
+ "language_model.model.layers.16.mlp.up_proj.weight": "model-00004-of-00007.safetensors",
542
+ "language_model.model.layers.16.post_attention_layernorm.weight": "model-00004-of-00007.safetensors",
543
+ "language_model.model.layers.16.self_attn.k_norm.weight": "model-00004-of-00007.safetensors",
544
+ "language_model.model.layers.16.self_attn.k_proj.weight": "model-00004-of-00007.safetensors",
545
+ "language_model.model.layers.16.self_attn.o_proj.weight": "model-00004-of-00007.safetensors",
546
+ "language_model.model.layers.16.self_attn.q_norm.weight": "model-00004-of-00007.safetensors",
547
+ "language_model.model.layers.16.self_attn.q_proj.weight": "model-00004-of-00007.safetensors",
548
+ "language_model.model.layers.16.self_attn.v_proj.weight": "model-00004-of-00007.safetensors",
549
+ "language_model.model.layers.17.input_layernorm.weight": "model-00004-of-00007.safetensors",
550
+ "language_model.model.layers.17.mlp.down_proj.weight": "model-00004-of-00007.safetensors",
551
+ "language_model.model.layers.17.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
552
+ "language_model.model.layers.17.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
553
+ "language_model.model.layers.17.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
554
+ "language_model.model.layers.17.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
555
+ "language_model.model.layers.17.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
556
+ "language_model.model.layers.17.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
557
+ "language_model.model.layers.17.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
558
+ "language_model.model.layers.17.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
559
+ "language_model.model.layers.17.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
560
+ "language_model.model.layers.18.input_layernorm.weight": "model-00005-of-00007.safetensors",
561
+ "language_model.model.layers.18.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
562
+ "language_model.model.layers.18.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
563
+ "language_model.model.layers.18.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
564
+ "language_model.model.layers.18.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
565
+ "language_model.model.layers.18.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
566
+ "language_model.model.layers.18.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
567
+ "language_model.model.layers.18.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
568
+ "language_model.model.layers.18.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
569
+ "language_model.model.layers.18.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
570
+ "language_model.model.layers.18.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
571
+ "language_model.model.layers.19.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
572
+ "language_model.model.layers.19.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
573
+ "language_model.model.layers.19.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
574
+ "language_model.model.layers.19.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
575
+ "language_model.model.layers.19.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
576
+ "language_model.model.layers.19.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
577
+ "language_model.model.layers.19.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
578
+ "language_model.model.layers.6.input_layernorm.weight": "model-00005-of-00007.safetensors",
579
+ "language_model.model.layers.6.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
580
+ "language_model.model.layers.6.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
581
+ "language_model.model.layers.7.input_layernorm.weight": "model-00005-of-00007.safetensors",
582
+ "language_model.model.layers.7.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
583
+ "language_model.model.layers.7.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
584
+ "language_model.model.layers.7.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
585
+ "language_model.model.layers.7.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
586
+ "language_model.model.layers.7.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
587
+ "language_model.model.layers.7.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
588
+ "language_model.model.layers.7.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
589
+ "language_model.model.layers.7.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
590
+ "language_model.model.layers.7.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
591
+ "language_model.model.layers.7.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
592
+ "language_model.model.layers.8.input_layernorm.weight": "model-00005-of-00007.safetensors",
593
+ "language_model.model.layers.8.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
594
+ "language_model.model.layers.8.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
595
+ "language_model.model.layers.8.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
596
+ "language_model.model.layers.8.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
597
+ "language_model.model.layers.8.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
598
+ "language_model.model.layers.8.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
599
+ "language_model.model.layers.8.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
600
+ "language_model.model.layers.8.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
601
+ "language_model.model.layers.8.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
602
+ "language_model.model.layers.8.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
603
+ "language_model.model.layers.9.input_layernorm.weight": "model-00005-of-00007.safetensors",
604
+ "language_model.model.layers.9.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
605
+ "language_model.model.layers.9.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
606
+ "language_model.model.layers.9.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
607
+ "language_model.model.layers.9.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
608
+ "language_model.model.layers.9.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
609
+ "language_model.model.layers.9.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
610
+ "language_model.model.layers.9.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
611
+ "language_model.model.layers.9.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
612
+ "language_model.model.layers.9.self_attn.q_proj.weight": "model-00005-of-00007.safetensors",
613
+ "language_model.model.layers.9.self_attn.v_proj.weight": "model-00005-of-00007.safetensors",
614
+ "language_model.model.layers.19.input_layernorm.weight": "model-00005-of-00007.safetensors",
615
+ "language_model.model.layers.19.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
616
+ "language_model.model.layers.19.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
617
+ "language_model.model.layers.19.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
618
+ "language_model.model.layers.20.input_layernorm.weight": "model-00005-of-00007.safetensors",
619
+ "language_model.model.layers.20.mlp.down_proj.weight": "model-00005-of-00007.safetensors",
620
+ "language_model.model.layers.20.mlp.gate_proj.weight": "model-00005-of-00007.safetensors",
621
+ "language_model.model.layers.20.mlp.up_proj.weight": "model-00005-of-00007.safetensors",
622
+ "language_model.model.layers.20.post_attention_layernorm.weight": "model-00005-of-00007.safetensors",
623
+ "language_model.model.layers.20.self_attn.k_norm.weight": "model-00005-of-00007.safetensors",
624
+ "language_model.model.layers.20.self_attn.k_proj.weight": "model-00005-of-00007.safetensors",
625
+ "language_model.model.layers.20.self_attn.o_proj.weight": "model-00005-of-00007.safetensors",
626
+ "language_model.model.layers.20.self_attn.q_norm.weight": "model-00005-of-00007.safetensors",
627
+ "language_model.model.layers.20.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
628
+ "language_model.model.layers.20.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
629
+ "language_model.model.layers.21.input_layernorm.weight": "model-00006-of-00007.safetensors",
630
+ "language_model.model.layers.21.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
631
+ "language_model.model.layers.21.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
632
+ "language_model.model.layers.21.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
633
+ "language_model.model.layers.21.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
634
+ "language_model.model.layers.21.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
635
+ "language_model.model.layers.21.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
636
+ "language_model.model.layers.21.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
637
+ "language_model.model.layers.21.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
638
+ "language_model.model.layers.21.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
639
+ "language_model.model.layers.21.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
640
+ "language_model.model.layers.22.input_layernorm.weight": "model-00006-of-00007.safetensors",
641
+ "language_model.model.layers.22.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
642
+ "language_model.model.layers.22.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
643
+ "language_model.model.layers.22.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
644
+ "language_model.model.layers.22.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
645
+ "language_model.model.layers.22.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
646
+ "language_model.model.layers.22.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
647
+ "language_model.model.layers.22.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
648
+ "language_model.model.layers.22.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
649
+ "language_model.model.layers.22.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
650
+ "language_model.model.layers.22.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
651
+ "language_model.model.layers.23.input_layernorm.weight": "model-00006-of-00007.safetensors",
652
+ "language_model.model.layers.23.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
653
+ "language_model.model.layers.23.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
654
+ "language_model.model.layers.23.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
655
+ "language_model.model.layers.23.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
656
+ "language_model.model.layers.23.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
657
+ "language_model.model.layers.23.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
658
+ "language_model.model.layers.23.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
659
+ "language_model.model.layers.23.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
660
+ "language_model.model.layers.23.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
661
+ "language_model.model.layers.23.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
662
+ "language_model.model.layers.24.input_layernorm.weight": "model-00006-of-00007.safetensors",
663
+ "language_model.model.layers.24.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
664
+ "language_model.model.layers.24.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
665
+ "language_model.model.layers.24.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
666
+ "language_model.model.layers.24.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
667
+ "language_model.model.layers.24.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
668
+ "language_model.model.layers.24.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
669
+ "language_model.model.layers.24.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
670
+ "language_model.model.layers.24.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
671
+ "language_model.model.layers.24.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
672
+ "language_model.model.layers.24.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
673
+ "language_model.model.layers.25.input_layernorm.weight": "model-00006-of-00007.safetensors",
674
+ "language_model.model.layers.25.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
675
+ "language_model.model.layers.25.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
676
+ "language_model.model.layers.25.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
677
+ "language_model.model.layers.25.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
678
+ "language_model.model.layers.25.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
679
+ "language_model.model.layers.25.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
680
+ "language_model.model.layers.25.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
681
+ "language_model.model.layers.25.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
682
+ "language_model.model.layers.25.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
683
+ "language_model.model.layers.25.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
684
+ "language_model.model.layers.26.input_layernorm.weight": "model-00006-of-00007.safetensors",
685
+ "language_model.model.layers.26.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
686
+ "language_model.model.layers.26.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
687
+ "language_model.model.layers.26.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
688
+ "language_model.model.layers.26.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
689
+ "language_model.model.layers.26.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
690
+ "language_model.model.layers.26.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
691
+ "language_model.model.layers.26.self_attn.o_proj.weight": "model-00006-of-00007.safetensors",
692
+ "language_model.model.layers.26.self_attn.q_norm.weight": "model-00006-of-00007.safetensors",
693
+ "language_model.model.layers.26.self_attn.q_proj.weight": "model-00006-of-00007.safetensors",
694
+ "language_model.model.layers.26.self_attn.v_proj.weight": "model-00006-of-00007.safetensors",
695
+ "language_model.model.layers.27.input_layernorm.weight": "model-00006-of-00007.safetensors",
696
+ "language_model.model.layers.27.mlp.down_proj.weight": "model-00006-of-00007.safetensors",
697
+ "language_model.model.layers.27.mlp.gate_proj.weight": "model-00006-of-00007.safetensors",
698
+ "language_model.model.layers.27.mlp.up_proj.weight": "model-00006-of-00007.safetensors",
699
+ "language_model.model.layers.27.post_attention_layernorm.weight": "model-00006-of-00007.safetensors",
700
+ "language_model.model.layers.27.self_attn.k_norm.weight": "model-00006-of-00007.safetensors",
701
+ "language_model.model.layers.27.self_attn.k_proj.weight": "model-00006-of-00007.safetensors",
702
+ "language_model.model.layers.27.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
703
+ "language_model.model.layers.27.self_attn.q_norm.weight": "model-00007-of-00007.safetensors",
704
+ "language_model.model.layers.27.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
705
+ "language_model.model.layers.27.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
706
+ "language_model.model.layers.28.input_layernorm.weight": "model-00007-of-00007.safetensors",
707
+ "language_model.model.layers.28.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
708
+ "language_model.model.layers.28.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
709
+ "language_model.model.layers.28.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
710
+ "language_model.model.layers.28.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
711
+ "language_model.model.layers.28.self_attn.k_norm.weight": "model-00007-of-00007.safetensors",
712
+ "language_model.model.layers.28.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
713
+ "language_model.model.layers.28.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
714
+ "language_model.model.layers.28.self_attn.q_norm.weight": "model-00007-of-00007.safetensors",
715
+ "language_model.model.layers.28.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
716
+ "language_model.model.layers.28.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
717
+ "language_model.model.layers.29.input_layernorm.weight": "model-00007-of-00007.safetensors",
718
+ "language_model.model.layers.29.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
719
+ "language_model.model.layers.29.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
720
+ "language_model.model.layers.29.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
721
+ "language_model.model.layers.29.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
722
+ "language_model.model.layers.29.self_attn.k_norm.weight": "model-00007-of-00007.safetensors",
723
+ "language_model.model.layers.29.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
724
+ "language_model.model.layers.29.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
725
+ "language_model.model.layers.29.self_attn.q_norm.weight": "model-00007-of-00007.safetensors",
726
+ "language_model.model.layers.29.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
727
+ "language_model.model.layers.29.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
728
+ "language_model.model.layers.30.input_layernorm.weight": "model-00007-of-00007.safetensors",
729
+ "language_model.model.layers.30.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
730
+ "language_model.model.layers.30.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
731
+ "language_model.model.layers.30.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
732
+ "language_model.model.layers.30.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
733
+ "language_model.model.layers.30.self_attn.k_norm.weight": "model-00007-of-00007.safetensors",
734
+ "language_model.model.layers.30.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
735
+ "language_model.model.layers.30.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
736
+ "language_model.model.layers.30.self_attn.q_norm.weight": "model-00007-of-00007.safetensors",
737
+ "language_model.model.layers.30.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
738
+ "language_model.model.layers.30.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
739
+ "language_model.model.layers.31.input_layernorm.weight": "model-00007-of-00007.safetensors",
740
+ "language_model.model.layers.31.mlp.down_proj.weight": "model-00007-of-00007.safetensors",
741
+ "language_model.model.layers.31.mlp.gate_proj.weight": "model-00007-of-00007.safetensors",
742
+ "language_model.model.layers.31.mlp.up_proj.weight": "model-00007-of-00007.safetensors",
743
+ "language_model.model.layers.31.post_attention_layernorm.weight": "model-00007-of-00007.safetensors",
744
+ "language_model.model.layers.31.self_attn.k_norm.weight": "model-00007-of-00007.safetensors",
745
+ "language_model.model.layers.31.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
746
+ "language_model.model.layers.31.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
747
+ "language_model.model.layers.31.self_attn.q_norm.weight": "model-00007-of-00007.safetensors",
748
+ "language_model.model.layers.31.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
749
+ "language_model.model.layers.31.self_attn.v_proj.weight": "model-00007-of-00007.safetensors",
750
+ "language_model.model.layers.32.self_attn.k_norm.weight": "model-00007-of-00007.safetensors",
751
+ "language_model.model.layers.32.self_attn.k_proj.weight": "model-00007-of-00007.safetensors",
752
+ "language_model.model.layers.32.self_attn.o_proj.weight": "model-00007-of-00007.safetensors",
753
+ "language_model.model.layers.32.self_attn.q_norm.weight": "model-00007-of-00007.safetensors",
754
+ "language_model.model.layers.32.self_attn.q_proj.weight": "model-00007-of-00007.safetensors",
755
+ "language_model.model.layers.32.self_attn.v_proj.weight": "model-00007-of-00007.safetensors"
756
+ }
757
+ }
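
The `weight_map` above is what sharded-checkpoint loaders consult to find which of the seven shards holds a given tensor. A minimal sketch, assuming the shards and `model.safetensors.index.json` have been downloaded to a local directory (the directory name is illustrative); it uses `safetensors.torch.load_file`, which the embedder script below also relies on:

```python
# Resolve a single tensor from the sharded checkpoint via the index above.
import json
from pathlib import Path

from safetensors.torch import load_file

model_dir = Path("colqwen3-8b-wetcoders")  # hypothetical local checkout
index = json.loads((model_dir / "model.safetensors.index.json").read_text())

key = "vision_tower.merger.linear_fc1.weight"
shard = index["weight_map"][key]            # -> "model-00003-of-00007.safetensors"

tensors = load_file(str(model_dir / shard)) # loads only that shard, not all seven
print(key, tuple(tensors[key].shape), "from", shard)
```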
preprocessor_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+ "crop_size": null,
+ "data_format": "channels_first",
+ "default_to_square": true,
+ "device": null,
+ "disable_grouping": null,
+ "do_center_crop": null,
+ "do_convert_rgb": true,
+ "do_normalize": true,
+ "do_pad": null,
+ "do_rescale": true,
+ "do_resize": true,
+ "image_mean": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "image_processor_type": "Qwen2VLImageProcessorFast",
+ "image_std": [
+ 0.5,
+ 0.5,
+ 0.5
+ ],
+ "input_data_format": null,
+ "max_pixels": null,
+ "merge_size": 2,
+ "min_pixels": null,
+ "pad_size": null,
+ "patch_size": 16,
+ "processor_class": "Qwen3VLProcessor",
+ "resample": 3,
+ "rescale_factor": 0.00392156862745098,
+ "return_tensors": null,
+ "size": {
+ "longest_edge": 16777216,
+ "shortest_edge": 65536
+ },
+ "temporal_patch_size": 2
+ }
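
For orientation, these settings determine how a rendered page becomes vision tokens: pixels are rescaled by `rescale_factor` (1/255) and normalized with mean/std 0.5, 16×16 patches are extracted, and the patch grid is merged 2×2 before reaching the language model. A rough, illustrative sketch (it ignores the resize step that keeps the total pixel count within the `shortest_edge`/`longest_edge` budget):

```python
# Approximate vision-token count for a rendered page, using the values above.
patch_size = 16          # "patch_size"
merge_size = 2           # "merge_size": 2x2 patch merging

width, height = 1024, 1408              # hypothetical page render (divisible by 32)
grid_w, grid_h = width // patch_size, height // patch_size
raw_patches = grid_w * grid_h           # 64 * 88 = 5632 vision patches
image_tokens = raw_patches // (merge_size ** 2)
print(image_tokens)                     # 1408 <|image_pad|> positions

# Per-channel normalization applied before patching:
# x_norm = (pixel * 0.00392156862745098 - 0.5) / 0.5
```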
projections/projection_128d.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3b1c953b2e215a6d75d5c43cd8f4f3776bf955c81bddd195ff73ae35512bc099
+ size 5244352
projections/projection_320d.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:809888951c0e8047a7162fc4ac1d57a0c1624cf756152bee08b6184eb6e2bfec
+ size 2622224
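
Two projection heads ship in `projections/`: the default 320-dimensional one and a 128-dimensional variant for cheaper multi-vector storage. A hedged sketch of switching to the smaller head, assuming the 128-d file uses the same `embedding_proj_layer.{weight,bias}` key layout that the embedder below expects:

```python
# Use the 128-d projection instead of the default 320-d one.
from colqwen3_embedder import ColQwen3Embedder

embedder = ColQwen3Embedder(
    projection_path="projections/projection_128d.safetensors",
    embedding_dim=128,   # smaller multi-vectors -> cheaper index, some precision loss
)
embedder.load()
```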
scripts/colqwen3_embedder.py ADDED
@@ -0,0 +1,478 @@
1
+ """
2
+ ColQwen3 MLX Embedder
3
+
4
+ Production-ready multimodal document embedding using Tomoro-ColQwen3 on MLX.
5
+ Provides ColPali-style multi-vector embeddings for visual document retrieval.
6
+
7
+ Key insight: For proper image embeddings, <|image_pad|> tokens must be expanded
8
+ to match the number of vision patches, and only image token embeddings should
9
+ be used for MaxSim scoring.
10
+
11
+ Created by M&K (c)2025 The LibraxisAI Team
12
+ Co-Authored-By: Maciej (void@div0.space) & Klaudiusz (the1st@whoai.am)
13
+ """
14
+
15
+ import os
16
+ from dataclasses import dataclass
17
+ from pathlib import Path
18
+ from typing import List, Optional, Tuple, Union
19
+
20
+ import mlx.core as mx
21
+ import numpy as np
22
+ from PIL import Image
23
+
24
+ # Special token ID for image patches
25
+ IMAGE_PAD_TOKEN = 151655
26
+
27
+
28
+ @dataclass
29
+ class EmbeddingResult:
30
+ """Result of embedding operation."""
31
+
32
+ embeddings: mx.array # [num_tokens, 320]
33
+ num_tokens: int
34
+ source_type: str # "text" or "image"
35
+
36
+
37
+ class ColQwen3Embedder:
38
+ """
39
+ ColQwen3 document embedder using MLX.
40
+
41
+ Provides multi-vector embeddings optimized for document retrieval
42
+ using Late Interaction (MaxSim) scoring.
43
+
44
+ Environment Variables:
45
+ COLQWEN3_MODEL_PATH: Path to Tomoro-ColQwen3 MLX model directory.
46
+ Default: /Volumes/Maciejowe/mlx_lm/models/tomoro-colqwen3-8b-mlx
47
+ COLQWEN3_PROJECTION_PATH: Path to embedding projection weights (.safetensors).
48
+ Default: /Volumes/Maciejowe/mlx_lm/models/colqwen3_projection.safetensors
49
+
50
+ Usage:
51
+ # Option 1: Set environment variables
52
+ export COLQWEN3_MODEL_PATH="/path/to/tomoro-colqwen3-8b-mlx"
53
+ export COLQWEN3_PROJECTION_PATH="/path/to/colqwen3_projection.safetensors"
54
+
55
+ embedder = ColQwen3Embedder()
56
+ embedder.load()
57
+
58
+ # Option 2: Pass paths directly (overrides env vars)
59
+ embedder = ColQwen3Embedder(
60
+ model_path="/path/to/model",
61
+ projection_path="/path/to/projection.safetensors"
62
+ )
63
+ embedder.load()
64
+
65
+ # Embed a document image
66
+ doc_emb = embedder.embed_image("document.png")
67
+
68
+ # Embed a text query
69
+ query_emb = embedder.embed_text("search query")
70
+
71
+ # Score relevance
72
+ score = embedder.maxsim_score(query_emb, doc_emb)
73
+
74
+ Created by M&K (c)2025 The LibraxisAI Team
75
+ """
76
+
77
+ # Environment variable names for configuration
78
+ ENV_MODEL_PATH = "COLQWEN3_MODEL_PATH"
79
+ ENV_PROJECTION_PATH = "COLQWEN3_PROJECTION_PATH"
80
+
81
+ # Default paths (backward compatibility with existing setup)
82
+ DEFAULT_MODEL_PATH = "/Volumes/Maciejowe/mlx_lm/models/tomoro-colqwen3-8b-mlx"
83
+ DEFAULT_PROJ_PATH = "/Volumes/Maciejowe/mlx_lm/models/colqwen3_projection.safetensors"
84
+
85
+ def __init__(
86
+ self,
87
+ model_path: Optional[str] = None,
88
+ projection_path: Optional[str] = None,
89
+ embedding_dim: int = 320,
90
+ ):
91
+ """
92
+ Initialize the embedder.
93
+
94
+ Args:
95
+ model_path: Path to Tomoro-ColQwen3 MLX model (overrides env var)
96
+ projection_path: Path to embedding projection weights (overrides env var)
97
+ embedding_dim: Output embedding dimension (default 320)
98
+
99
+ Path resolution order:
100
+ 1. Explicitly passed argument
101
+ 2. Environment variable (COLQWEN3_MODEL_PATH / COLQWEN3_PROJECTION_PATH)
102
+ 3. Default fallback path
103
+ """
104
+ self.model_path = model_path or os.environ.get(self.ENV_MODEL_PATH) or self.DEFAULT_MODEL_PATH
105
+ self.projection_path = projection_path or os.environ.get(self.ENV_PROJECTION_PATH) or self.DEFAULT_PROJ_PATH
106
+ self.embedding_dim = embedding_dim
107
+
108
+ self.model = None
109
+ self.mlx_processor = None
110
+ self.tomoro_processor = None
111
+ self.proj_weight = None
112
+ self.proj_bias = None
113
+ self._loaded = False
114
+
115
+ def load(self) -> None:
116
+ """Load model, processor, and projection weights."""
117
+ if self._loaded:
118
+ return
119
+
120
+ from mlx_vlm import load
121
+ from safetensors.torch import load_file
122
+ from transformers import AutoProcessor
123
+
124
+ print(f"Loading ColQwen3 from {self.model_path}...")
125
+ self.model, self.mlx_processor = load(self.model_path)
126
+
127
+ # Load Tomoro processor for proper image token expansion
128
+ print("Loading Tomoro processor for image token expansion...")
129
+ self.tomoro_processor = AutoProcessor.from_pretrained(
130
+ "TomoroAI/tomoro-colqwen3-embed-8b", trust_remote_code=True
131
+ )
132
+
133
+ print(f"Loading projection from {self.projection_path}...")
134
+ proj_weights = load_file(self.projection_path)
135
+ self.proj_weight = mx.array(proj_weights["embedding_proj_layer.weight"].float().numpy())
136
+ self.proj_bias = mx.array(proj_weights["embedding_proj_layer.bias"].float().numpy())
137
+
138
+ self._loaded = True
139
+ print("ColQwen3 Embedder ready!")
140
+
141
+ def _ensure_loaded(self) -> None:
142
+ """Ensure model is loaded."""
143
+ if not self._loaded:
144
+ self.load()
145
+
146
+ def _project_and_normalize(self, hidden_states: mx.array) -> mx.array:
147
+ """Apply projection layer and L2 normalize."""
148
+ # Project to embedding dimension
149
+ embeddings = hidden_states @ self.proj_weight.T + self.proj_bias
150
+
151
+ # L2 normalize
152
+ norm = mx.sqrt(mx.sum(embeddings**2, axis=-1, keepdims=True) + 1e-12)
153
+ embeddings = embeddings / norm
154
+
155
+ return embeddings
156
+
157
+ def embed_text(self, text: str) -> EmbeddingResult:
158
+ """
159
+ Embed text query.
160
+
161
+ Args:
162
+ text: Query string
163
+
164
+ Returns:
165
+ EmbeddingResult with shape [num_tokens, 320]
166
+ """
167
+ self._ensure_loaded()
168
+
169
+ # Get inner language model (skips lm_head)
170
+ inner_model = self.model["language_model"]["model"]
171
+
172
+ # Tokenize using Tomoro processor for consistency
173
+ inputs = self.tomoro_processor.tokenizer(text, return_tensors="np")
174
+ input_ids = mx.array(inputs["input_ids"])
175
+ batch_size, seq_len = input_ids.shape
176
+
177
+ # Create position IDs for M-ROPE
178
+ position_ids = mx.arange(seq_len).reshape(1, -1)
179
+ position_ids = mx.broadcast_to(position_ids, (batch_size, seq_len))
180
+ position_ids = mx.broadcast_to(position_ids[None, ...], (3, batch_size, seq_len))
181
+
182
+ # Get hidden states
183
+ hidden_states = inner_model(input_ids, position_ids=position_ids)
184
+
185
+ # Project and normalize
186
+ embeddings = self._project_and_normalize(hidden_states)
187
+ embeddings = embeddings.squeeze(0) # Remove batch dim
188
+ mx.eval(embeddings)
189
+
190
+ return EmbeddingResult(
191
+ embeddings=embeddings,
192
+ num_tokens=seq_len,
193
+ source_type="text",
194
+ )
195
+
196
+ def embed_image(
197
+ self,
198
+ image: Union[str, Path, Image.Image],
199
+ ) -> EmbeddingResult:
200
+ """
201
+ Embed document image with proper token expansion.
202
+
203
+ Uses Tomoro's ColQwen3Processor to correctly expand <|image_pad|>
204
+ tokens to match the number of vision patches. Only the image token
205
+ embeddings are returned for MaxSim scoring.
206
+
207
+ Args:
208
+ image: Image path or PIL Image object
209
+
210
+ Returns:
211
+ EmbeddingResult with shape [num_patches, 320]
212
+ """
213
+ self._ensure_loaded()
214
+
215
+ # Load image if path
216
+ if isinstance(image, (str, Path)):
217
+ image = Image.open(image).convert("RGB")
218
+
219
+ # Process with Tomoro processor (properly expands <|image_pad|>)
220
+ inputs = self.tomoro_processor(
221
+ text="", # No text prompt - only image
222
+ images=[image],
223
+ return_tensors="pt",
224
+ )
225
+
226
+ input_ids = inputs["input_ids"]
227
+ pixel_values = inputs["pixel_values"]
228
+ image_grid_thw = inputs["image_grid_thw"]
229
+
230
+ # Create mask for image tokens
231
+ image_mask = (input_ids == IMAGE_PAD_TOKEN).numpy()[0]
232
+ image_positions = np.where(image_mask)[0].tolist()
233
+
234
+ # Get vision embeddings from vision tower
235
+ pixel_values_mx = mx.array(pixel_values.numpy())
236
+ image_grid_thw_mx = mx.array(image_grid_thw.numpy())
237
+ hidden_states_vision, _ = self.model["vision_tower"](pixel_values_mx, image_grid_thw_mx)
238
+
239
+ # Get text embeddings and inject vision embeddings at image positions
240
+ input_ids_mx = mx.array(input_ids.numpy())
241
+ embed_tokens = self.model["language_model"]["model"]["embed_tokens"]
242
+ text_emb_np = np.array(embed_tokens(input_ids_mx)[0])
243
+ vision_np = np.array(hidden_states_vision)
244
+
245
+ for i, pos in enumerate(image_positions):
246
+ if i < vision_np.shape[0]:
247
+ text_emb_np[pos] = vision_np[i]
248
+
249
+ batch_size, seq_len = input_ids_mx.shape
250
+ combined_embeddings = mx.array(text_emb_np).reshape(1, seq_len, -1)
251
+
252
+ # Create position IDs for M-ROPE
253
+ position_ids = mx.arange(seq_len).reshape(1, -1)
254
+ position_ids = mx.broadcast_to(position_ids, (batch_size, seq_len))
255
+ position_ids = mx.broadcast_to(position_ids[None, ...], (3, batch_size, seq_len))
256
+
257
+ # Forward through language model layers
258
+ inner_model = self.model["language_model"]["model"]
259
+ h = combined_embeddings
260
+ for layer in inner_model["layers"]:
261
+ h = layer(h, position_ids=position_ids)
262
+ h = inner_model["norm"](h)
263
+
264
+ # Extract ONLY image token embeddings for MaxSim
265
+ h_np = np.array(h[0])
266
+ image_hidden_states = mx.array(h_np[image_mask])
267
+
268
+ # Project and normalize
269
+ embeddings = self._project_and_normalize(image_hidden_states)
270
+ mx.eval(embeddings)
271
+
272
+ return EmbeddingResult(
273
+ embeddings=embeddings,
274
+ num_tokens=embeddings.shape[0],
275
+ source_type="image",
276
+ )
277
+
278
+ def embed_pdf_page(
279
+ self,
280
+ pdf_path: Union[str, Path],
281
+ page_num: int = 0,
282
+ dpi: int = 150,
283
+ ) -> EmbeddingResult:
284
+ """
285
+ Embed a page from a PDF document.
286
+
287
+ Args:
288
+ pdf_path: Path to PDF file
289
+ page_num: Page number (0-indexed)
290
+ dpi: Resolution for rendering
291
+
292
+ Returns:
293
+ EmbeddingResult with shape [num_patches, 320]
294
+ """
295
+ try:
296
+ import fitz # PyMuPDF
297
+ except ImportError:
298
+ raise ImportError("PyMuPDF required: pip install pymupdf")
299
+
300
+ doc = fitz.open(pdf_path)
301
+ page = doc.load_page(page_num)
302
+ pix = page.get_pixmap(dpi=dpi)
303
+ image = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
304
+ doc.close()
305
+
306
+ return self.embed_image(image)
307
+
308
+ def embed_pdf(
309
+ self,
310
+ pdf_path: Union[str, Path],
311
+ dpi: int = 150,
312
+ max_pages: Optional[int] = None,
313
+ ) -> List[EmbeddingResult]:
314
+ """
315
+ Embed all pages from a PDF document.
316
+
317
+ Args:
318
+ pdf_path: Path to PDF file
319
+ dpi: Resolution for rendering
320
+ max_pages: Maximum pages to process (None for all)
321
+
322
+ Returns:
323
+ List of EmbeddingResult, one per page
324
+ """
325
+ try:
326
+ import fitz
327
+ except ImportError:
328
+ raise ImportError("PyMuPDF required: pip install pymupdf")
329
+
330
+ doc = fitz.open(pdf_path)
331
+ num_pages = min(len(doc), max_pages) if max_pages else len(doc)
332
+
333
+ results = []
334
+ for i in range(num_pages):
335
+ result = self.embed_pdf_page(pdf_path, page_num=i, dpi=dpi)
336
+ results.append(result)
337
+
338
+ doc.close()
339
+ return results
340
+
341
+ @staticmethod
342
+ def maxsim_score(
343
+ query_emb: Union[mx.array, EmbeddingResult],
344
+ doc_emb: Union[mx.array, EmbeddingResult],
345
+ ) -> float:
346
+ """
347
+ Compute MaxSim score between query and document embeddings.
348
+
349
+ MaxSim (Late Interaction): For each query token, find maximum
350
+ similarity across all document tokens, then sum.
351
+
352
+ Args:
353
+ query_emb: Query embeddings [q_len, dim]
354
+ doc_emb: Document embeddings [d_len, dim]
355
+
356
+ Returns:
357
+ Similarity score (higher = more relevant)
358
+ """
359
+ if isinstance(query_emb, EmbeddingResult):
360
+ query_emb = query_emb.embeddings
361
+ if isinstance(doc_emb, EmbeddingResult):
362
+ doc_emb = doc_emb.embeddings
363
+
364
+ # Compute all pairwise similarities: [q_len, d_len]
365
+ similarities = query_emb @ doc_emb.T
366
+
367
+ # For each query token, take max over document tokens
368
+ max_sims = mx.max(similarities, axis=1)
369
+
370
+ # Sum across query tokens
371
+ score = mx.sum(max_sims)
372
+ mx.eval(score)
373
+
374
+ return float(score)
375
+
376
+ @staticmethod
377
+ def cosine_similarity(
378
+ emb1: Union[mx.array, EmbeddingResult],
379
+ emb2: Union[mx.array, EmbeddingResult],
380
+ ) -> float:
381
+ """
382
+ Compute mean-pooled cosine similarity.
383
+
384
+ Args:
385
+ emb1: First embeddings [n, dim]
386
+ emb2: Second embeddings [m, dim]
387
+
388
+ Returns:
389
+ Cosine similarity in [-1, 1]
390
+ """
391
+ if isinstance(emb1, EmbeddingResult):
392
+ emb1 = emb1.embeddings
393
+ if isinstance(emb2, EmbeddingResult):
394
+ emb2 = emb2.embeddings
395
+
396
+ # Mean pool
397
+ v1 = mx.mean(emb1, axis=0)
398
+ v2 = mx.mean(emb2, axis=0)
399
+
400
+ # Cosine similarity
401
+ sim = mx.sum(v1 * v2) / (mx.sqrt(mx.sum(v1**2)) * mx.sqrt(mx.sum(v2**2)))
402
+ mx.eval(sim)
403
+
404
+ return float(sim)
405
+
406
+ def rank_documents(
407
+ self,
408
+ query: str,
409
+ documents: List[EmbeddingResult],
410
+ top_k: Optional[int] = None,
411
+ ) -> List[Tuple[int, float]]:
412
+ """
413
+ Rank documents by relevance to query.
414
+
415
+ Args:
416
+ query: Query string
417
+ documents: List of document embeddings
418
+ top_k: Return top K results (None for all)
419
+
420
+ Returns:
421
+ List of (doc_index, score) sorted by descending score
422
+ """
423
+ query_emb = self.embed_text(query)
424
+
425
+ scores = []
426
+ for i, doc_emb in enumerate(documents):
427
+ score = self.maxsim_score(query_emb, doc_emb)
428
+ scores.append((i, score))
429
+
430
+ # Sort by score descending
431
+ scores.sort(key=lambda x: x[1], reverse=True)
432
+
433
+ if top_k:
434
+ scores = scores[:top_k]
435
+
436
+ return scores
437
+
438
+ def to_numpy(self, emb: Union[mx.array, EmbeddingResult]) -> np.ndarray:
439
+ """Convert embeddings to numpy array (for storage/indexing)."""
440
+ if isinstance(emb, EmbeddingResult):
441
+ emb = emb.embeddings
442
+ return np.array(emb)
443
+
444
+
445
+ # Convenience functions
446
+ def load_embedder(
447
+ model_path: Optional[str] = None,
448
+ projection_path: Optional[str] = None,
449
+ ) -> ColQwen3Embedder:
450
+ """Load and return a ready-to-use embedder."""
451
+ embedder = ColQwen3Embedder(
452
+ model_path=model_path,
453
+ projection_path=projection_path,
454
+ )
455
+ embedder.load()
456
+ return embedder
457
+
458
+
459
+ if __name__ == "__main__":
460
+ # Quick test
461
+ print("Testing ColQwen3 Embedder...")
462
+
463
+ embedder = load_embedder()
464
+
465
+ # Test text embedding
466
+ text = "dawkowanie meloksykamu dla psa"
467
+ result = embedder.embed_text(text)
468
+ print(f"\nText: '{text}'")
469
+ print(f" Tokens: {result.num_tokens}")
470
+ print(f" Embedding shape: {result.embeddings.shape}")
471
+
472
+ # Test text similarity
473
+ text2 = "metacam dose for dogs"
474
+ result2 = embedder.embed_text(text2)
475
+ sim = embedder.cosine_similarity(result, result2)
476
+ print(f"\nSimilarity to '{text2}': {sim:.4f}")
477
+
478
+ print("\nColQwen3 Embedder test complete!")
scripts/mlx_visual_server.py ADDED
@@ -0,0 +1,318 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ MLX Visual Embedding Server - ColQwen3
4
+
5
+ HTTP server wrapper for ColQwen3Embedder providing visual document embeddings.
6
+ Power of Wet Coders edition - custom merged model by LibraxisAI.
7
+
8
+ Uses the production ColQwen3Embedder class from colqwen3_embedder.py
9
+
10
+ Usage:
11
+ cd knowledge/vista-brain
12
+ uv run python scripts/mlx_visual_server.py
13
+
14
+ # Or via Makefile:
15
+ make visual
16
+
17
+ Endpoints:
18
+ POST /v1/visual-embeddings - Generate visual embeddings from images/PDFs
19
+ POST /v1/maxsim - Compute MaxSim score between query and docs
20
+ GET /v1/models - List models
21
+ GET /health - Health check
22
+
23
+ Created by M&K (c)2025 The LibraxisAI Team
24
+ Co-Authored-By: Maciej (void@div0.space) & Klaudiusz (the1st@whoai.am)
25
+ """
26
+ import base64
27
+ import io
28
+ import json
29
+ import os
30
+ import sys
31
+ import time
32
+ from http.server import BaseHTTPRequestHandler, HTTPServer
33
+ from pathlib import Path
34
+ from typing import List, Optional, Union
35
+
36
+ # Add this scripts/ directory to sys.path so colqwen3_embedder imports cleanly
+ sys.path.insert(0, str(Path(__file__).parent))
38
+
39
+ from colqwen3_embedder import ColQwen3Embedder, load_embedder
40
+
41
+ # Configuration from environment
42
+ PORT = int(os.environ.get("MLX_VISUAL_PORT", "12347"))
43
+
44
+ # ColBERT embedding dimension (320 for our custom projection)
45
+ EMBED_DIM = 320
46
+
47
+ # Lazy load embedder
48
+ _embedder = None
49
+
50
+
51
+ def get_embedder() -> ColQwen3Embedder:
52
+ """Lazy load the ColQwen3 embedder."""
53
+ global _embedder
54
+ if _embedder is None:
55
+ print("Loading ColQwen3 Embedder...", file=sys.stderr)
56
+ _embedder = load_embedder()
57
+ print(f"ColQwen3 ready (dim={EMBED_DIM})", file=sys.stderr)
58
+ return _embedder
59
+
60
+
61
+ def decode_image(image_data: Union[str, bytes]):
62
+ """Decode image from base64 or bytes."""
63
+ from PIL import Image
64
+
65
+ if isinstance(image_data, str):
66
+ # Handle base64 with or without data URL prefix
67
+ if image_data.startswith("data:"):
68
+ # data:image/png;base64,xxxx
69
+ image_data = image_data.split(",", 1)[1]
70
+ image_bytes = base64.b64decode(image_data)
71
+ else:
72
+ image_bytes = image_data
73
+
74
+ return Image.open(io.BytesIO(image_bytes)).convert("RGB")
75
+
76
+
77
+ def embed_images(images: List[Union[str, bytes]]) -> List[dict]:
78
+ """Generate ColBERT-style embeddings for images."""
79
+ embedder = get_embedder()
80
+ import mlx.core as mx
81
+
82
+ results = []
83
+ for img_data in images:
84
+ try:
85
+ # Decode image
86
+ if isinstance(img_data, str) and (
87
+ img_data.startswith("/") or img_data.startswith(".")
88
+ ):
89
+ # It's a file path
90
+ pil_img = img_data
91
+ else:
92
+ # Base64 data
93
+ pil_img = decode_image(img_data)
94
+
95
+ # Embed using ColQwen3Embedder
96
+ result = embedder.embed_image(pil_img)
97
+
98
+ results.append({
99
+ "embedding": embedder.to_numpy(result).tolist(),
100
+ "num_tokens": result.num_tokens,
101
+ "source_type": result.source_type,
102
+ })
103
+
104
+ except Exception as e:
105
+ print(f"Image embed error: {e}", file=sys.stderr)
106
+ results.append({"error": str(e)})
107
+
108
+ # Clear MLX cache
109
+ mx.clear_cache()
110
+
111
+ return results
112
+
113
+
114
+ def embed_pdf(pdf_path: str, max_pages: Optional[int] = None) -> List[dict]:
115
+ """Embed all pages from a PDF."""
116
+ embedder = get_embedder()
117
+ import mlx.core as mx
118
+
119
+ results = []
120
+ try:
121
+ page_results = embedder.embed_pdf(pdf_path, max_pages=max_pages)
122
+ for i, result in enumerate(page_results):
123
+ results.append({
124
+ "page": i,
125
+ "embedding": embedder.to_numpy(result).tolist(),
126
+ "num_tokens": result.num_tokens,
127
+ "source_type": result.source_type,
128
+ })
129
+ except Exception as e:
130
+ print(f"PDF embed error: {e}", file=sys.stderr)
131
+ results.append({"error": str(e)})
132
+
133
+ mx.clear_cache()
134
+ return results
135
+
136
+
137
+ def embed_text(text: str) -> dict:
138
+ """Embed text query."""
139
+ embedder = get_embedder()
140
+ import mlx.core as mx
141
+
142
+ try:
143
+ result = embedder.embed_text(text)
144
+ mx.clear_cache()
145
+ return {
146
+ "embedding": embedder.to_numpy(result).tolist(),
147
+ "num_tokens": result.num_tokens,
148
+ "source_type": result.source_type,
149
+ }
150
+ except Exception as e:
151
+ print(f"Text embed error: {e}", file=sys.stderr)
152
+ return {"error": str(e)}
153
+
154
+
155
+ def compute_maxsim(query_embedding: List, doc_embedding: List) -> float:
156
+ """Compute MaxSim score between query and document embeddings."""
157
+ import mlx.core as mx
158
+
159
+ query_mx = mx.array(query_embedding)
160
+ doc_mx = mx.array(doc_embedding)
161
+
162
+ # MaxSim: for each query token, max over doc tokens, then sum
163
+ similarities = query_mx @ doc_mx.T
164
+ max_sims = mx.max(similarities, axis=1)
165
+ score = float(mx.sum(max_sims))
166
+
167
+ mx.clear_cache()
168
+ return score
169
+
170
+
171
+ class VisualHandler(BaseHTTPRequestHandler):
172
+ """HTTP handler for visual embeddings API."""
173
+
174
+ def log_message(self, format, *args):
175
+ """Log to stderr."""
176
+ print(f"[{time.strftime('%Y-%m-%d %H:%M:%S')}] {args[0]}", file=sys.stderr)
177
+
178
+ def send_json(self, data: dict, status: int = 200):
179
+ """Send JSON response."""
180
+        body = json.dumps(data).encode("utf-8")
+        self.send_response(status)
+        self.send_header("Content-Type", "application/json")
+        self.send_header("Content-Length", len(body))
+        self.end_headers()
+        self.wfile.write(body)
+
+    def do_GET(self):
+        """Handle GET requests."""
+        if self.path == "/v1/models" or self.path == "/models":
+            self.send_json({
+                "object": "list",
+                "data": [{
+                    "id": "colqwen3-8b-wetcoders",
+                    "object": "model",
+                    "owned_by": "libraxis-local",
+                    "type": "visual-embedding",
+                    "description": "ColQwen3 8B - Power of Wet Coders edition",
+                    "embedding_dim": EMBED_DIM,
+                }]
+            })
+        elif self.path == "/health":
+            self.send_json({
+                "status": "healthy",
+                "model": "colqwen3-8b-wetcoders",
+                "dim": EMBED_DIM,
+                "type": "colbert-visual-embedding",
+            })
+        else:
+            self.send_json({"error": "Not found"}, 404)
+
+    def do_POST(self):
+        """Handle POST requests."""
+        content_length = int(self.headers.get("Content-Length", 0))
+        body = self.rfile.read(content_length)
+
+        try:
+            data = json.loads(body)
+        except json.JSONDecodeError:
+            self.send_json({"error": "Invalid JSON"}, 400)
+            return
+
+        if self.path in ["/v1/visual-embeddings", "/visual-embeddings"]:
+            self._handle_embeddings(data)
+        elif self.path in ["/v1/maxsim", "/maxsim"]:
+            self._handle_maxsim(data)
+        else:
+            self.send_json({"error": "Not found"}, 404)
+
+    def _handle_embeddings(self, data: dict):
+        """Handle embedding requests."""
+        images = data.get("images", [])
+        texts = data.get("texts", [])
+        pdf_path = data.get("pdf_path")
+        max_pages = data.get("max_pages")
+
+        response = {
+            "object": "embedding_response",
+            "model": "colqwen3-8b-wetcoders",
+            "dim": EMBED_DIM,
+        }
+
+        try:
+            if pdf_path:
+                # PDF embedding
+                response["pdf_embeddings"] = embed_pdf(pdf_path, max_pages)
+            elif images:
+                # Image embeddings
+                response["image_embeddings"] = embed_images(images)
+            elif texts:
+                # Text embeddings
+                response["text_embeddings"] = [embed_text(t) for t in texts]
+            else:
+                self.send_json({"error": "No images, texts, or pdf_path provided"}, 400)
+                return
+
+        except Exception as e:
+            print(f"Embedding error: {e}", file=sys.stderr)
+            self.send_json({"error": str(e)}, 500)
+            return
+
+        self.send_json(response)
+
+    def _handle_maxsim(self, data: dict):
+        """Handle MaxSim scoring requests."""
+        query_embedding = data.get("query_embedding")
+        doc_embedding = data.get("doc_embedding")
+
+        if not query_embedding or not doc_embedding:
+            self.send_json({"error": "query_embedding and doc_embedding required"}, 400)
+            return
+
+        try:
+            score = compute_maxsim(query_embedding, doc_embedding)
+            self.send_json({
+                "object": "maxsim_score",
+                "score": score,
+                "model": "colqwen3-8b-wetcoders",
+            })
+        except Exception as e:
+            print(f"MaxSim error: {e}", file=sys.stderr)
+            self.send_json({"error": str(e)}, 500)
+
+
+def main():
+    """Start the visual embedding server."""
+    print("", file=sys.stderr)
+    print("=" * 60, file=sys.stderr)
+    print("MLX Visual Embedding Server - ColQwen3", file=sys.stderr)
+    print("Power of Wet Coders Edition", file=sys.stderr)
+    print("=" * 60, file=sys.stderr)
+    print(f"Port: {PORT}", file=sys.stderr)
+    print(f"Embedding dim: {EMBED_DIM} (ColBERT)", file=sys.stderr)
+    print("", file=sys.stderr)
+    print("Endpoints:", file=sys.stderr)
+    print("  POST /v1/visual-embeddings - Generate embeddings", file=sys.stderr)
+    print("       body: {images: [base64...]} or {pdf_path: '/path.pdf'}", file=sys.stderr)
+    print("  POST /v1/maxsim            - Compute MaxSim score", file=sys.stderr)
+    print("       body: {query_embedding: [...], doc_embedding: [...]}", file=sys.stderr)
+    print("  GET  /v1/models            - List models", file=sys.stderr)
+    print("  GET  /health               - Health check", file=sys.stderr)
+    print("", file=sys.stderr)
+
+    # Pre-load embedder
+    get_embedder()
+
+    server = HTTPServer(("0.0.0.0", PORT), VisualHandler)
+    print(f"Server ready at http://localhost:{PORT}", file=sys.stderr)
+    print("=" * 60, file=sys.stderr)
+
+    try:
+        server.serve_forever()
+    except KeyboardInterrupt:
+        print("\nShutting down...", file=sys.stderr)
+        server.shutdown()
+
+
+if __name__ == "__main__":
+    main()
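
The handlers above delegate to helpers defined earlier in the script (`get_embedder`, `embed_text`, `embed_images`, `embed_pdf`, `compute_maxsim`) that are not part of this hunk. For orientation, here is a minimal sketch of the ColBERT-style MaxSim reduction such a `compute_maxsim` typically performs, assuming the embeddings arrive as nested lists of shape [num_tokens, dim]; the function name, normalization step, and shapes are illustrative, not the script's actual implementation:

```python
import numpy as np

def maxsim_score(query_emb, doc_emb):
    """For each query token take its best match among document tokens,
    then sum over query tokens (late interaction / MaxSim)."""
    q = np.asarray(query_emb, dtype=np.float32)  # [Nq, D]
    d = np.asarray(doc_emb, dtype=np.float32)    # [Nd, D]
    # L2-normalize so the dot product below is cosine similarity
    q /= np.linalg.norm(q, axis=-1, keepdims=True) + 1e-8
    d /= np.linalg.norm(d, axis=-1, keepdims=True) + 1e-8
    sim = q @ d.T                                # [Nq, Nd] token-to-token similarities
    return float(sim.max(axis=-1).sum())         # max over doc tokens, sum over query tokens
```

A higher score means more query tokens found a close match among the page's patch embeddings, which is what makes the ranking in the `/v1/maxsim` endpoint comparable across documents for a fixed query.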
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
+size 11422654
tokenizer_config.json ADDED
@@ -0,0 +1,239 @@
+{
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 262144,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
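
The `tokenizer.json`, `tokenizer_config.json`, and `special_tokens_map.json` files added above are what `AutoTokenizer` reads when the repository is loaded. A quick sanity check, sketched under the assumption that `transformers` is installed and that the repo id is the one used in the model card:

```python
from transformers import AutoTokenizer

# Reads tokenizer.json plus the config / special-token files from this commit
tok = AutoTokenizer.from_pretrained("libraxisai/colqwen3-8b-wetcoders")

print(tok.eos_token, tok.pad_token)   # expected: <|im_end|> <|endoftext|>
print(tok.model_max_length)           # expected: 262144
ids = tok("financial report Q3 2024").input_ids
print(len(ids), tok.convert_ids_to_tokens(ids))
```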
video_preprocessor_config.json ADDED
@@ -0,0 +1,41 @@
+{
+  "crop_size": null,
+  "data_format": "channels_first",
+  "default_to_square": true,
+  "device": null,
+  "do_center_crop": null,
+  "do_convert_rgb": true,
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "do_sample_frames": true,
+  "fps": 2,
+  "image_mean": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "image_processor_type": "Qwen2VLImageProcessorFast",
+  "image_std": [
+    0.5,
+    0.5,
+    0.5
+  ],
+  "input_data_format": null,
+  "max_frames": 768,
+  "merge_size": 2,
+  "min_frames": 4,
+  "num_frames": null,
+  "pad_size": null,
+  "patch_size": 16,
+  "resample": 3,
+  "rescale_factor": 0.00392156862745098,
+  "return_metadata": false,
+  "size": {
+    "longest_edge": 16777216,
+    "shortest_edge": 65536
+  },
+  "temporal_patch_size": 2,
+  "video_metadata": null,
+  "video_processor_type": "Qwen3VLVideoProcessor"
+}
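
One detail worth flagging in the preprocessor config above: the `size` values read like pixel-area budgets rather than edge lengths (65536 = 256², 16777216 = 4096²), which is how Qwen-VL fast processors express their min/max pixel limits; treat this as an interpretation, not something stated in this repo. A rough back-of-the-envelope check under that assumption:

```python
# Assumed reading of size.shortest_edge / size.longest_edge as pixel budgets
patch_size, merge_size = 16, 2
max_pixels = 16_777_216                     # 4096 * 4096
patches = max_pixels // (patch_size ** 2)   # 65,536 patches of 16x16 px at the cap
tokens = patches // (merge_size ** 2)       # ~16,384 visual tokens after 2x2 merging
print(patches, tokens)
```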
vocab.json ADDED
The diff for this file is too large to render. See raw diff