adaamko committed on Mar 16

Commit 51b4c8b (verified) · 1 Parent(s): a8a8e5e

Upload folder using huggingface_hub

Files changed (9)

.gitattributes +1 -0
README.md +290 -0
chat_template.jinja +154 -0
config.json +103 -0
model.safetensors-00001-of-00001.safetensors +3 -0
model.safetensors.index.json +639 -0
preprocessor_config.json +21 -0
tokenizer.json +3 -0
tokenizer_config.json +32 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,290 @@

+---
+license: apache-2.0
+language:
+- en
+base_model: Qwen/Qwen3.5-2B
+tags:
+- tool-output-pruning
+- context-engineering
+- context-pruning
+- code-agent
+- squeez
+- qwen3.5
+pipeline_tag: text-generation
+library_name: transformers
+datasets:
+- KRLabsOrg/tool-output-extraction-swebench
+---
+<p align="center">
+  <img src="https://github.com/KRLabsOrg/squeez/blob/main/assets/squeez_mascot.png?raw=true" alt="Squeez" width="250"/>
+  <br><em>Squeeze out the juice, leave the pulp behind.</em>
+</p>
+# Squeez-2B
+LLM coding agents spend **80-95% of their context window** on irrelevant tool output — passing test names, boilerplate headers, unchanged files. Squeez reads the raw output alongside a task description and returns **only the lines the agent needs to read next**, compressing tool output by ~91% on average while keeping 86% of the relevant information.
+Unlike keyword search (BM25) or generic semantic highlighting, Squeez is trained specifically on tool output from real software engineering workflows — test logs, grep results, build errors, git diffs, stack traces, and more.
+## What is Squeez?
+Squeez is a **tool output pruner for coding agents**. When an agent runs a tool (pytest, grep, git log, npm build, kubectl, etc.), the output is often hundreds of lines — but only a handful matter for the current task. Squeez acts as a filter between the tool and the agent's context window:
+```
+Tool output (500 lines) → Squeez → Relevant lines (30 lines) → Agent context
+```
+This model (Squeez-2B) is a generative approach: [Qwen 3.5 2B](https://huggingface.co/Qwen/Qwen3.5-2B) fine-tuned to extract verbatim relevant lines from tool output, given a task-specific query.
+### Why a small fine-tuned model?
+- **Fast**: 2B parameters — runs on a single GPU or even CPU, serves via vLLM at high throughput
+- **Accurate**: Outperforms the much larger Qwen 3.5 35B A3B MoE model run zero-shot, with a Span F1 of 0.7895 vs. 0.7000 (about **13% higher** in relative terms)
+- **Faithful**: Returns verbatim lines only — no rewriting, no hallucination, no summarization
+- **Drop-in**: Works as a CLI pipe, Python library, or vLLM server — integrates with any agent framework
+## Evaluation
+Evaluated on 617 held-out test samples from SWE-bench repositories, across 14 tool types:
+### Squeez-2B vs. generative models
+| Model | Span P | Span R | Span F1 | Exact Match | Fuzzy F1 | Partial Overlap | Empty Acc | ROUGE-L | Compression |
+|-------|--------|--------|---------|-------------|----------|-----------------|-----------|---------|-------------|
+| **Squeez-2B** | **0.8043** | **0.8624** | **0.7895** | **0.4911** | **0.8035** | **0.9189** | **0.9676** | **0.7208** | 0.9150 |
+| Qwen 3.5 35B A3B (zero-shot) | 0.7402 | 0.7498 | 0.7000 | 0.3922 | 0.7254 | 0.8347 | 0.9157 | 0.7151 | 0.9177 |
+| Qwen 3.5 2B (untrained) | 0.4154 | 0.5299 | 0.4075 | 0.1945 | 0.5482 | 0.7683 | 0.9157 | 0.5481 | 0.8197 |
+### Squeez-2B vs. naive baselines
+| Model | Span P | Span R | Span F1 | Exact Match | Fuzzy F1 | Partial Overlap | Empty Acc | ROUGE-L | Compression |
+|-------|--------|--------|---------|-------------|----------|-----------------|-----------|---------|-------------|
+| **Squeez-2B** | **0.8043** | **0.8624** | **0.7895** | **0.4911** | **0.8035** | **0.9189** | **0.9676** | **0.7208** | 0.9150 |
+| BM25 (10%) | 0.1277 | 0.2172 | 0.1314 | 0.0146 | 0.2314 | 0.5883 | 0.8981 | 0.2073 | 0.9036 |
+| First-N (10%) | 0.0741 | 0.1445 | 0.0798 | 0.0194 | 0.1570 | 0.4389 | 0.9175 | 0.1370 | 0.9055 |
+| Random (10%) | 0.0738 | 0.1009 | 0.0697 | 0.0113 | 0.1966 | 0.4984 | 0.9061 | 0.1397 | 0.9067 |
+| Last-N (10%) | 0.0496 | 0.0503 | 0.0407 | 0.0129 | 0.1393 | 0.3916 | 0.8560 | 0.1173 | 0.9130 |
+### Metric definitions
+- **Span F1**: strict line-level set overlap between predicted and gold relevant lines
+- **Fuzzy F1**: same as Span F1 but with fuzzy substring matching (threshold 0.5)
+- **Partial Overlap**: fraction of gold lines that have any overlap with predictions
+- **Empty Accuracy**: correctly predicting empty vs non-empty output (tool returned nothing relevant)
+- **Compression**: fraction of input removed (higher = more aggressive pruning)
+## Quick Start
+### With vLLM (recommended)
+```bash
+# Start the server
+pip install vllm
+vllm serve KRLabsOrg/squeez-2b --dtype bfloat16 --max-model-len 16384
+# Use from squeez CLI
+pip install squeez
+export SQUEEZ_SERVER_URL=http://localhost:8000/v1
+cat output.txt | squeez "find the bug"
+# Or pipe directly
+python -m pytest tests/ -v 2>&1 | squeez "find the test failure related to authentication"
+```
+vLLM provides continuous batching and high throughput, which is ideal when multiple agents or tools are running concurrently.
+### With squeez (local, no server)
+```bash
+pip install squeez
+# Downloads and runs the model locally (no GPU server needed)
+squeez "Find the failing traceback block" --input-file output.txt
+```
+> **Note:** Local mode loads the model on every call. Fine for one-off use, but for repeated calls (e.g. an agent piping every tool through squeez), use vLLM — the model stays warm in memory.
+### With transformers
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+model_name = "KRLabsOrg/squeez-2b"
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True,
+)
+messages = [
+    {"role": "system", "content": (
+        "You prune verbose tool output for a coding agent. "
+        "Given a focused extraction query and one tool output, return only the "
+        "smallest verbatim evidence block(s) the agent should read next. "
+        "Return the kept text inside <relevant_lines> tags. "
+        "Do not rewrite, summarize, or invent lines."
+    )},
+    {"role": "user", "content": (
+        "<query>\nFix the failing authentication test\n</query>\n"
+        "<tool_output>\n"
+        "PASSED tests/test_login.py::test_valid_credentials\n"
+        "FAILED tests/test_login.py::test_token_refresh - AssertionError: expected 200 got 401\n"
+        "PASSED tests/test_login.py::test_logout\n"
+        "PASSED tests/test_login.py::test_rate_limiting\n"
+        "\n</tool_output>"
+    )},
+]
+prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+with torch.no_grad():
+    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1, do_sample=True)
+response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+print(response)
+```
+**Output:**
+```xml
+<relevant_lines>
+FAILED tests/test_login.py::test_token_refresh - AssertionError: expected 200 got 401
+</relevant_lines>
+```
+### Python API (with squeez)
+```python
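+from squeez.inference.extractor import ToolOutputExtractor
+# Raw tool output to prune (here read from a file)
+with open("output.txt") as f:
+    raw_output = f.read()
+# Option 1: load this model locally
+extractor = ToolOutputExtractor(model_path="KRLabsOrg/squeez-2b")
+# Option 2: connect to a running vLLM server instead
+# extractor = ToolOutputExtractor(base_url="http://localhost:8000/v1")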
+filtered = extractor.extract(
+    task="Find the referer validation block",
+    tool_output=raw_output,
+)
+print(filtered)
+```
+## Input / Output Format
+**Input** — chat format with system prompt:
+```
+System: You prune verbose tool output for a coding agent. Given a focused
+extraction query and one tool output, return only the smallest verbatim
+evidence block(s) the agent should read next. Return the kept text inside
+<relevant_lines> tags. Do not rewrite, summarize, or invent lines.
+User: <query>{task_description}</query>
+<tool_output>{raw_tool_output}</tool_output>
+```
+**Output** — verbatim relevant lines wrapped in XML:
+```xml
+<relevant_lines>
+{only the lines that matter, copied verbatim}
+</relevant_lines>
+```
+If no lines are relevant, the model returns empty tags: `<relevant_lines>\n</relevant_lines>`.
+## Supported Tool Types
+The model was trained on 14 tool types from SWE-bench repositories:
+| Tool type | Description | Example |
+|-----------|-------------|---------|
+| `test_output` | pytest / unittest output | Test failures, tracebacks, assertion errors |
+| `read_file` | File contents | Source code, config files |
+| `grep` | Search results | Pattern matches across files |
+| `git_diff` | Code changes | Diffs between commits or branches |
+| `git_log` | Commit history | Relevant commits |
+| `git_blame` | Line-level attribution | Who changed what |
+| `ls` | Directory listings | File structure |
+| `python` | Python REPL output | Script output, errors |
+| `curl` | HTTP responses | API responses, documentation |
+| `build_output` | Build logs | Compilation errors, warnings |
+| `lint_output` | Linter output | Style/type violations |
+| `pip_install` | Package manager output | Dependency errors |
+| `type_check` | Type checker output | mypy/pyright errors |
+| `coverage` | Coverage reports | Uncovered lines |
+## Training Details
+| Parameter | Value |
+|-----------|-------|
+| Base model | [Qwen/Qwen3.5-2B](https://huggingface.co/Qwen/Qwen3.5-2B) |
+| Fine-tuning method | LoRA (r=16, alpha=32) via [Unsloth](https://github.com/unslothai/unsloth) |
+| Training data | Squeez v3 — 10,508 samples from [SWE-bench](https://swe-bench.github.io/) |
+| Epochs | 3 (best checkpoint at epoch 1.5) |
+| Max sequence length | 16,384 tokens |
+| Learning rate | 2e-4 |
+| Batch size | 8 (effective 32 with 4x gradient accumulation) |
+| Warmup | 5% of steps |
+| Weight decay | 0.01 |
+| Checkpoint selection | Best validation Span F1 |
+### Data generation
+Training data was generated by running 14 types of tool calls on SWE-bench repositories and using a teacher model to label the relevant lines. Each sample contains:
+- A focused extraction query (what the agent needs to find)
+- Raw tool output (as the agent would see it)
+- Gold relevant lines (the minimal set the agent should read)
+Dataset: [KRLabsOrg/tool-output-extraction-swebench](https://huggingface.co/datasets/KRLabsOrg/tool-output-extraction-swebench)
+## Limitations
+- Trained primarily on Python/SWE-bench data — works best on software engineering tool output, though the prompt format generalizes to other domains
+- Not designed for general-purpose text summarization or question answering
+- Very short outputs (<5 lines) may be returned unchanged
+- Max input length is 16,384 tokens — longer outputs should be chunked
+## Use with coding agents
+Add to your agent's system instructions (e.g. `CLAUDE.md` for Claude Code):
+```
+Always pipe shell commands through squeez and tell it exactly what you want to know.
+Examples:
+- `bun test 2>&1 | squeez "did the tests pass?"`
+- `git log --oneline -50 | squeez "find the commit that broke CSRF"`
+- `cat src/auth/middleware.py | squeez "find the referer validation logic"`
+Do NOT use squeez when:
+- You need exact, uncompressed output (e.g. writing a patch)
+- The command is interactive
+```
+## Citation
+```bibtex
+@software{kovacs2026squeez,
+    title={Squeez: Compressing Tool Output for LLM Coding Agents},
+    author={Adam Kovacs},
+    year={2026},
+    url={https://github.com/KRLabsOrg/squeez}
+}
+```
+## License
+Apache 2.0
+## Acknowledgments
+- [Qwen](https://huggingface.co/Qwen) for the Qwen 3.5 2B base model
+- [Unsloth](https://github.com/unslothai/unsloth) for efficient LoRA training
+- [SWE-bench](https://swe-bench.github.io/) for the evaluation framework and source repositories
+- [Provence](https://arxiv.org/abs/2501.16214) and [SWE-Pruner](https://github.com/ayanami-kitasan/SWE-Pruner) for inspiration on context pruning approaches
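
The README's output format and its Span F1 definition ("strict line-level set overlap") can be illustrated with a short sketch. This is not the project's evaluation code: the function names `parse_relevant_lines` and `span_prf` are hypothetical, and scoring an empty prediction against an empty gold set as a perfect match is an assumption made here for illustration.

```python
import re

def parse_relevant_lines(response: str) -> list[str]:
    """Return the non-blank lines inside the first <relevant_lines> block."""
    match = re.search(
        r"<relevant_lines>\n?(.*?)\n?</relevant_lines>", response, re.DOTALL
    )
    if match is None:
        return []
    return [line for line in match.group(1).split("\n") if line.strip()]

def span_prf(predicted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Strict line-level precision, recall, and F1 over sets of lines."""
    pred_set, gold_set = set(predicted), set(gold)
    if not pred_set and not gold_set:
        # Assumption: "nothing relevant" predicted correctly counts as perfect.
        return 1.0, 1.0, 1.0
    overlap = len(pred_set & gold_set)
    precision = overlap / len(pred_set) if pred_set else 0.0
    recall = overlap / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Example using the model output shown in the README.
response = (
    "<relevant_lines>\n"
    "FAILED tests/test_login.py::test_token_refresh - "
    "AssertionError: expected 200 got 401\n"
    "</relevant_lines>"
)
kept = parse_relevant_lines(response)
print(kept)
print(span_prf(kept, kept))
```

Because the comparison is on exact line strings, a prediction that rewrites a line even slightly scores zero for that line, which is consistent with the README's emphasis on verbatim extraction.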

chat_template.jinja ADDED

	@@ -0,0 +1,154 @@

+{%- set image_count = namespace(value=0) %}
+{%- set video_count = namespace(value=0) %}
+{%- macro render_content(content, do_vision_count, is_system_content=false) %}
+    {%- if content is string %}
+        {{- content }}
+    {%- elif content is iterable and content is not mapping %}
+        {%- for item in content %}
+            {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain images.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set image_count.value = image_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Picture ' ~ image_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+            {%- elif 'video' in item or item.type == 'video' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain videos.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set video_count.value = video_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Video ' ~ video_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+            {%- elif 'text' in item %}
+                {{- item.text }}
+            {%- else %}
+                {{- raise_exception('Unexpected item type in content.') }}
+            {%- endif %}
+        {%- endfor %}
+    {%- elif content is none or content is undefined %}
+        {{- '' }}
+    {%- else %}
+        {{- raise_exception('Unexpected content type.') }}
+    {%- endif %}
+{%- endmacro %}
+{%- if not messages %}
+    {{- raise_exception('No messages provided.') }}
+{%- endif %}
+{%- if tools and tools is iterable and tools is not mapping %}
+    {{- '<|im_start|>system\n' }}
+    {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>" }}
+    {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {%- if content %}
+            {{- '\n\n' + content }}
+        {%- endif %}
+    {%- endif %}
+    {{- '<|im_end|>\n' }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" %}
+        {%- set content = render_content(message.content, false)|trim %}
+        {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+            {%- set ns.multi_step_tool = false %}
+            {%- set ns.last_query_index = index %}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if ns.multi_step_tool %}
+    {{- raise_exception('No user query found in messages.') }}
+{%- endif %}
+{%- for message in messages %}
+    {%- set content = render_content(message.content, true)|trim %}
+    {%- if message.role == "system" %}
+        {%- if not loop.first %}
+            {{- raise_exception('System message must be at the beginning.') }}
+        {%- endif %}
+    {%- elif message.role == "user" %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- set reasoning_content = reasoning_content|trim %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if tool_call.function is defined %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {%- if loop.first %}
+                    {%- if content|trim %}
+                        {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- else %}
+                        {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- endif %}
+                {%- else %}
+                    {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                {%- endif %}
+                {%- if tool_call.arguments is defined %}
+                    {%- for args_name, args_value in tool_call.arguments|items %}
+                        {{- '<parameter=' + args_name + '>\n' }}
+                        {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+                        {{- args_value }}
+                        {{- '\n</parameter>\n' }}
+                    {%- endfor %}
+                {%- endif %}
+                {{- '</function>\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.previtem and loop.previtem.role != "tool" %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if not loop.last and loop.nextitem.role != "tool" %}
+            {{- '<|im_end|>\n' }}
+        {%- elif loop.last %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- else %}
+        {{- raise_exception('Unexpected message role.') }}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is true %}
+        {{- '<think>\n' }}
+    {%- else %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}
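
For the simple case used in the README (one text-only system message, text-only user messages, no tools, `add_generation_prompt=True`), the template above reduces to a few string concatenations. The sketch below mirrors only that path; `render_prompt` is a hypothetical helper written for illustration, not part of Transformers or squeez, and it does not handle images, tool calls, or assistant turns.

```python
def render_prompt(messages: list[dict], enable_thinking: bool = False) -> str:
    """Mirror the template's text-only system/user path with a generation prompt."""
    if not messages:
        raise ValueError("No messages provided.")
    if not any(m["role"] == "user" for m in messages):
        raise ValueError("No user query found in messages.")
    parts = []
    for index, message in enumerate(messages):
        role = message["role"]
        content = message["content"].strip()  # the template applies |trim
        if role == "system":
            if index != 0:
                raise ValueError("System message must be at the beginning.")
        elif role != "user":
            raise ValueError("This sketch only covers system and user roles.")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    # Without enable_thinking, the template pre-fills an empty think block.
    parts.append("<think>\n" if enable_thinking else "<think>\n\n</think>\n\n")
    return "".join(parts)

prompt = render_prompt([
    {"role": "system", "content": "You prune verbose tool output for a coding agent."},
    {"role": "user", "content": "<query>\nfind the bug\n</query>\n<tool_output>\n...\n</tool_output>"},
])
print(prompt)
```

Note the default branch: unless `enable_thinking` is defined and true, the generation prompt ends with an already closed `<think>` block, so the model's reply starts directly with its answer.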

config.json ADDED

	@@ -0,0 +1,103 @@

+{
+    "architectures": [
+        "Qwen3_5ForConditionalGeneration"
+    ],
+    "torch_dtype": "bfloat16",
+    "image_token_id": 248056,
+    "model_name": "Qwen/Qwen3.5-2B",
+    "model_type": "qwen3_5",
+    "pad_token_id": 248044,
+    "text_config": {
+        "attention_bias": false,
+        "attention_dropout": 0.0,
+        "attn_output_gate": true,
+        "bos_token_id": null,
+        "torch_dtype": "bfloat16",
+        "eos_token_id": 248044,
+        "full_attention_interval": 4,
+        "head_dim": 256,
+        "hidden_act": "silu",
+        "hidden_size": 2048,
+        "initializer_range": 0.02,
+        "intermediate_size": 6144,
+        "layer_types": [
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention",
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention",
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention",
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention",
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention",
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention"
+        ],
+        "linear_conv_kernel_dim": 4,
+        "linear_key_head_dim": 128,
+        "linear_num_key_heads": 16,
+        "linear_num_value_heads": 16,
+        "linear_value_head_dim": 128,
+        "mamba_ssm_dtype": "float32",
+        "max_position_embeddings": 262144,
+        "mlp_only_layers": [],
+        "model_type": "qwen3_5_text",
+        "mtp_num_hidden_layers": 1,
+        "mtp_use_dedicated_embeddings": false,
+        "num_attention_heads": 8,
+        "num_hidden_layers": 24,
+        "num_key_value_heads": 2,
+        "pad_token_id": null,
+        "partial_rotary_factor": 0.25,
+        "rms_norm_eps": 1e-06,
+        "rope_parameters": {
+            "mrope_interleaved": true,
+            "mrope_section": [
+                11,
+                11,
+                10
+            ],
+            "partial_rotary_factor": 0.25,
+            "rope_theta": 10000000,
+            "rope_type": "default"
+        },
+        "tie_word_embeddings": true,
+        "use_cache": true,
+        "vocab_size": 248320
+    },
+    "tie_word_embeddings": true,
+    "unsloth_version": "2026.3.4",
+    "video_token_id": 248057,
+    "vision_config": {
+        "deepstack_visual_indexes": [],
+        "depth": 24,
+        "torch_dtype": "bfloat16",
+        "hidden_act": "gelu_pytorch_tanh",
+        "hidden_size": 1024,
+        "in_channels": 3,
+        "initializer_range": 0.02,
+        "intermediate_size": 4096,
+        "model_type": "qwen3_5",
+        "num_heads": 16,
+        "num_position_embeddings": 2304,
+        "out_hidden_size": 2048,
+        "patch_size": 16,
+        "spatial_merge_size": 2,
+        "temporal_patch_size": 2
+    },
+    "vision_end_token_id": 248054,
+    "vision_start_token_id": 248053
+}
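
A few of the `text_config` values above can be cross-checked with simple arithmetic. The sketch below copies the numbers from the config; the KV-cache estimate rests on two assumptions that the config does not state outright: that only the `full_attention` layers keep a per-token key/value cache (the `linear_attention` layers are assumed to hold a fixed-size state), and that the cache is stored in bfloat16 (2 bytes per value).

```python
# Values copied from the text_config block of config.json.
NUM_LAYERS = 24
FULL_ATTENTION_INTERVAL = 4
NUM_ATTENTION_HEADS = 8
NUM_KV_HEADS = 2
HEAD_DIM = 256
PARTIAL_ROTARY_FACTOR = 0.25
MROPE_SECTION = [11, 11, 10]

def expected_layer_types(num_layers: int, interval: int) -> list[str]:
    """Every `interval`-th layer uses full attention; the rest are linear."""
    return [
        "full_attention" if (i + 1) % interval == 0 else "linear_attention"
        for i in range(num_layers)
    ]

layer_types = expected_layer_types(NUM_LAYERS, FULL_ATTENTION_INTERVAL)
n_full = layer_types.count("full_attention")
n_linear = layer_types.count("linear_attention")

query_width = NUM_ATTENTION_HEADS * HEAD_DIM       # matches hidden_size
gqa_group = NUM_ATTENTION_HEADS // NUM_KV_HEADS    # query heads per KV head
rotary_dims = int(HEAD_DIM * PARTIAL_ROTARY_FACTOR)
mrope_total = sum(MROPE_SECTION)                   # half of rotary_dims

# Assumption: K and V cached only in full-attention layers, 2 bytes per value.
kv_bytes_per_token = n_full * 2 * NUM_KV_HEADS * HEAD_DIM * 2
kv_mib_at_16k = kv_bytes_per_token * 16_384 / 2**20

print(n_full, n_linear, query_width, gqa_group, rotary_dims, mrope_total)
print(kv_bytes_per_token, kv_mib_at_16k)
```

Under those assumptions the growing part of the cache is small, since only 6 of the 24 layers contribute to it, which fits the README's claim that the model serves long tool outputs cheaply.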

model.safetensors-00001-of-00001.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:390a6b651c6ee0cb7e1d10636056cba492154b52d8204a8fa0f05e904930c0bf
+size 4548221488
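
The three added lines are a Git LFS pointer rather than the weights themselves: the repository stores only the spec version, a SHA-256 object ID, and the byte size. The sketch below parses that pointer and compares its size with the `total_size` value (4548144832) from `model.safetensors.index.json` in this same commit. `parse_lfs_pointer` is a hypothetical helper, and the parameter estimate assumes every tensor is stored in bfloat16 (2 bytes per value) and that `total_size` counts tensor bytes only.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:390a6b651c6ee0cb7e1d10636056cba492154b52d8204a8fa0f05e904930c0bf
size 4548221488
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its key/value fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

pointer = parse_lfs_pointer(POINTER)
index_total_size = 4548144832  # "total_size" from model.safetensors.index.json

# Bytes in the file that are not tensor data (presumably the safetensors header).
non_tensor_bytes = pointer["size"] - index_total_size

# Assumption: all tensors are bfloat16, so two bytes per parameter.
approx_params = index_total_size // 2

print(pointer["hash_algo"], len(pointer["digest"]), pointer["size"])
print(non_tensor_bytes, approx_params)
```

The estimate of roughly 2.27 billion stored values is in line with the model's "2B" naming, keeping in mind that the checkpoint also contains the vision tower and MTP weights listed in the index.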

model.safetensors.index.json ADDED

	@@ -0,0 +1,639 @@

+{
+  "metadata": {
+    "total_size": 4548144832
+  },
+  "weight_map": {
+    "model.language_model.embed_tokens.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.merger.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.23.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.23.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.23.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.3.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.3.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.3.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.7.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.7.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.7.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.layers.0.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.layers.0.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.layers.0.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.11.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.11.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.11.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.15.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.15.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.15.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.19.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.19.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.19.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.linear_attn.in_proj_qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.mlp.down_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.mlp.gate_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.mlp.up_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.23.self_attn.q_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.7.self_attn.q_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.fc.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.layers.0.self_attn.q_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.11.self_attn.q_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.3.self_attn.q_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.15.self_attn.q_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.19.self_attn.q_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.merger.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.23.self_attn.o_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.3.self_attn.o_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.7.self_attn.o_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.layers.0.self_attn.o_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.11.self_attn.o_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.15.self_attn.o_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.19.self_attn.o_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.linear_attn.in_proj_z.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.linear_attn.out_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.0.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.0.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.1.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.1.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.10.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.10.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.11.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.11.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.12.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.12.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.13.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.13.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.14.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.14.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.15.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.15.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.16.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.16.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.17.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.17.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.18.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.18.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.19.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.19.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.2.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.2.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.20.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.20.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.21.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.21.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.22.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.22.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.23.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.23.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.3.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.3.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.4.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.4.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.5.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.5.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.6.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.6.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.7.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.7.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.8.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.8.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.9.mlp.linear_fc1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.9.mlp.linear_fc2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.0.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.1.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.10.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.11.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.12.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.13.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.14.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.15.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.16.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.17.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.18.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.19.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.2.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.20.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.21.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.22.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.23.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.3.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.4.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.5.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.6.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.7.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.8.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.9.attn.qkv.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.pos_embed.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.patch_embed.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.23.self_attn.k_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.23.self_attn.v_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.7.self_attn.k_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.7.self_attn.v_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.layers.0.self_attn.k_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.layers.0.self_attn.v_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.11.self_attn.k_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.11.self_attn.v_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.3.self_attn.k_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.3.self_attn.v_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.15.self_attn.k_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.15.self_attn.v_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.19.self_attn.k_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.19.self_attn.v_proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.0.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.1.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.10.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.11.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.12.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.13.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.14.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.15.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.16.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.17.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.18.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.19.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.2.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.20.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.21.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.22.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.23.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.3.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.4.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.5.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.6.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.7.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.8.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.9.attn.proj.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.linear_attn.in_proj_b.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.linear_attn.in_proj_a.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.linear_attn.conv1d.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.0.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.1.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.10.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.11.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.12.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.13.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.14.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.15.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.16.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.17.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.18.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.19.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.2.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.20.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.21.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.22.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.23.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.3.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.4.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.5.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.6.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.7.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.8.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.9.mlp.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.merger.linear_fc1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.0.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.1.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.10.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.11.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.12.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.13.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.14.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.15.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.16.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.17.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.18.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.19.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.2.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.20.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.21.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.22.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.23.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.3.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.4.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.5.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.6.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.7.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.8.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.9.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.23.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.23.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.3.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.7.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.7.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.layers.0.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.layers.0.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.pre_fc_norm_embedding.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.pre_fc_norm_hidden.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.11.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.11.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.3.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.15.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.15.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.19.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.19.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.post_attention_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.input_layernorm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.merger.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.0.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.0.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.0.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.0.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.0.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.0.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.1.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.1.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.1.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.1.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.1.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.1.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.10.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.10.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.10.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.10.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.10.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.10.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.11.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.11.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.11.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.11.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.11.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.11.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.12.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.12.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.12.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.12.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.12.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.12.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.13.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.13.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.13.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.13.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.13.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.13.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.14.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.14.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.14.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.14.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.14.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.14.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.15.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.15.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.15.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.15.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.15.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.15.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.16.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.16.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.16.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.16.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.16.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.16.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.17.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.17.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.17.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.17.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.17.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.17.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.18.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.18.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.18.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.18.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.18.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.18.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.19.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.19.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.19.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.19.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.19.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.19.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.2.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.2.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.2.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.2.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.2.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.2.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.20.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.20.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.20.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.20.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.20.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.20.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.21.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.21.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.21.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.21.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.21.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.21.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.22.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.22.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.22.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.22.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.22.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.22.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.23.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.23.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.23.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.23.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.23.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.23.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.3.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.3.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.3.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.3.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.3.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.3.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.4.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.4.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.4.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.4.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.4.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.4.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.5.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.5.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.5.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.5.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.5.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.5.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.6.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.6.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.6.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.6.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.6.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.6.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.7.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.7.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.7.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.7.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.7.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.7.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.8.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.8.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.8.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.8.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.8.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.8.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.9.attn.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.9.mlp.linear_fc2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.9.norm1.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.9.norm1.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.9.norm2.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.blocks.9.norm2.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.merger.norm.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.merger.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.visual.patch_embed.proj.bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.23.self_attn.k_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.23.self_attn.q_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.3.self_attn.k_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.7.self_attn.k_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.7.self_attn.q_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.layers.0.self_attn.k_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "mtp.layers.0.self_attn.q_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.11.self_attn.k_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.11.self_attn.q_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.3.self_attn.q_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.15.self_attn.k_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.15.self_attn.q_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.19.self_attn.k_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.19.self_attn.q_norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.linear_attn.norm.weight": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.8.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.12.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.4.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.20.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.21.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.18.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.13.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.14.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.2.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.10.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.16.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.17.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.5.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.6.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.22.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.9.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.0.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors",
+    "model.language_model.layers.1.linear_attn.dt_bias": "model.safetensors-00001-of-00001.safetensors"
+  }
+}
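
The entries above map each tensor name to the shard file that stores it. The following is a minimal sketch of how such an index could be summarized, assuming the standard Hugging Face layout in which these entries sit under a top-level `weight_map` key. The helper name `summarize_weight_map` and the four-entry sample are illustrative only and are not part of this repository.

```python
import json
import re
from collections import Counter

# Tiny illustrative sample in the same shape as the index above (not the full file).
SAMPLE_INDEX = json.loads("""
{
  "weight_map": {
    "model.visual.blocks.0.attn.qkv.bias": "model.safetensors-00001-of-00001.safetensors",
    "model.language_model.layers.0.linear_attn.A_log": "model.safetensors-00001-of-00001.safetensors",
    "model.language_model.layers.3.self_attn.q_norm.weight": "model.safetensors-00001-of-00001.safetensors",
    "mtp.norm.weight": "model.safetensors-00001-of-00001.safetensors"
  }
}
""")

# Matches language-model layer tensors and captures (layer index, attention kind).
LAYER_RE = re.compile(r"^model\.language_model\.layers\.(\d+)\.(linear_attn|self_attn)\.")


def summarize_weight_map(index):
    """Hypothetical helper: count tensors per shard and per component,
    and record which attention kind each language-model layer uses."""
    weight_map = index["weight_map"]  # assumed key, see lead-in
    shards = Counter(weight_map.values())
    components = Counter()
    attn_kind = {}
    for name in weight_map:
        if name.startswith("model.visual."):
            components["visual"] += 1
        elif name.startswith("model.language_model."):
            components["language_model"] += 1
        elif name.startswith("mtp."):
            components["mtp"] += 1
        else:
            components["other"] += 1
        match = LAYER_RE.match(name)
        if match:
            attn_kind[int(match.group(1))] = match.group(2)
    return shards, components, attn_kind


shards, components, attn_kind = summarize_weight_map(SAMPLE_INDEX)
print(dict(shards))
print(dict(components))
print(attn_kind)
```

Run against the full index, a summary like this would show that every tensor resolves to the single shard `model.safetensors-00001-of-00001.safetensors`, and which layers carry `linear_attn` versus `self_attn` parameters.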

preprocessor_config.json ADDED

	@@ -0,0 +1,21 @@

+{
+    "size": {
+        "longest_edge": 16777216,
+        "shortest_edge": 65536
+    },
+    "patch_size": 16,
+    "temporal_patch_size": 2,
+    "merge_size": 2,
+    "image_mean": [
+        0.5,
+        0.5,
+        0.5
+    ],
+    "image_std": [
+        0.5,
+        0.5,
+        0.5
+    ],
+    "processor_class": "Qwen3VLProcessor",
+    "image_processor_type": "Qwen2VLImageProcessorFast"
+}
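
The values in this config can be related to each other with a little arithmetic. The sketch below assumes, as is the convention in the Qwen-VL image processors, that `size.shortest_edge` and `size.longest_edge` are bounds on the total pixel count of an image rather than edge lengths, and that one visual token covers a `merge_size` x `merge_size` group of patches. The function names are hypothetical and only illustrate the calculation.

```python
# Values copied from preprocessor_config.json above.
PATCH_SIZE = 16
MERGE_SIZE = 2
MIN_PIXELS = 65536       # size.shortest_edge (assumed to be a total-pixel lower bound)
MAX_PIXELS = 16777216    # size.longest_edge (assumed to be a total-pixel upper bound)
IMAGE_MEAN = 0.5
IMAGE_STD = 0.5


def pixels_per_merged_token(patch_size=PATCH_SIZE, merge_size=MERGE_SIZE):
    """Pixels covered by one visual token after patch merging."""
    side = patch_size * merge_size
    return side * side


def token_budget(total_pixels):
    """Approximate number of visual tokens for an image with this many pixels."""
    return total_pixels // pixels_per_merged_token()


def normalize(value_0_to_1):
    """Apply the configured per-channel mean/std to a pixel value in [0, 1]."""
    return (value_0_to_1 - IMAGE_MEAN) / IMAGE_STD


print(pixels_per_merged_token())                           # 1024
print(token_budget(MIN_PIXELS), token_budget(MAX_PIXELS))  # 64 16384
print(normalize(0.0), normalize(1.0))                      # -1.0 1.0
```

Under these assumptions an image is resized so that it yields roughly 64 to 16384 visual tokens, and the mean/std of 0.5 maps pixel values from [0, 1] to [-1, 1].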

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
+size 19989343
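
The three lines above are a Git LFS pointer, not the tokenizer itself: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
# Pointer text copied from the diff above (without the leading "+" diff markers).
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
size 19989343
"""


def parse_lfs_pointer(text):
    """Hypothetical helper: split a Git LFS pointer into its fields."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each pointer line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

After downloading the real `tokenizer.json`, its byte length and `hashlib.sha256` digest can be compared against `size` and `digest` to confirm the download is intact.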

tokenizer_config.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "add_prefix_space": false,
+  "audio_bos_token": "<|audio_start|>",
+  "audio_eos_token": "<|audio_end|>",
+  "audio_token": "<|audio_pad|>",
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "image_token": "<|image_pad|>",
+  "is_local": false,
+  "model_max_length": 262144,
+  "model_specific_special_tokens": {
+    "audio_bos_token": "<|audio_start|>",
+    "audio_eos_token": "<|audio_end|>",
+    "audio_token": "<|audio_pad|>",
+    "image_token": "<|image_pad|>",
+    "video_token": "<|video_pad|>",
+    "vision_bos_token": "<|vision_start|>",
+    "vision_eos_token": "<|vision_end|>"
+  },
+  "pad_token": "<|endoftext|>",
+  "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
+  "split_special_tokens": false,
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": null,
+  "video_token": "<|video_pad|>",
+  "vision_bos_token": "<|vision_start|>",
+  "vision_eos_token": "<|vision_end|>",
+  "chat_template": "{%- set image_count = namespace(value=0) %}\n{%- set video_count = namespace(value=0) %}\n{%- macro render_content(content, do_vision_count, is_system_content=false) %}\n    {%- if content is string %}\n        {{- content }}\n    {%- elif content is iterable and content is not mapping %}\n        {%- for item in content %}\n            {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}\n                {%- if is_system_content %}\n                    {{- raise_exception('System message cannot contain images.') }}\n                {%- endif %}\n                {%- if do_vision_count %}\n                    {%- set image_count.value = image_count.value + 1 %}\n                {%- endif %}\n                {%- if add_vision_id %}\n                    {{- 'Picture ' ~ image_count.value ~ ': ' }}\n                {%- endif %}\n                {{- '<|vision_start|><|image_pad|><|vision_end|>' }}\n            {%- elif 'video' in item or item.type == 'video' %}\n                {%- if is_system_content %}\n                    {{- raise_exception('System message cannot contain videos.') }}\n                {%- endif %}\n                {%- if do_vision_count %}\n                    {%- set video_count.value = video_count.value + 1 %}\n                {%- endif %}\n                {%- if add_vision_id %}\n                    {{- 'Video ' ~ video_count.value ~ ': ' }}\n                {%- endif %}\n                {{- '<|vision_start|><|video_pad|><|vision_end|>' }}\n            {%- elif 'text' in item %}\n                {{- item.text }}\n            {%- else %}\n                {{- raise_exception('Unexpected item type in content.') }}\n            {%- endif %}\n        {%- endfor %}\n    {%- elif content is none or content is undefined %}\n        {{- '' }}\n    {%- else %}\n        {{- raise_exception('Unexpected content type.') }}\n    {%- endif %}\n{%- endmacro %}\n{%- if not messages %}\n    {{- raise_exception('No messages 
provided.') }}\n{%- endif %}\n{%- if tools and tools is iterable and tools is not mapping %}\n    {{- '<|im_start|>system\\n' }}\n    {{- \"# Tools\\n\\nYou have access to the following functions:\\n\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\" }}\n    {{- '\\n\\nIf you choose to call a function ONLY reply in the following format with NO suffix:\\n\\n<tool_call>\\n<function=example_function_name>\\n<parameter=example_parameter_1>\\nvalue_1\\n</parameter>\\n<parameter=example_parameter_2>\\nThis is the value for the second parameter\\nthat can span\\nmultiple lines\\n</parameter>\\n</function>\\n</tool_call>\\n\\n<IMPORTANT>\\nReminder:\\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\\n- Required parameters MUST be specified\\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\\n</IMPORTANT>' }}\n    {%- if messages[0].role == 'system' %}\n        {%- set content = render_content(messages[0].content, false, true)|trim %}\n        {%- if content %}\n            {{- '\\n\\n' + content }}\n        {%- endif %}\n    {%- endif %}\n    {{- '<|im_end|>\\n' }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {%- set content = render_content(messages[0].content, false, true)|trim %}\n        {{- '<|im_start|>system\\n' + content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n    {%- set index = (messages|length - 1) - loop.index0 %}\n    {%- if ns.multi_step_tool and message.role == \"user\" %}\n        {%- set content 
= render_content(message.content, false)|trim %}\n        {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}\n            {%- set ns.multi_step_tool = false %}\n            {%- set ns.last_query_index = index %}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if ns.multi_step_tool %}\n    {{- raise_exception('No user query found in messages.') }}\n{%- endif %}\n{%- for message in messages %}\n    {%- set content = render_content(message.content, true)|trim %}\n    {%- if message.role == \"system\" %}\n        {%- if not loop.first %}\n            {{- raise_exception('System message must be at the beginning.') }}\n        {%- endif %}\n    {%- elif message.role == \"user\" %}\n        {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {%- set reasoning_content = '' %}\n        {%- if message.reasoning_content is string %}\n            {%- set reasoning_content = message.reasoning_content %}\n        {%- else %}\n            {%- if '</think>' in content %}\n                {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n                {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n            {%- endif %}\n        {%- endif %}\n        {%- set reasoning_content = reasoning_content|trim %}\n        {%- if loop.index0 > ns.last_query_index %}\n            {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content + '\\n</think>\\n\\n' + content }}\n        {%- else %}\n            {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- endif %}\n        {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if tool_call.function is defined %}\n                    {%- set tool_call = tool_call.function %}\n  
              {%- endif %}\n                {%- if loop.first %}\n                    {%- if content|trim %}\n                        {{- '\\n\\n<tool_call>\\n<function=' + tool_call.name + '>\\n' }}\n                    {%- else %}\n                        {{- '<tool_call>\\n<function=' + tool_call.name + '>\\n' }}\n                    {%- endif %}\n                {%- else %}\n                    {{- '\\n<tool_call>\\n<function=' + tool_call.name + '>\\n' }}\n                {%- endif %}\n                {%- if tool_call.arguments is defined %}\n                    {%- for args_name, args_value in tool_call.arguments|items %}\n                        {{- '<parameter=' + args_name + '>\\n' }}\n                        {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}\n                        {{- args_value }}\n                        {{- '\\n</parameter>\\n' }}\n                    {%- endfor %}\n                {%- endif %}\n                {{- '</function>\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.previtem and loop.previtem.role != \"tool\" %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- content }}\n        {{- '\\n</tool_response>' }}\n        {%- if not loop.last and loop.nextitem.role != \"tool\" %}\n            {{- '<|im_end|>\\n' }}\n        {%- elif loop.last %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- else %}\n        {{- raise_exception('Unexpected message role.') }}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n    {%- if enable_thinking is defined and enable_thinking is true %}\n        {{- '<think>\\n' }}\n    {%- else %}\n        {{- '<think>\\n\\n</think>\\n\\n' }}\n    {%- 
endif %}\n{%- endif %}"
+}
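The two parts of the `chat_template` above that most affect output are how an earlier assistant turn is split into reasoning and answer when no `reasoning_content` field is supplied, and which generation prompt is emitted depending on `enable_thinking`. The following is a minimal plain-Python sketch of that logic, not the template engine itself. The function names `split_reasoning` and `generation_prompt` are hypothetical and exist only for illustration; the string operations are copied from the template.

```python
def split_reasoning(content):
    """Sketch of the template's assistant branch when reasoning_content is absent.

    Returns (reasoning, answer). Hypothetical helper, not part of any library.
    """
    content = content.strip()  # the template applies |trim to rendered content
    reasoning = ""
    if "</think>" in content:
        # Text before the first </think>, after the last <think> that precedes it
        reasoning = (
            content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
        )
        # Text after the last </think> becomes the visible answer
        content = content.split("</think>")[-1].lstrip("\n")
    return reasoning.strip(), content


def generation_prompt(enable_thinking=None):
    """Sketch of the add_generation_prompt branch at the end of the template."""
    prompt = "<|im_start|>assistant\n"
    if enable_thinking is True:
        # Open think block: the model is expected to write its reasoning
        return prompt + "<think>\n"
    # Undefined or any other value: an empty, already closed think block
    return prompt + "<think>\n\n</think>\n\n"


if __name__ == "__main__":
    print(split_reasoning("<think>\nplan\n</think>\n\nanswer"))
    print(repr(generation_prompt()))
    print(repr(generation_prompt(True)))
```

As the sketch suggests, thinking is off unless `enable_thinking` is explicitly passed as `true`. Reasoning is re-rendered inside `<think>` tags only for assistant turns after the last real user query; in earlier turns only the answer text is kept.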