Duplicate from Jackrong/Qwopus3.5-4B-v3

Co-authored-by: Jackrong <Jackrong@users.noreply.huggingface.co>

Files changed (10)

.gitattributes +36 -0
README.md +234 -0
chat_template.jinja +88 -0
config.json +114 -0
model.safetensors-00001-of-00002.safetensors +3 -0
model.safetensors-00002-of-00002.safetensors +3 -0
model.safetensors.index.json +745 -0
processor_config.json +63 -0
tokenizer.json +3 -0
tokenizer_config.json +34 -0

.gitattributes ADDED

	@@ -0,0 +1,36 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
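As a rough illustration of what these rules mean for the files in this commit, the sketch below approximates the matching with Python's `fnmatch`. This is a simplification under stated assumptions: real gitattributes patterns follow gitignore-style rules (for example `saved_model/**/*`), which `fnmatch` does not fully reproduce, and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch

# A subset of the patterns listed in the .gitattributes hunk above.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.zip", "tokenizer.json"]

def is_lfs_tracked(filename: str) -> bool:
    """Approximate check: does a top-level file name match any LFS pattern?"""
    return any(fnmatch(filename, pattern) for pattern in LFS_PATTERNS)

for name in ["config.json", "tokenizer.json",
             "model.safetensors-00001-of-00002.safetensors"]:
    print(name, is_lfs_tracked(name))
```

This agrees with the file list at the top of the commit, where `tokenizer.json` and both weight shards appear as three-line pointer files (`+3 -0`) rather than as full contents.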

README.md ADDED

	@@ -0,0 +1,234 @@

+---
+language:
+- en
+- zh
+- ko
+license: apache-2.0
+base_model: unsloth/Qwen3.5-4B
+tags:
+- unsloth
+- qwen
+- qwen3.5
+- reasoning
+- chain-of-thought
+- lora
+- competitive-programming
+pipeline_tag: image-text-to-text
+---
+# 🌟 Qwopus3.5-4B-v3
+🔥 **Update (April 5):** I’ve released the complete training notebook, codebase, and a comprehensive PDF guide to help beginners and enthusiasts understand and reproduce this model's fine-tuning process.
+> ❤️ Special thanks to the [**Unsloth**](https://unsloth.ai) open-source library and [@KyleHessling1](https://x.com/kylehessling1) for their support.
+## 📚 Resources & Guides
+👉 **[GitHub Repository: Jackrong-llm-finetuning-guide](https://github.com/R6410418/Jackrong-llm-finetuning-guide.git)**
+Visit the repo to dive into the codebase and reproduce the results locally or on Colab.
+### 📥 Core Technical Document
+**🔗 [Qwopus3.5-27b Complete Fine-Tuning Guide (PDF)](https://github.com/R6410418/Jackrong-llm-finetuning-guide/blob/main/guidePDF/Qwopus3-5-27b-Colab_complete_guide_to_llm_finetuning.pdf)**
+* **The Full Pipeline:** A step-by-step walkthrough—from downloading the base model and unifying heterogeneous data, to configuring trainer hyperparameters and publishing to Hugging Face.
+* **Beginner Friendly:** Includes an introductory guide to getting started with Google Colab and Unsloth.
+* *Feedback welcome! If you spot any areas for improvement, please let me know and I will update it promptly.*
+> **A Note:**
+> My goal isn't just to detail a workflow, but to demystify LLM training. Beyond the social media hype, fine-tuning isn't an unattainable ritual—often, all you need is a Google account, a standard laptop, and relentless curiosity.
+>
+> *No one starts as an expert, but every expert was once brave enough to begin.*
+>
+> All training and testing for this project were self-funded. If you find this model or guide helpful, a **Star ⭐️ on GitHub** would be the greatest encouragement. Thank you! 🙏
+> [!Note]
+> My Claude-distilled model optimizations are released under the **Qwopus3.5 series** name; the latest version is **🌟Qwopus3.5-v3**.
+---
+## 🎯 Motivation
+Recent advances in language agents have predominantly focused on improving reasoning accuracy through Chain-of-Thought (CoT) and self-reflection mechanisms, encouraging models to iteratively refine their reasoning before taking actions.
+However, emerging evidence suggests that such **"pre-action overthinking"** is not always optimal for sequential decision-making. Instead, agent performance can be more effectively improved through a **trial-and-error paradigm**, where actions are executed early and refined based on environmental feedback.
+### 🔬 Supporting Evidence
+- **Reflexion**[^1] demonstrates that agents can significantly improve decision-making by leveraging trial, error, and self-reflection — shifting the role of reflection from *pre-action deliberation* to **post-action correction**, enabling agents to learn from concrete execution outcomes rather than speculative reasoning.
+- **Post-failure reflection + retry**[^2] substantially boosts performance:
+  - 📈 **+34.7%** on mathematical reasoning tasks
+  - 📈 **+18.1%** on function calling tasks
+  This provides strong empirical evidence that **reflection is most effective when grounded in execution outcomes**, rather than purely internal reasoning.
+### 🧭 My Approach
+For multi-step and tool-augmented agent systems, performance should not be optimized solely through deeper pre-execution reasoning. A more effective strategy is an **execution-driven optimization loop** — where agents perform lightweight initial reasoning, act in the environment, and iteratively refine their behavior based on feedback signals.
+> **Paradigm Shift:** from **"reason-then-act"** → **"act-then-refine"**
+>
+> The objective is not to achieve optimal reasoning in a single pass, but to enable robust task completion through iterative interaction and correction.
+---
+## 💡 Model Introduction
+**Qwopus3.5-4B-v3** is a reasoning-enhanced model based on **Qwen3.5-4B**, designed to simultaneously improve reasoning stability and correctness while optimizing inference efficiency — ultimately achieving stronger cross-task generalization capabilities, particularly in programming.
+**Key Highlights:**
+- 🧩 **Structural Reasoning Optimization** — Refines the fundamental structure of the reasoning process through high-quality reasoning distillation and structural alignment, enabling higher accuracy rates via shorter, more stable reasoning paths.
+- 🔧 **Tool-Calling Reinforcement** — Incorporates specialized RL training for tool-calling, optimized for tool-augmented agent frameworks like **OpenClaw**, strengthening stability in continuous task execution and proficiency in tool invocation.
+- 🔁 **Act-Then-Refine Paradigm** — Designed for complex, multi-step agentic workflows, aligning with the core motivation of replacing pre-action deliberation with execution-driven refinement.
+---
+## 🔗 Chain-of-Thought Optimization
+### 🚧 The Problem with v2 Distillation
+The v2 model was primarily trained through SFT on CoT data distilled from strong teacher models such as Claude. While this can transfer high‑quality reasoning patterns, CoT traces from third‑party datasets do not always faithfully reflect a model’s true internal reasoning process — and after analysis, I found some portions may even be **“fabricated”**, meaning the traces were not actually generated by the claimed teacher model.[^3][^4]
+Prior work further shows that CoT explanations can act as **post-hoc rationalizations** rather than genuine step-by-step reasoning[^3]. As a result, student models risk learning:
+- Surface-level **pattern matching** instead of underlying reasoning
+- **Answer memorization** rather than generalizable problem-solving
+- Reduced robustness on out-of-distribution tasks
+### ✅ What v3 Does Differently
+| | v2 (Distillation) | v3 (Structural Alignment) |
+|---|---|---|
+| **CoT Source** | Third-party distilled traces | Curated, verifiable reasoning chains |
+| **Learning Target** | Imitate teacher outputs | Learn process-level reasoning |
+| **Reasoning Style** | Compressed, potentially fabricated | Explicit, step-by-step, faithful |
+| **Robustness** | Lower on unseen tasks | Higher generalization |
+v3 focuses on improving the **faithfulness, completeness, and structural clarity** of reasoning traces. Instead of imitating compressed teacher CoT, the model is trained to produce more explicit and verifiable intermediate steps — enabling a transition from **“answer imitation”** to **process-level reasoning learning**.
+This improves both the interpretability and reliability of the reasoning process, providing a more stable foundation for downstream multi-step and agent-based tasks.
+> ⚠️ **Side Effect:** The generated CoT length in v3 will be **significantly longer** than v2, as a direct consequence of more explicit intermediate reasoning.
+---
+### 🍎 Qwopus3.5-4B-v3: HumanEval Benchmark Evaluation
+> 🔬 **Inference Setup:** All models were evaluated under the **Unsloth** runtime using **bfloat16 (BF16)** precision — optimally balanced for numerical range and memory efficiency at 4B scale. Answer verification, partial CoT adjudication, and statistical analysis were cross-validated by **GPT-4.5-Pro (Thinking)** and **Claude Opus 4.6 (Thinking)** to ensure reproducibility.
+### 📊 HumanEval — 164-Task Full Benchmark
+Three 4B-scale Qwen-family models were evaluated under a **conservative manual adjudication protocol**, addressing:
+- 🧹 Code-extraction pollution
+- ✂️ Answer / code separation issues
+- 🗂️ Formatting noise in otherwise correct outputs
+> 🏆 **Result:** Under this fair and strict evaluation setting, **Qwopus3.5-4B-v3 achieves the best strict overall score of 75.61% (124/164)** — outperforming **Qwen3.5-4B** (72.56%, 119/164) and **Claude-Distilled-v2** (69.51%, 114/164), while simultaneously reducing the number of manual rescues required.
+| Model | Base Pass | Plus Pass | Plus Pass vs. Qwen3.5-4B |
+| :--- | :---: | :---: | :---: |
+| 🥇 **Qwopus3.5-4B-v3** | **77.44%** (127/164) | **75.61%** (124/164) | 📈 **+3.05 pp** |
+| Qwen3.5-4B | 76.83% (126/164) | 72.56% (119/164) | — Baseline — |
+| Claude-Distilled-v2 | 73.17% (120/164) | 69.51% (114/164) | 📉 −3.05 pp |
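The percentages in the table above can be re-derived from the raw pass counts. The short check below is only a sketch (the dictionary layout and the `pass_rate` helper are illustrative, not part of the author's evaluation code); it also shows that the last column matches the difference in Plus Pass rate against the Qwen3.5-4B baseline.

```python
TOTAL_TASKS = 164

# (base passes, plus passes) copied from the table above.
results = {
    "Qwopus3.5-4B-v3": (127, 124),
    "Qwen3.5-4B": (126, 119),
    "Claude-Distilled-v2": (120, 114),
}

def pass_rate(passed: int, total: int = TOTAL_TASKS) -> float:
    """Pass rate as a percentage rounded to two decimals."""
    return round(100 * passed / total, 2)

baseline_plus = pass_rate(results["Qwen3.5-4B"][1])
for model, (base, plus) in results.items():
    delta = round(pass_rate(plus) - baseline_plus, 2)
    print(f"{model}: base={pass_rate(base)}% plus={pass_rate(plus)}% "
          f"delta={delta:+.2f} pp")
```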
+---
+## 🗺️ Training Pipeline Overview
+```text
+Base Model (Qwen3.5-4B)
+ │
+ ▼
+Qwen3.5-4B fine-tuned with Unsloth
+ │
+ ▼
+Supervised Fine-Tuning (SFT) + LoRA
+(Response-Only Training masked on "<|im_start|>assistant\n<think>")
+ │
+ ▼
+Qwopus3.5-4B-v3
+```
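One way to picture the response-only step in the pipeline above: tokens up to and including the assistant marker are excluded from the loss, and only the response tokens are trained on. The sketch below uses plain Python lists with made-up token ids, and assumes the common convention that a label of `-100` is ignored by the cross-entropy loss. The function name `mask_before_response` is hypothetical, this is not the author's training code, and it handles only a single assistant turn.

```python
IGNORE_INDEX = -100  # conventional "ignore" label for cross-entropy loss

def mask_before_response(input_ids, marker_ids):
    """Return labels where everything up to and including the last
    occurrence of marker_ids is replaced by IGNORE_INDEX."""
    n, m = len(input_ids), len(marker_ids)
    start = -1
    for i in range(n - m, -1, -1):  # search from the end for the marker
        if input_ids[i:i + m] == marker_ids:
            start = i + m
            break
    if start == -1:
        return [IGNORE_INDEX] * n  # no marker found: nothing is trained on
    return [IGNORE_INDEX] * start + list(input_ids[start:])

# Toy ids: 1, 2 are prompt tokens; 90, 91 stand in for the assistant marker.
ids = [1, 2, 90, 91, 7, 8, 9]
print(mask_before_response(ids, [90, 91]))
```

In a real setup the marker ids would come from tokenizing `"<|im_start|>assistant\n<think>"` with the model's tokenizer, and multi-turn conversations would need every non-assistant span masked, not just the prefix.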
+### 🧠 Example of Learned Reasoning Scaffold
+The model includes targeted optimizations addressing Qwen3.5's tendency toward excessive or repetitive reasoning on simple queries. By distilling the structured reasoning habits of top-tier models like Claude Opus, Qwopus3.5-4B-v3 adopts a highly organized, step-by-step cognitive layout.
+```text
+Example: The user is asking about [Topic A] and how it differs from [Topic B]. This is a [Task type] question. Let me break this down:
+1. What is [Topic A]?
+   - [Fact/Mechanism 1]
+   - [Fact/Mechanism 2]
+2. What is [Topic B]?
+   - [Fact/Mechanism 1]
+3. Key differences:
+   - [Comparison Point 1]
+   - [Comparison Point 2]
+Let me make sure to be accurate: [...]
+Actually, I should double-check: is [Fact] used before [Fact]? Yes, typically...
+Let me provide a clear, well-structured answer:
+```
+### 📚 Training Data
+The model was fine-tuned on a **high-fidelity reasoning dataset**, which was meticulously curated from a blend of premium open-source sources on Hugging Face. This dataset is the result of a rigorous **mixing and cleaning process**, specifically designed to filter out low-quality responses and ensure consistently strong logical performance across diverse analytical domains.
+> *(Rest assured, the entire process is strictly by-the-book and 100% compliant with all terms and open-source licenses!)*
+## ⚠️ Limitations & Intended Use
+- **Hallucination Risk:** While reasoning is strong, the model remains an autoregressive LLM; facts it states during the thinking sequence, especially about real-world events, may occasionally be hallucinated and should be verified externally.
+- **Intended Scenario:** Best suited for offline analytical tasks, coding, math, and heavy logic-dependent prompting where the user needs to transparently follow the AI's internal logic.
+- This model is a test version intended solely for learning, demonstration, academic research, and technical exploration.
+- **Developer Disclaimer:** This is an independent, personal project. Since the developer lacks the specialized technical resources and infrastructure of a large-scale industrial lab, the model's reasoning chain (CoT) may occasionally exhibit instability, logic loops, or reasoning drift. Users are advised to use this model with these experimental limitations in mind.
+> **Note:** The test results presented here differ from the scores on the 4B-v2 model card because the context length was increased for this evaluation. Consequently, the number of tasks affected by context window truncation has changed for each model, leading to different final scores. Please ensure comparisons are made under the same variable settings.
+All post-evaluation standard result files will be uploaded to this repository for transparency and reproducibility. These include:
+- `Jackrong_Qwen3.5-4B-Claude-4.6-Opus-Reasoning-Distilled-v2_humaneval_all_evalonly_eval_results`
+- `Jackrong_Qwopus3.5-4B-v3-test1_humaneval_all_evalonly_eval_results`
+- `qwen_Qwen3.5-4B_humaneval_all_evalonly_eval_results`
+⚠️ **Note on evaluation artifacts.**
+The released result files are based on **raw model generations**, which may contain formatting issues (e.g., Markdown wrappers, answer/code mixing), truncation, or minor token-level corruption. As an independent project operating under limited resources, the evaluation scope here is intentionally focused rather than exhaustive — a comprehensive, multi-domain assessment comparable to large institutional releases was not feasible. Capabilities beyond those benchmarked remain unverified, and users are encouraged to evaluate suitability against their own task requirements before adoption.
+## 🙏 Acknowledgements
+Significant thanks to the [Unsloth AI](https://unsloth.ai/) team for making rapid fine-tuning of LLMs accessible. I also thank the Qwen team and the open-source community developers producing exceptional distilled datasets.
+This qwen3_5 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
+[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+## References
+[^1]: Shinn, N., Cassano, F., Berman, E., Gopinath, A., Narasimhan, K., & Yao, S. (2023).
+*Reflexion: Language Agents with Verbal Reinforcement Learning*.
+arXiv:2303.11366.
+[^2]: Bensal, S., Jamil, U., Bryant, C., Russak, M., Kamble, K., Mozolevskyi, D., Ali, M., & AlShikh, W. (2025).
+*Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning*.
+arXiv:2505.24726. https://arxiv.org/abs/2505.24726
+[^3]: Anthropic (2025). *Reasoning Models Don't Always Say What They Think*.
+https://www.anthropic.com/research/reasoning-models-dont-say-think
+[^4]: Lyu et al. (2023). *Faithful Chain-of-Thought Reasoning*. IJCNLP-AACL 2023.
+https://aclanthology.org/2023.ijcnlp-main.20/
+## 📖 Citation
+If you use this model in your research or projects, please cite:
+```bibtex
+@misc{jackrong_qwen35_4b_v3,
+  title        = {Jackrong/Qwopus3.5-4B-v3},
+  author       = {Jackrong},
+  year         = {2026},
+  publisher    = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/Jackrong/Qwopus3.5-4B-v3}}
+}
+```

chat_template.jinja ADDED

	@@ -0,0 +1,88 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
+        {%- set ns.multi_step_tool = false %}
+        {%- set ns.last_query_index = index %}
+    {%- endif %}
+{%- endfor %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {%- if loop.last or (not loop.last and reasoning_content) %}
+                {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
+            {%- else %}
+                {{- '<|im_start|>' + message.role + '\n' + content }}
+            {%- endif %}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant
+<think>
+' }}
+{%- endif %}
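For assistant messages without a separate `reasoning_content` field, the template above extracts the reasoning by splitting `content` on the `</think>` tag. The Python function below mirrors those two `split` expressions so the behaviour can be checked outside Jinja. It is an illustrative re-implementation with a hypothetical name (`split_reasoning`), not code shipped with the model.

```python
def split_reasoning(content: str) -> tuple[str, str]:
    """Mirror the template's handling of an assistant message that embeds
    its reasoning as <think>...</think> inside `content`."""
    if "</think>" not in content:
        return "", content
    reasoning = (
        content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    )
    answer = content.split("</think>")[-1].lstrip("\n")
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>\n2 + 2 = 4\n</think>\n\nThe answer is 4."
)
print(repr(reasoning))
print(repr(answer))
```

Note that when `add_generation_prompt` is set, the template itself appends `<|im_start|>assistant\n<think>\n`, so generated text begins inside the reasoning block and the opening `<think>` tag will not appear in the model's raw output.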

config.json ADDED

	@@ -0,0 +1,114 @@

+{
+    "architectures": [
+        "Qwen3_5ForConditionalGeneration"
+    ],
+    "torch_dtype": "bfloat16",
+    "eos_token_id": 248046,
+    "image_token_id": 248056,
+    "model_name": "unsloth/Qwen3.5-4B",
+    "model_type": "qwen3_5",
+    "pad_token_id": 248055,
+    "text_config": {
+        "attention_bias": false,
+        "attention_dropout": 0.0,
+        "attn_output_gate": true,
+        "bos_token_id": null,
+        "torch_dtype": "bfloat16",
+        "eos_token_id": 248044,
+        "full_attention_interval": 4,
+        "head_dim": 256,
+        "hidden_act": "silu",
+        "hidden_size": 2560,
+        "initializer_range": 0.02,
+        "intermediate_size": 9216,
+        "layer_types": [
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention",
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention",
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention",
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention",
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention",
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention",
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention",
+            "linear_attention",
+            "linear_attention",
+            "linear_attention",
+            "full_attention"
+        ],
+        "linear_conv_kernel_dim": 4,
+        "linear_key_head_dim": 128,
+        "linear_num_key_heads": 16,
+        "linear_num_value_heads": 32,
+        "linear_value_head_dim": 128,
+        "mamba_ssm_dtype": "float32",
+        "max_position_embeddings": 262144,
+        "mlp_only_layers": [],
+        "model_type": "qwen3_5_text",
+        "mtp_num_hidden_layers": 1,
+        "mtp_use_dedicated_embeddings": false,
+        "num_attention_heads": 16,
+        "num_hidden_layers": 32,
+        "num_key_value_heads": 4,
+        "pad_token_id": null,
+        "partial_rotary_factor": 0.25,
+        "rms_norm_eps": 1e-06,
+        "rope_parameters": {
+            "mrope_interleaved": true,
+            "mrope_section": [
+                11,
+                11,
+                10
+            ],
+            "partial_rotary_factor": 0.25,
+            "rope_theta": 10000000,
+            "rope_type": "default"
+        },
+        "tie_word_embeddings": true,
+        "use_cache": true,
+        "vocab_size": 248320
+    },
+    "tie_word_embeddings": true,
+    "unsloth_fixed": true,
+    "unsloth_version": "2026.3.18",
+    "use_cache": false,
+    "video_token_id": 248057,
+    "vision_config": {
+        "deepstack_visual_indexes": [],
+        "depth": 24,
+        "torch_dtype": "bfloat16",
+        "hidden_act": "gelu_pytorch_tanh",
+        "hidden_size": 1024,
+        "in_channels": 3,
+        "initializer_range": 0.02,
+        "intermediate_size": 4096,
+        "model_type": "qwen3_5",
+        "num_heads": 16,
+        "num_position_embeddings": 2304,
+        "out_hidden_size": 2560,
+        "patch_size": 16,
+        "spatial_merge_size": 2,
+        "temporal_patch_size": 2
+    },
+    "vision_end_token_id": 248054,
+    "vision_start_token_id": 248053
+}
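Two relationships among the values in this config can be checked with a few lines of Python. The sketch below rebuilds the `layer_types` list from `num_hidden_layers` and `full_attention_interval`, and compares the rotary dimension implied by `head_dim` and `partial_rotary_factor` with the sum of `mrope_section`. These are observations about the numbers in this file, not a description of how the modeling code consumes them.

```python
num_hidden_layers = 32
full_attention_interval = 4

# Every `full_attention_interval`-th layer (counting from 1) is full attention.
layer_types = [
    "full_attention" if (i + 1) % full_attention_interval == 0 else "linear_attention"
    for i in range(num_hidden_layers)
]
print(layer_types[:4])
print(layer_types.count("full_attention"), layer_types.count("linear_attention"))

head_dim = 256
partial_rotary_factor = 0.25
mrope_section = [11, 11, 10]

rotary_dim = int(head_dim * partial_rotary_factor)
print(rotary_dim, sum(mrope_section))
```

The generated list matches the 32 entries spelled out in the config (three `linear_attention` layers followed by one `full_attention` layer, repeated eight times), and the `mrope_section` values sum to half of the rotary dimension.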

model.safetensors-00001-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a67a7ad7c379f37fd2a676bf189acc2294d4ee53ebc9c76e65f67ce06d227b06
+size 5329398688

model.safetensors-00002-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2e70507c698e7319abe1b268b1a936e59f77ca9ffc6288ff064167c54841525b
+size 3990429408
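The two hunks above are Git LFS pointer files: three `key value` lines naming the spec version, the SHA-256 object id, and the byte size of the real file. A minimal parsing sketch follows (the `parse_lfs_pointer` helper is hypothetical). It also sums the two shard sizes and compares them with the `total_size` recorded in `model.safetensors.index.json`.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

shard_1 = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:a67a7ad7c379f37fd2a676bf189acc2294d4ee53ebc9c76e65f67ce06d227b06
size 5329398688""")
shard_2 = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:2e70507c698e7319abe1b268b1a936e59f77ca9ffc6288ff064167c54841525b
size 3990429408""")

total_on_disk = shard_1["size"] + shard_2["size"]
index_total_size = 9319737856  # "total_size" from model.safetensors.index.json

print(total_on_disk)
print(total_on_disk - index_total_size)
```

The on-disk total is slightly larger than `total_size`; a plausible explanation, stated here as an assumption, is that `total_size` counts only tensor bytes while each shard file also carries its own header and metadata.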

model.safetensors.index.json ADDED

	@@ -0,0 +1,745 @@

+{
+  "metadata": {
+    "total_size": 9319737856
+  },
+  "weight_map": {
+    "model.language_model.embed_tokens.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.1.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.1.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.1.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.10.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.10.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.10.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.11.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.11.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.11.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.18.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.18.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.18.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.19.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.19.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.19.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.12.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.12.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.12.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.13.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.13.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.13.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.7.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.8.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.8.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.8.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.21.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.21.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.21.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.22.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.22.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.22.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.23.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.23.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.23.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.31.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.31.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.31.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.4.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.4.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.4.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.14.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.14.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.14.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.15.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.15.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.15.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.9.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.9.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.9.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "mtp.layers.0.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "mtp.layers.0.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "mtp.layers.0.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.29.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.3.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.3.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.3.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.30.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.30.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.30.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.16.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.16.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.16.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.17.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.17.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.17.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.24.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.24.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.24.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.5.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.5.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.5.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.6.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.6.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.6.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.7.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.7.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.2.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.2.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.2.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.20.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.20.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.20.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.25.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.25.mlp.gate_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.25.mlp.up_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.26.mlp.down_proj.weight": "model.safetensors-00001-of-00002.safetensors",
+    "model.language_model.layers.26.mlp.gate_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.26.mlp.up_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.27.mlp.down_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.27.mlp.gate_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.27.mlp.up_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.mlp.down_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.mlp.gate_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.mlp.up_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.29.mlp.gate_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.29.mlp.up_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.mlp.down_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.mlp.gate_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.mlp.up_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.10.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.11.self_attn.q_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.18.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.19.self_attn.q_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.2.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.12.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.13.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.7.self_attn.q_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.8.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.22.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.23.self_attn.q_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.31.self_attn.q_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.4.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.14.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.15.self_attn.q_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.9.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "mtp.layers.0.self_attn.q_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.3.self_attn.q_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.30.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.16.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.17.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.24.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.25.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.5.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.6.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.20.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.21.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.26.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.27.self_attn.q_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.29.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.1.linear_attn.in_proj_qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.merger.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "mtp.fc.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.10.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.10.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.11.self_attn.o_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.18.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.18.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.19.self_attn.o_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.2.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.12.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.12.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.13.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.13.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.7.self_attn.o_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.8.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.8.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.21.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.22.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.22.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.23.self_attn.o_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.31.self_attn.o_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.4.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.4.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.14.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.14.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.15.self_attn.o_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.9.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.9.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "mtp.layers.0.self_attn.o_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.3.self_attn.o_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.30.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.30.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.16.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.16.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.17.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.17.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.24.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.24.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.25.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.25.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.5.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.5.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.6.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.6.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.2.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.20.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.20.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.21.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.26.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.26.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.27.self_attn.o_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.29.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.29.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.1.linear_attn.in_proj_z.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.1.linear_attn.out_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.merger.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.0.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.0.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.1.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.1.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.10.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.10.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.11.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.11.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.12.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.12.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.13.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.13.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.14.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.14.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.15.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.15.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.16.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.16.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.17.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.17.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.18.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.18.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.19.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.19.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.2.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.2.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.20.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.20.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.21.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.21.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.22.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.22.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.23.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.23.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.3.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.3.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.4.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.4.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.5.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.5.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.6.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.6.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.7.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.7.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.8.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.8.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.9.mlp.linear_fc1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.9.mlp.linear_fc2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.0.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.1.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.10.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.11.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.12.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.13.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.14.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.15.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.16.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.17.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.18.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.19.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.2.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.20.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.21.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.22.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.23.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.3.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.4.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.5.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.6.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.7.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.8.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.9.attn.qkv.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.11.self_attn.k_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.11.self_attn.v_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.19.self_attn.k_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.19.self_attn.v_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.7.self_attn.k_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.7.self_attn.v_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.23.self_attn.k_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.23.self_attn.v_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.31.self_attn.k_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.31.self_attn.v_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.15.self_attn.k_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.15.self_attn.v_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "mtp.layers.0.self_attn.k_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "mtp.layers.0.self_attn.v_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.3.self_attn.k_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.3.self_attn.v_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.27.self_attn.k_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.27.self_attn.v_proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.pos_embed.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.patch_embed.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.0.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.1.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.10.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.11.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.12.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.13.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.14.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.15.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.16.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.17.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.18.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.19.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.2.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.20.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.21.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.22.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.23.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.3.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.4.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.5.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.6.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.7.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.8.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.9.attn.proj.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.10.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.10.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.18.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.18.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.2.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.2.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.12.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.12.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.13.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.13.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.8.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.8.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.22.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.22.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.4.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.4.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.14.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.14.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.9.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.9.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.30.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.30.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.16.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.16.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.17.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.17.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.24.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.24.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.25.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.25.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.5.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.5.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.6.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.6.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.20.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.20.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.21.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.21.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.26.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.26.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.29.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.29.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.1.linear_attn.in_proj_b.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.1.linear_attn.in_proj_a.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.10.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.2.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.12.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.13.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.8.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.9.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.22.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.4.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.5.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.14.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.16.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.30.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.17.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.18.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.24.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.25.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.6.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.20.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.21.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.26.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.29.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.1.linear_attn.conv1d.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.0.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.1.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.10.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.11.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.12.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.13.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.14.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.15.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.16.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.17.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.18.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.19.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.2.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.20.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.21.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.22.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.23.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.3.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.4.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.5.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.6.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.7.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.8.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.9.mlp.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.merger.linear_fc1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.0.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.1.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.10.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.11.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.12.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.13.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.14.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.15.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.16.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.17.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.18.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.19.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.2.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.20.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.21.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.22.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.23.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.3.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.4.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.5.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.6.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.7.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.8.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.9.attn.qkv.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.10.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.10.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.11.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.11.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.18.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.19.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.19.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.2.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.12.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.12.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.13.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.13.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.7.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.8.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.8.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.9.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.21.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.22.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.22.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.23.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.23.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.31.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.4.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.4.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.14.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.14.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.15.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.15.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.16.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.9.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "mtp.layers.0.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "mtp.layers.0.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "mtp.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "mtp.pre_fc_norm_embedding.weight": "model.safetensors-00002-of-00002.safetensors",
+    "mtp.pre_fc_norm_hidden.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.3.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.3.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.30.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.30.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.31.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.16.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.17.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.17.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.18.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.24.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.24.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.25.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.25.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.5.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.5.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.6.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.6.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.7.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.2.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.20.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.20.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.21.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.26.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.26.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.27.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.27.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.29.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.29.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.1.input_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.1.post_attention_layernorm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.merger.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.0.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.0.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.0.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.0.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.0.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.0.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.1.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.1.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.1.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.1.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.1.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.1.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.10.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.10.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.10.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.10.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.10.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.10.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.11.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.11.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.11.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.11.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.11.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.11.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.12.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.12.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.12.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.12.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.12.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.12.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.13.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.13.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.13.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.13.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.13.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.13.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.14.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.14.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.14.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.14.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.14.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.14.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.15.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.15.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.15.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.15.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.15.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.15.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.16.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.16.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.16.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.16.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.16.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.16.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.17.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.17.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.17.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.17.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.17.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.17.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.18.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.18.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.18.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.18.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.18.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.18.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.19.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.19.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.19.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.19.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.19.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.19.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.2.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.2.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.2.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.2.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.2.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.2.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.20.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.20.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.20.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.20.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.20.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.20.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.21.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.21.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.21.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.21.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.21.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.21.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.22.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.22.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.22.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.22.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.22.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.22.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.23.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.23.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.23.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.23.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.23.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.23.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.3.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.3.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.3.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.3.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.3.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.3.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.4.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.4.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.4.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.4.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.4.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.4.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.5.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.5.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.5.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.5.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.5.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.5.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.6.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.6.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.6.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.6.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.6.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.6.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.7.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.7.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.7.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.7.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.7.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.7.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.8.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.8.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.8.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.8.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.8.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.8.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.9.attn.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.9.mlp.linear_fc2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.9.norm1.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.9.norm1.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.9.norm2.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.blocks.9.norm2.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.merger.norm.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.merger.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.visual.patch_embed.proj.bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.10.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.11.self_attn.k_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.11.self_attn.q_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.18.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.19.self_attn.k_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.19.self_attn.q_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.2.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.12.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.13.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.7.self_attn.k_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.7.self_attn.q_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.8.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.22.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.23.self_attn.k_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.31.self_attn.k_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.31.self_attn.q_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.4.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.14.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.15.self_attn.k_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.15.self_attn.q_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.9.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "mtp.layers.0.self_attn.k_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "mtp.layers.0.self_attn.q_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.3.self_attn.k_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.3.self_attn.q_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.30.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.16.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.17.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.23.self_attn.q_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.24.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.25.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.5.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.6.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.20.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.21.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.26.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.27.self_attn.k_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.27.self_attn.q_norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.29.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.1.linear_attn.norm.weight": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.10.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.12.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.2.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.13.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.14.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.8.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.9.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.22.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.4.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.5.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.16.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.30.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.17.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.18.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.24.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.25.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.6.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.20.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.21.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.26.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.29.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.1.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.10.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.2.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.12.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.13.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.8.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.9.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.22.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.4.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.5.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.14.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.16.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.30.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.17.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.18.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.24.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.25.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.6.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.20.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.21.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.26.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.28.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.29.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.0.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
+    "model.language_model.layers.1.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors"
+  }
+}
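
The index above maps each tensor name to the shard file that stores it. As a minimal sketch, assuming the mapping has been loaded into a plain dict (in the usual sharded-checkpoint layout it sits under a `weight_map` key), a loader could invert it to see which tensors to read from each shard. The `group_by_shard` helper and the two-entry sample are illustrative and not part of this repository.

```python
from collections import defaultdict


def group_by_shard(weight_map):
    """Invert a tensor-name -> shard-file map into shard-file -> sorted tensor names."""
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


# Two entries copied from the index above; a real loader would json.load the full file.
sample_weight_map = {
    "model.language_model.layers.0.linear_attn.A_log": "model.safetensors-00002-of-00002.safetensors",
    "model.language_model.layers.1.linear_attn.dt_bias": "model.safetensors-00002-of-00002.safetensors",
}

grouped = group_by_shard(sample_weight_map)
for shard, names in grouped.items():
    print(shard, len(names))
```

Grouping by shard first lets each shard file be opened once rather than once per tensor.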

processor_config.json ADDED

	@@ -0,0 +1,63 @@

+{
+  "image_processor": {
+    "data_format": "channels_first",
+    "do_convert_rgb": true,
+    "do_normalize": true,
+    "do_rescale": true,
+    "do_resize": true,
+    "image_mean": [
+      0.5,
+      0.5,
+      0.5
+    ],
+    "image_processor_type": "Qwen2VLImageProcessorFast",
+    "image_std": [
+      0.5,
+      0.5,
+      0.5
+    ],
+    "merge_size": 2,
+    "patch_size": 16,
+    "resample": 3,
+    "rescale_factor": 0.00392156862745098,
+    "size": {
+      "longest_edge": 16777216,
+      "shortest_edge": 65536
+    },
+    "temporal_patch_size": 2
+  },
+  "processor_class": "Qwen3VLProcessor",
+  "video_processor": {
+    "data_format": "channels_first",
+    "default_to_square": true,
+    "do_convert_rgb": true,
+    "do_normalize": true,
+    "do_rescale": true,
+    "do_resize": true,
+    "do_sample_frames": true,
+    "fps": 2,
+    "image_mean": [
+      0.5,
+      0.5,
+      0.5
+    ],
+    "image_std": [
+      0.5,
+      0.5,
+      0.5
+    ],
+    "max_frames": 768,
+    "merge_size": 2,
+    "min_frames": 4,
+    "patch_size": 16,
+    "resample": 3,
+    "rescale_factor": 0.00392156862745098,
+    "return_metadata": false,
+    "size": {
+      "longest_edge": 25165824,
+      "shortest_edge": 4096
+    },
+    "temporal_patch_size": 2,
+    "video_processor_type": "Qwen3VLVideoProcessor"
+  }
+}
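
The numeric fields in this config can be checked with a little arithmetic. The sketch below assumes the usual order of operations (rescale, then normalize) and assumes that `merge_size` merges a `merge_size` x `merge_size` block of patches, as in the Qwen2-VL family. The `normalize_pixel` helper is hypothetical and only illustrates the arithmetic.

```python
RESCALE_FACTOR = 0.00392156862745098  # "rescale_factor" from the config, i.e. 1/255
IMAGE_MEAN = 0.5  # same value for all three channels
IMAGE_STD = 0.5   # same value for all three channels
PATCH_SIZE = 16
MERGE_SIZE = 2


def normalize_pixel(value):
    """Apply rescaling and then mean/std normalization to one 8-bit channel value."""
    return (value * RESCALE_FACTOR - IMAGE_MEAN) / IMAGE_STD


lowest = normalize_pixel(0)     # darkest 8-bit value
highest = normalize_pixel(255)  # brightest 8-bit value

# Side length, in pixels, covered by one merged block of patches (assumption above).
pixels_per_merged_side = PATCH_SIZE * MERGE_SIZE

print(lowest, highest, pixels_per_merged_side)
```

With a mean and standard deviation of 0.5, 8-bit inputs land in approximately the range [-1, 1].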

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
+size 19989343

tokenizer_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "add_prefix_space": false,
+  "audio_bos_token": "<|audio_start|>",
+  "audio_eos_token": "<|audio_end|>",
+  "audio_token": "<|audio_pad|>",
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "image_token": "<|image_pad|>",
+  "is_local": false,
+  "model_max_length": 262144,
+  "model_specific_special_tokens": {
+    "audio_bos_token": "<|audio_start|>",
+    "audio_eos_token": "<|audio_end|>",
+    "audio_token": "<|audio_pad|>",
+    "image_token": "<|image_pad|>",
+    "video_token": "<|video_pad|>",
+    "vision_bos_token": "<|vision_start|>",
+    "vision_eos_token": "<|vision_end|>"
+  },
+  "pad_token": "<|vision_pad|>",
+  "padding_side": "right",
+  "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
+  "processor_class": "Qwen3VLProcessor",
+  "split_special_tokens": false,
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": null,
+  "video_token": "<|video_pad|>",
+  "vision_bos_token": "<|vision_start|>",
+  "vision_eos_token": "<|vision_end|>",
+  "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0].role == 'system' %}\n        {{- messages[0].content + '\\n\\n' }}\n    {%- endif %}\n    {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n    {%- set index = (messages|length - 1) - loop.index0 %}\n    {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n        {%- set ns.multi_step_tool = false %}\n        {%- set ns.last_query_index = index %}\n    {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n    {%- if message.content is string %}\n        {%- set content = message.content %}\n    {%- else %}\n        {%- set content = '' %}\n    {%- endif %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n        {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {%- set reasoning_content = '' %}\n        {%- if message.reasoning_content is string %}\n            {%- set reasoning_content = message.reasoning_content %}\n        {%- else %}\n            {%- if '</think>' in content %}\n                {%- set reasoning_content = content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n                {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n            {%- endif %}\n        {%- endif %}\n        {%- if loop.index0 > ns.last_query_index %}\n            {%- if loop.last or (not loop.last and reasoning_content) %}\n                {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n            {%- else %}\n                {{- '<|im_start|>' + message.role + '\\n' + content }}\n            {%- endif %}\n        {%- else %}\n            {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- endif %}\n        {%- if message.tool_calls %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if (loop.first and content) or (not loop.first) %}\n                    {{- '\\n' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '<tool_call>\\n{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n                    {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\n<think>\n' }}\n{%- endif %}"
+}
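
When an assistant message has no separate `reasoning_content` field, the chat template in tokenizer_config.json recovers the reasoning by splitting the message text on `</think>`. The sketch below mirrors that string handling in plain Python to show what the split produces. The `split_reasoning` helper is hypothetical and covers only this one branch of the template.

```python
def split_reasoning(content):
    """Mirror the template's handling of an assistant message that contains </think>."""
    reasoning = ""
    if "</think>" in content:
        # Text before the first </think>, minus the opening <think> tag and edge newlines.
        reasoning = content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
        # Text after the last </think> is the visible answer.
        content = content.split("</think>")[-1].lstrip("\n")
    return reasoning, content


reasoning, answer = split_reasoning("<think>\nadd the numbers\n</think>\n\n4")
print(repr(reasoning), repr(answer))
```

A message without `</think>` passes through unchanged with empty reasoning, which matches the template's default of `reasoning_content = ''`.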