Upload folder using huggingface_hub

#1
.gitattributes CHANGED
@@ -1,35 +1,15 @@
1
- *.7z filter=lfs diff=lfs merge=lfs -text
2
- *.arrow filter=lfs diff=lfs merge=lfs -text
 
 
3
  *.bin filter=lfs diff=lfs merge=lfs -text
4
- *.bz2 filter=lfs diff=lfs merge=lfs -text
5
- *.ckpt filter=lfs diff=lfs merge=lfs -text
6
- *.ftz filter=lfs diff=lfs merge=lfs -text
7
- *.gz filter=lfs diff=lfs merge=lfs -text
8
- *.h5 filter=lfs diff=lfs merge=lfs -text
9
- *.joblib filter=lfs diff=lfs merge=lfs -text
10
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
  *.model filter=lfs diff=lfs merge=lfs -text
13
- *.msgpack filter=lfs diff=lfs merge=lfs -text
14
- *.npy filter=lfs diff=lfs merge=lfs -text
15
- *.npz filter=lfs diff=lfs merge=lfs -text
16
- *.onnx filter=lfs diff=lfs merge=lfs -text
17
- *.ot filter=lfs diff=lfs merge=lfs -text
18
- *.parquet filter=lfs diff=lfs merge=lfs -text
19
- *.pb filter=lfs diff=lfs merge=lfs -text
20
- *.pickle filter=lfs diff=lfs merge=lfs -text
21
- *.pkl filter=lfs diff=lfs merge=lfs -text
22
- *.pt filter=lfs diff=lfs merge=lfs -text
23
- *.pth filter=lfs diff=lfs merge=lfs -text
24
- *.rar filter=lfs diff=lfs merge=lfs -text
25
- *.safetensors filter=lfs diff=lfs merge=lfs -text
26
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
- *.tar.* filter=lfs diff=lfs merge=lfs -text
28
- *.tar filter=lfs diff=lfs merge=lfs -text
29
- *.tflite filter=lfs diff=lfs merge=lfs -text
30
- *.tgz filter=lfs diff=lfs merge=lfs -text
31
- *.wasm filter=lfs diff=lfs merge=lfs -text
32
- *.xz filter=lfs diff=lfs merge=lfs -text
33
- *.zip filter=lfs diff=lfs merge=lfs -text
34
- *.zst filter=lfs diff=lfs merge=lfs -text
35
- *tfevents* filter=lfs diff=lfs merge=lfs -text
 
1
+ model-00001-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
2
+ model-00002-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
3
+ model-00003-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
4
+ model-00004-of-00004.safetensors filter=lfs diff=lfs merge=lfs -text
5
  *.bin filter=lfs diff=lfs merge=lfs -text
 
 
 
 
 
 
 
 
6
  *.model filter=lfs diff=lfs merge=lfs -text
7
+ generation_config.json filter=lfs diff=lfs merge=lfs -text
8
+ vocab.json filter=lfs diff=lfs merge=lfs -text
9
+ merges.txt filter=lfs diff=lfs merge=lfs -text
10
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
11
+ added_tokens.json filter=lfs diff=lfs merge=lfs -text
12
+ config.json filter=lfs diff=lfs merge=lfs -text
13
+ model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
14
+ special_tokens_map.json filter=lfs diff=lfs merge=lfs -text
15
+ tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
 
 
 
 
 
 
 
 
 
 
 
 
 
 
README.md CHANGED
@@ -1,3 +1,129 @@
1
- ---
2
- license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # ⚛️ Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward
2
+
3
+ <p align="center">
4
+ <a href="https://arxiv.org/abs/2508.12800" target="_blank">
5
+ <img src="https://img.shields.io/badge/arXiv-2508.12800-b31b1b.svg?style=for-the-badge" alt="ArXiv">
6
+ </a>
7
+ <a href="https://huggingface.co/collections/ant-group/atom-searcher" target="_blank">
8
+ <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow?style=for-the-badge" alt="Hugging Face">
9
+ </a>
10
+ <a href="https://github.com/antgroup/Research-Venus" target="_blank">
11
+ <img src="https://img.shields.io/badge/GitHub-Repo-blue?style=for-the-badge&logo=github" alt="GitHub">
12
+ </a>
13
+ </p>
14
+
15
+ [简体中文](README_CN.md)
16
+
17
+
18
+
19
+ ## 📖 Introduction
20
+
21
+ Atom-Searcher is a novel framework designed to enhance the deep research capabilities of Large Language Models (LLMs). While LLMs show great promise, their static internal knowledge limits their ability to handle complex, multi-step tasks. Existing methods like Retrieval-Augmented Generation (RAG) and outcome-based reinforcement learning (RL) often fall short due to rigid workflows, reward sparsity, and conflicting gradients during training.
22
+
23
+ To overcome these challenges, we introduce **Atom-Searcher**, a new reinforcement learning framework built on the concept of **Atomic Thought**. This paradigm decomposes complex reasoning into fine-grained, functional units. Each "atomic thought" is evaluated by a Reasoning Reward Model (RRM), providing a fine-grained **Atomic Thought Reward (ATR)** that guides the agent's learning process.
24
+
25
+ The framework uses a curriculum-inspired reward schedule that initially prioritizes high-quality reasoning processes before shifting focus to final outcomes, which accelerates the discovery of effective problem-solving strategies.
26
+
27
+ Key advantages of Atom-Searcher include:
28
+ * **State-of-the-Art Performance**: Achieves consistent improvements over existing models on seven different benchmarks.
29
+ * **Enhanced Interpretability**: Exhibits more human-like and understandable reasoning patterns by breaking down its thought process.
30
+ * **Efficient Training**: Mitigates issues of reward sparsity and gradient conflicts, leading to more efficient policy optimization.
31
+ * **Scalable Computation**: Effectively scales its computational efforts during test-time to tackle more complex queries.
32
+
33
+ <p align="center">
34
+ <img src="png/sota_results.png" alt="Atom-Searcher SOTA Performance"/>
35
+ </p>
36
+ -----
37
+
38
+ # Overview
39
+
40
+ * [Key Highlights](#-key-highlights)
41
+ * [Evaluation](#evaluation)
42
+ * [Citation](#citation)
43
+
44
+ -----
45
+
46
+ # ✨ Key Highlights
47
+
48
+ We introduce **Atom-Searcher**, an agentic deep research framework that significantly improves LLM problem-solving by refining the reasoning process itself, not just the final outcome.
49
+
50
+ -----
51
+
52
+ ### 💡 Introducing the "Atomic Thought" Paradigm
53
+
54
+ We propose **Atomic Thought**, a novel thinking paradigm that decomposes complex reasoning into fine-grained, interpretable functional units. Instead of a single monolithic block of thought, the agent generates a sequence of atomic thoughts like `<OBSERVATION>`, `<HYPOTHESIS_TESTING>`, and `<RISK_ANALYSIS>`. This structured approach leads to:
55
+
56
+ - ✅ More human-like, interpretable, and in-depth reasoning patterns
57
+ - ✅ Scales computation at test-time
58
+ - ✅ Provides supervision anchors for RRMs, bridging deep research tasks and RRMs.
59
+
60
+
61
+ -----
62
+
63
+ ### 🎯 Process-Supervised Reinforcement Learning with Fine-Grained Rewards
64
+
65
+ Current agents rely on outcome-based reinforcement learning (RL), which suffers from **reward sparsity** and **gradient conflicts**—penalizing an entire reasoning chain for one wrong final answer. Atom-Searcher addresses this with:
66
+
67
+ - 🔹 **Reasoning Reward Models (RRMs):** An RRM scores each individual Atomic Thought, providing dense, fine-grained process-level rewards called Atomic Thought Rewards (ATR).
68
+ - 🔹 **Curriculum-Inspired Reward Schedule:** The framework dynamically balances the weight of process-level ATR and final outcome rewards. Early in training, it prioritizes good reasoning (ATR), and as the agent improves, it shifts focus to correct answers.
69
+ - 🔹 **Efficient Optimization:** This hybrid reward structure alleviates reward sparsity and guides the agent to discover effective reasoning paths much faster.
70
+
71
+ -----
72
+
73
+ ### 🚀 SOTA Performance and Scalable Reasoning
74
+
75
+ We demonstrate through extensive experiments that Atom-Searcher sets a new state-of-the-art in agentic deep research.
76
+
77
+ - 📈 It achieves significant performance gains over strong baselines like **DeepResearcher** and **R1-Searcher** on seven distinct benchmarks.
78
+ - 🧠 At test time, Atom-Searcher **scales its computation effectively**, generating 3.2x more tokens and making 1.24x more tool calls on average than the SOTA baseline, indicating deeper exploration and reasoning without explicit incentives.
79
+
80
+ 👉 [Hugging Face Model](https://huggingface.co/collections/ant-group/atom-searcher)
81
+
82
+ -----
83
+
84
+
85
+
86
+
87
+ ## Evaluation
88
+
89
+ Atom-Searcher's effectiveness is validated across a diverse set of seven open-domain QA benchmarks.
90
+
91
+ ### Main Results on In-Domain and Out-of-Domain Benchmarks
92
+
93
+ Atom-Searcher consistently outperforms both training-based and prompt-based methods. All scores are F1 scores.
94
+
95
+ | **Type** | **Method** | **NQ** | **TQ** | **HotpotQA** | **2Wiki** | **Musique** | **Bamboogle** | **PopQA** |
96
+ | :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
97
+ | Prompt Based | Search-al-Web | 32.4 | 58.9 | 33.0 | 30.9 | 14.7 | 46.6 | 38.3 |
98
+ | Training Based | Search-R1-Instruct | 33.1 | 44.7 | 45.7 | 43.4 | 26.5 | 45.0 | 43.0 |
99
+ | | R1-Searcher | 35.4 | 73.1 | 44.8 | 59.4 | 22.8 | 64.8 | 42.7 |
100
+ | | DeepResearcher | 39.6 | 78.4 | 52.8 | 59.7 | 27.1 | **71.0** | 48.5 |
101
+ | | **Atom-Searcher (Ours)** | **44.0** | **81.8** | **57.3** | **66.9** | **27.6** | 70.7 | **50.3** |
102
+
103
+ > 🔝 **Experimental results show that Atom-Searcher achieves new state-of-the-art performance on 6 out of 7 benchmarks, with an average improvement of 8.5% on in-domain tasks and 2.5% on out-of-domain tasks over the previous SOTA, DeepResearcher.**
104
+
105
+ ### Ablation Study
106
+
107
+ The ablation study confirms that both **Atomic Thought** and the **Reasoning Reward Model (RRM)** are critical for performance. Adding RRM rewards without the structured Atomic Thoughts provides minimal benefit.
108
+
109
+ | **Method** | **NQ** | **TQ** | **HotpotQA** | **2Wiki** | **Musique** | **Bamboogle** | **PopQA** |
110
+ | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
111
+ | Base (DeepResearcher) | 39.6 | 78.4 | 52.8 | 59.7 | 27.1 | 71.0 | 48.5 |
112
+ | + RRM | 40.1 | 78.2 | 53.5 | 60.0 | 25.7 | 70.5 | 48.8 |
113
+ | **Atom-Searcher (Base + RRM + Atomic Thought)** | **44.0** | **81.8** | **57.3** | **66.9** | **27.6** | **70.7** | **50.3** |
114
+
115
+ # Citation
116
+
117
+ Please consider citing if you find our work useful:
118
+
119
+ ```plain
120
+ @misc{deng2025atomsearcherenhancingagenticdeep,
121
+ title={Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward},
122
+ author={Yong Deng and Guoqing Wang and Zhenzhe Ying and Xiaofeng Wu and Jinzhen Lin and Wenwen Xiong and Yuqin Dai and Shuo Yang and Zhanwei Zhang and Qiwen Wang and Yang Qin and Changhua Meng},
123
+ year={2025},
124
+ eprint={2508.12800},
125
+ archivePrefix={arXiv},
126
+ primaryClass={cs.CL},
127
+ url={https://arxiv.org/abs/2508.12800},
128
+ }
129
+ ```
README_CN.md ADDED
@@ -0,0 +1,124 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+
2
+ # ⚛️ Atom-Searcher: 通过细粒度原子化思想奖励增强智能体的深度研究能力
3
+
4
+ <p align="center">
5
+ <a href="https://arxiv.org/abs/2508.12800" target="_blank">
6
+ <img src="https://img.shields.io/badge/arXiv-2508.12800-b31b1b.svg?style=for-the-badge" alt="ArXiv">
7
+ </a>
8
+ <a href="https://huggingface.co/collections/ant-group/atom-searcher" target="_blank">
9
+ <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow?style=for-the-badge" alt="Hugging Face">
10
+ </a>
11
+ <a href="https://github.com/antgroup/Research-Venus" target="_blank">
12
+ <img src="https://img.shields.io/badge/GitHub-Repo-blue?style=for-the-badge&logo=github" alt="GitHub">
13
+ </a>
14
+ </p>
15
+
16
+
17
+
18
+ ## 📖 引言
19
+
20
+ Atom-Searcher 是一个旨在增强大语言模型(LLMs)深度研究能力的新颖框架。尽管 LLMs 展现出巨大潜力,但其静态的内部知识限制了它们处理复杂、多步骤任务的能力。现有的方法,如检索增强生成(RAG)和基于结果的强化学习(RL),常常因其固化的工作流程、奖励稀疏性以及训练过程中的梯度冲突而表现不佳。
21
+
22
+ 为克服这些挑战,我们引入了 **Atom-Searcher**,这是一个建立在**原子化思想(Atomic Thought)**概念之上的全新强化学习框架。该范式将复杂的推理过程分解为细粒度的功能单元。每一个“原子化思想”都由一个推理奖励模型(Reasoning Reward Model, RRM)进行评估,从而提供细粒度的**原子化思想奖励(Atomic Thought Reward, ATR)**,用以指导智能体的学习过程。
23
+
24
+ 该框架采用了一种受课程学习启发的奖励机制,在初期优先奖励高质量的推理过程,随后再将重点转移到最终结果上,从而加速发现有效的问题解决策略。
25
+
26
+ Atom-Searcher 的主要优势包括:
27
+
28
+ * **最先进的性能**:在七个不同的基准测试中,相较于现有模型均取得了一致的提升。
29
+ * **增强的可解释性**:通过分解其思考过程,展现出更类似人类且易于理解的推理模式。
30
+ * **高效的训练**:缓解了奖励稀疏性和梯度冲突问题,使策略优化更为高效。
31
+ * **可扩展的计算能力**:在测试时能有效扩展其计算投入,以解决更复杂的查询。
32
+
33
+ <p align="center">
34
+ <img src="png/sota_results.png" alt="Atom-Searcher SOTA Performance"/>
35
+ </p>
36
+
37
+ # 概览
38
+
39
+ * [主要亮点](#-主要亮点)
40
+ * [评估](#评估)
41
+ * [引用](#引用)
42
+
43
+ -----
44
+
45
+ # ✨ 主要亮点
46
+
47
+ 我们推出了 **Atom-Searcher**,一个旨在提升智能体深度研究能力的框架。它通过优化推理过程本身,而不仅仅是最终结果,显著提高了 LLM 解决问题的能力。
48
+
49
+ -----
50
+
51
+ ### 💡 引入“原子化思想”范式
52
+
53
+ 我们提出了**原子化思想(Atomic Thought)**,这是一种新颖的思维范式,它将复杂的推理过程分解为细粒度的、可解释的功能单元。智能体不再生成一个单一、庞大的思想块,而是生成一系列原子化的思想,如 `<OBSERVATION>`(观察)、`<HYPOTHESIS_TESTING>`(假设检验)和 `<RISK_ANALYSIS>`(风险分析)。这种结构化的方法带来了:
54
+
55
+ - ✅ 更类似人类、可解释且更深入的推理模式
56
+ - ✅ 在测试时可扩展计算资源
57
+ - ✅ 为推理奖励模型(RRM)提供监督锚点,将深度研究任务与 RRM 联系起来。
58
+
59
+ -----
60
+
61
+ ### 🎯 结合细粒度奖励的过程监督强化学习
62
+
63
+ 当前的智能体依赖于基于结果的强化学习(RL),但这种方法存在**奖励稀疏性**和**梯度冲突**的问题——即因为一个最终答案的错误而惩罚整个推理链。Atom-Searcher 通过以下方式解决此问题:
64
+
65
+ - 🔹 **推理奖励模型(RRM):** RRM 为每一个独立的原子化思想打分,提供密集的、细粒度的过程级奖励,我们称之为原子化思想奖励(ATR)。
66
+ - 🔹 **课程学习式奖励策略:** 该框架动态平衡过程级 ATR 和最终结果奖励的权重。在训练初期,它优先鼓励良好的推理过程(ATR),随着智能体能力的提升,逐渐将重点转移到产出正确答案上。
67
+ - 🔹 **高效优化:** 这种混合奖励结构缓解了奖励稀疏性问题,并引导智能体更快地发现有效的推理路径。
68
+
69
+ -----
70
+
71
+ ### 🚀 SOTA 性能与可扩展的推理能力
72
+
73
+ 我们通过大量实验证明,Atom-Searcher 在智能体深度研究领域树立了新的技术标杆(SOTA)。
74
+
75
+ - 📈 在七个不同的基准测试中,它相较于 **DeepResearcher** 和 **R1-Searcher** 等强大的基线模型取得了显著的性能提升。
76
+ - 🧠 在测试时,Atom-Searcher 能**有效地扩展其计算资源**,与 SOTA 基线模型相比,平均多生成 3.2 倍的 token 并多进行 1.24 倍的工具调用,这表明在没有明确激励的情况下,它也能进行更深度的探索和推理。
77
+
78
+ 👉 [Hugging Face 模型](https://huggingface.co/collections/ant-group/atom-searcher)
79
+
80
+ -----
81
+
82
+ ## 评估
83
+
84
+ Atom-Searcher 的有效性在一系列多样化的开放域问答(QA)基准测试中得到了验证,共涵盖七个数据集。
85
+
86
+ ### 在域内和域外基准测试上的主要结果
87
+
88
+ Atom-Searcher 的表现始终优于基于训练和基于提示的方法。所有分数均为 F1 分数。
89
+
90
+ | **类型** | **方法** | **NQ** | **TQ** | **HotpotQA** | **2Wiki** | **Musique** | **Bamboogle** | **PopQA** |
91
+ | :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
92
+ | 基于提示 | Search-al-Web | 32.4 | 58.9 | 33.0 | 30.9 | 14.7 | 46.6 | 38.3 |
93
+ | 基于训练 | Search-R1-Instruct | 33.1 | 44.7 | 45.7 | 43.4 | 26.5 | 45.0 | 43.0 |
94
+ | | R1-Searcher | 35.4 | 73.1 | 44.8 | 59.4 | 22.8 | 64.8 | 42.7 |
95
+ | | DeepResearcher | 39.6 | 78.4 | 52.8 | 59.7 | 27.1 | **71.0** | 48.5 |
96
+ | | **Atom-Searcher (我们的模型)** | **44.0** | **81.8** | **57.3** | **66.9** | **27.6** | 70.7 | **50.3** |
97
+
98
+ > 🔝 **实验结果表明,Atom-Searcher 在 7 个基准测试中的 6 个上取得了新的 SOTA 性能,与之前的 SOTA 模型 DeepResearcher 相比,在域内任务上平均提升了 8.5%,在域外任务上平均提升了 2.5%。**
99
+
100
+ ### 消融实验
101
+
102
+ 消融实验证实,**原子化思想**和**推理奖励模型(RRM)**对性能都至关重要。在没有结构化原子思想的情况下,仅添加 RRM 奖励所带来的收益微乎其微。
103
+
104
+ | **方法** | **NQ** | **TQ** | **HotpotQA** | **2Wiki** | **Musique** | **Bamboogle** | **PopQA** |
105
+ | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
106
+ | 基线 (DeepResearcher) | 39.6 | 78.4 | 52.8 | 59.7 | 27.1 | 71.0 | 48.5 |
107
+ | + RRM | 40.1 | 78.2 | 53.5 | 60.0 | 25.7 | 70.5 | 48.8 |
108
+ | **Atom-Searcher (基线 + RRM + 原子化思想)** | **44.0** | **81.8** | **57.3** | **66.9** | **27.6** | **70.7** | **50.3** |
109
+
110
+ # 引用
111
+
112
+ 如果您觉得我们的工作对您有用,请考虑引用:
113
+
114
+ ```plain
115
+ @misc{deng2025atomsearcherenhancingagenticdeep,
116
+ title={Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward},
117
+ author={Yong Deng and Guoqing Wang and Zhenzhe Ying and Xiaofeng Wu and Jinzhen Lin and Wenwen Xiong and Yuqin Dai and Shuo Yang and Zhanwei Zhang and Qiwen Wang and Yang Qin and Changhua Meng},
118
+ year={2025},
119
+ eprint={2508.12800},
120
+ archivePrefix={arXiv},
121
+ primaryClass={cs.CL},
122
+ url={https://arxiv.org/abs/2508.12800},
123
+ }
124
+ ```
added_tokens.json ADDED
@@ -0,0 +1,24 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "</tool_call>": 151658,
3
+ "<tool_call>": 151657,
4
+ "<|box_end|>": 151649,
5
+ "<|box_start|>": 151648,
6
+ "<|endoftext|>": 151643,
7
+ "<|file_sep|>": 151664,
8
+ "<|fim_middle|>": 151660,
9
+ "<|fim_pad|>": 151662,
10
+ "<|fim_prefix|>": 151659,
11
+ "<|fim_suffix|>": 151661,
12
+ "<|im_end|>": 151645,
13
+ "<|im_start|>": 151644,
14
+ "<|image_pad|>": 151655,
15
+ "<|object_ref_end|>": 151647,
16
+ "<|object_ref_start|>": 151646,
17
+ "<|quad_end|>": 151651,
18
+ "<|quad_start|>": 151650,
19
+ "<|repo_name|>": 151663,
20
+ "<|video_pad|>": 151656,
21
+ "<|vision_end|>": 151653,
22
+ "<|vision_pad|>": 151654,
23
+ "<|vision_start|>": 151652
24
+ }
config.json ADDED
@@ -0,0 +1,28 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "architectures": [
3
+ "Qwen2ForCausalLM"
4
+ ],
5
+ "attention_dropout": 0.0,
6
+ "bos_token_id": 151643,
7
+ "eos_token_id": 151645,
8
+ "hidden_act": "silu",
9
+ "hidden_size": 3584,
10
+ "initializer_range": 0.02,
11
+ "intermediate_size": 18944,
12
+ "max_position_embeddings": 32768,
13
+ "max_window_layers": 28,
14
+ "model_type": "qwen2",
15
+ "num_attention_heads": 28,
16
+ "num_hidden_layers": 28,
17
+ "num_key_value_heads": 4,
18
+ "rms_norm_eps": 1e-06,
19
+ "rope_scaling": null,
20
+ "rope_theta": 1000000.0,
21
+ "sliding_window": 131072,
22
+ "tie_word_embeddings": false,
23
+ "torch_dtype": "bfloat16",
24
+ "transformers_version": "4.51.3",
25
+ "use_cache": true,
26
+ "use_sliding_window": false,
27
+ "vocab_size": 152064
28
+ }
generation_config.json ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "bos_token_id": 151643,
3
+ "do_sample": true,
4
+ "eos_token_id": [
5
+ 151645,
6
+ 151643
7
+ ],
8
+ "pad_token_id": 151643,
9
+ "repetition_penalty": 1.05,
10
+ "temperature": 0.7,
11
+ "top_k": 20,
12
+ "top_p": 0.8,
13
+ "transformers_version": "4.51.3"
14
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7df92c87f85b706ae6d14f3956d6fd38c2e28f7edcc2a3677acbaa48b6e880ec
3
+ size 4954735584
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c897ca3c4f45624a03cb7a41814e0993a2b07952228a51fbf6524d9315189870
3
+ size 4026214376
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:32a3ca4173aea1d32616fa880ccb57b301b9e3b0e9c618efa46c5b725d2d9be7
3
+ size 4995166408
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ad07daba8737cfe902364a2eca6be31667d34e272236e8edce6da8d1a1db2bd8
3
+ size 1255155504
model.safetensors.index.json ADDED
@@ -0,0 +1,346 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "metadata": {
3
+ "total_size": 15231233024
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "model-00004-of-00004.safetensors",
7
+ "model.embed_tokens.weight": "model-00003-of-00004.safetensors",
8
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
9
+ "model.layers.0.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
10
+ "model.layers.0.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
11
+ "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
12
+ "model.layers.0.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
13
+ "model.layers.0.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
14
+ "model.layers.0.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
15
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
16
+ "model.layers.0.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
17
+ "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
18
+ "model.layers.0.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
19
+ "model.layers.0.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
20
+ "model.layers.1.input_layernorm.weight": "model-00002-of-00004.safetensors",
21
+ "model.layers.1.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
22
+ "model.layers.1.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
23
+ "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
24
+ "model.layers.1.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
25
+ "model.layers.1.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
26
+ "model.layers.1.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
27
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
28
+ "model.layers.1.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
29
+ "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
30
+ "model.layers.1.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
31
+ "model.layers.1.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
32
+ "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
33
+ "model.layers.10.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
34
+ "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
35
+ "model.layers.10.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
36
+ "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
37
+ "model.layers.10.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
38
+ "model.layers.10.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
39
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
40
+ "model.layers.10.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
41
+ "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
42
+ "model.layers.10.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
43
+ "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
44
+ "model.layers.11.input_layernorm.weight": "model-00003-of-00004.safetensors",
45
+ "model.layers.11.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
46
+ "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
47
+ "model.layers.11.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
48
+ "model.layers.11.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
49
+ "model.layers.11.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
50
+ "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
51
+ "model.layers.11.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
52
+ "model.layers.11.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
53
+ "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
54
+ "model.layers.11.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
55
+ "model.layers.11.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
56
+ "model.layers.12.input_layernorm.weight": "model-00001-of-00004.safetensors",
57
+ "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
58
+ "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
59
+ "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
60
+ "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
61
+ "model.layers.12.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
62
+ "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
63
+ "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
64
+ "model.layers.12.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
65
+ "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
66
+ "model.layers.12.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
67
+ "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
68
+ "model.layers.13.input_layernorm.weight": "model-00001-of-00004.safetensors",
69
+ "model.layers.13.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
70
+ "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
71
+ "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
72
+ "model.layers.13.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
73
+ "model.layers.13.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
74
+ "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
75
+ "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
76
+ "model.layers.13.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
77
+ "model.layers.13.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
78
+ "model.layers.13.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
79
+ "model.layers.13.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
80
+ "model.layers.14.input_layernorm.weight": "model-00003-of-00004.safetensors",
81
+ "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
82
+ "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
83
+ "model.layers.14.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
84
+ "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
85
+ "model.layers.14.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
86
+ "model.layers.14.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
87
+ "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
88
+ "model.layers.14.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
89
+ "model.layers.14.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
90
+ "model.layers.14.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
91
+ "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
92
+ "model.layers.15.input_layernorm.weight": "model-00003-of-00004.safetensors",
93
+ "model.layers.15.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
94
+ "model.layers.15.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
95
+ "model.layers.15.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
96
+ "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
97
+ "model.layers.15.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
98
+ "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
99
+ "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
100
+ "model.layers.15.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
101
+ "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
102
+ "model.layers.15.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
103
+ "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
104
+ "model.layers.16.input_layernorm.weight": "model-00001-of-00004.safetensors",
105
+ "model.layers.16.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
106
+ "model.layers.16.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
107
+ "model.layers.16.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
108
+ "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
109
+ "model.layers.16.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
110
+ "model.layers.16.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
111
+ "model.layers.16.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
112
+ "model.layers.16.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
113
+ "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
114
+ "model.layers.16.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
115
+ "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
116
+ "model.layers.17.input_layernorm.weight": "model-00003-of-00004.safetensors",
117
+ "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
118
+ "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
119
+ "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
120
+ "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
121
+ "model.layers.17.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
122
+ "model.layers.17.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
123
+ "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
124
+ "model.layers.17.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
125
+ "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
126
+ "model.layers.17.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
127
+ "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
128
+ "model.layers.18.input_layernorm.weight": "model-00003-of-00004.safetensors",
129
+ "model.layers.18.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
130
+ "model.layers.18.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
131
+ "model.layers.18.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
132
+ "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
133
+ "model.layers.18.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
134
+ "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
135
+ "model.layers.18.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
136
+ "model.layers.18.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
137
+ "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
138
+ "model.layers.18.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
139
+ "model.layers.18.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
140
+ "model.layers.19.input_layernorm.weight": "model-00003-of-00004.safetensors",
141
+ "model.layers.19.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
142
+ "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
143
+ "model.layers.19.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
144
+ "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
145
+ "model.layers.19.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
146
+ "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
147
+ "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
148
+ "model.layers.19.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
149
+ "model.layers.19.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
150
+ "model.layers.19.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
151
+ "model.layers.19.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
152
+ "model.layers.2.input_layernorm.weight": "model-00003-of-00004.safetensors",
153
+ "model.layers.2.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
154
+ "model.layers.2.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
155
+ "model.layers.2.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
156
+ "model.layers.2.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
157
+ "model.layers.2.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
158
+ "model.layers.2.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
159
+ "model.layers.2.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
160
+ "model.layers.2.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
161
+ "model.layers.2.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
162
+ "model.layers.2.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
163
+ "model.layers.2.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
164
+ "model.layers.20.input_layernorm.weight": "model-00001-of-00004.safetensors",
165
+ "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
166
+ "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
167
+ "model.layers.20.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
168
+ "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
169
+ "model.layers.20.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
170
+ "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
171
+ "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
172
+ "model.layers.20.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
173
+ "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
174
+ "model.layers.20.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
175
+ "model.layers.20.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
176
+ "model.layers.21.input_layernorm.weight": "model-00001-of-00004.safetensors",
177
+ "model.layers.21.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
178
+ "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
179
+ "model.layers.21.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
180
+ "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
181
+ "model.layers.21.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
182
+ "model.layers.21.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
183
+ "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
184
+ "model.layers.21.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
185
+ "model.layers.21.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
186
+ "model.layers.21.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
187
+ "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
188
+ "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
189
+ "model.layers.22.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
190
+ "model.layers.22.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
191
+ "model.layers.22.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
192
+ "model.layers.22.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
193
+ "model.layers.22.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
194
+ "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
195
+ "model.layers.22.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
196
+ "model.layers.22.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
197
+ "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
198
+ "model.layers.22.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
199
+ "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
200
+ "model.layers.23.input_layernorm.weight": "model-00001-of-00004.safetensors",
201
+ "model.layers.23.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
202
+ "model.layers.23.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
203
+ "model.layers.23.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
204
+ "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
205
+ "model.layers.23.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
206
+ "model.layers.23.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
207
+ "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
208
+ "model.layers.23.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
209
+ "model.layers.23.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
210
+ "model.layers.23.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
211
+ "model.layers.23.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
212
+ "model.layers.24.input_layernorm.weight": "model-00001-of-00004.safetensors",
213
+ "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
214
+ "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
215
+ "model.layers.24.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
216
+ "model.layers.24.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
217
+ "model.layers.24.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
218
+ "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
219
+ "model.layers.24.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
220
+ "model.layers.24.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
221
+ "model.layers.24.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
222
+ "model.layers.24.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
223
+ "model.layers.24.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
224
+ "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
225
+ "model.layers.25.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
226
+ "model.layers.25.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
227
+ "model.layers.25.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
228
+ "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
229
+ "model.layers.25.self_attn.k_proj.bias": "model-00004-of-00004.safetensors",
230
+ "model.layers.25.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
231
+ "model.layers.25.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
232
+ "model.layers.25.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
233
+ "model.layers.25.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
234
+ "model.layers.25.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
235
+ "model.layers.25.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
236
+ "model.layers.26.input_layernorm.weight": "model-00002-of-00004.safetensors",
237
+ "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
238
+ "model.layers.26.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
239
+ "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
240
+ "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
241
+ "model.layers.26.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
242
+ "model.layers.26.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
243
+ "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
244
+ "model.layers.26.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
245
+ "model.layers.26.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
246
+ "model.layers.26.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
247
+ "model.layers.26.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
248
+ "model.layers.27.input_layernorm.weight": "model-00001-of-00004.safetensors",
249
+ "model.layers.27.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
250
+ "model.layers.27.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
251
+ "model.layers.27.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
252
+ "model.layers.27.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
253
+ "model.layers.27.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
254
+ "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
255
+ "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
256
+ "model.layers.27.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
257
+ "model.layers.27.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
258
+ "model.layers.27.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
259
+ "model.layers.27.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
260
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
261
+ "model.layers.3.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
262
+ "model.layers.3.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
263
+ "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
264
+ "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
265
+ "model.layers.3.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
266
+ "model.layers.3.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
267
+ "model.layers.3.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
268
+ "model.layers.3.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
269
+ "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
270
+ "model.layers.3.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
271
+ "model.layers.3.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
272
+ "model.layers.4.input_layernorm.weight": "model-00002-of-00004.safetensors",
273
+ "model.layers.4.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
274
+ "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
275
+ "model.layers.4.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
276
+ "model.layers.4.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
277
+ "model.layers.4.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
278
+ "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
279
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
280
+ "model.layers.4.self_attn.q_proj.bias": "model-00002-of-00004.safetensors",
281
+ "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
282
+ "model.layers.4.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
283
+ "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
284
+ "model.layers.5.input_layernorm.weight": "model-00002-of-00004.safetensors",
285
+ "model.layers.5.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
286
+ "model.layers.5.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
287
+ "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
288
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
289
+ "model.layers.5.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
290
+ "model.layers.5.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
291
+ "model.layers.5.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
292
+ "model.layers.5.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
293
+ "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
294
+ "model.layers.5.self_attn.v_proj.bias": "model-00004-of-00004.safetensors",
295
+ "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
296
+ "model.layers.6.input_layernorm.weight": "model-00003-of-00004.safetensors",
297
+ "model.layers.6.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
298
+ "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
299
+ "model.layers.6.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
300
+ "model.layers.6.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
301
+ "model.layers.6.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
302
+ "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
303
+ "model.layers.6.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
304
+ "model.layers.6.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
305
+ "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
306
+ "model.layers.6.self_attn.v_proj.bias": "model-00001-of-00004.safetensors",
307
+ "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
308
+ "model.layers.7.input_layernorm.weight": "model-00003-of-00004.safetensors",
309
+ "model.layers.7.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
310
+ "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
311
+ "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
312
+ "model.layers.7.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
313
+ "model.layers.7.self_attn.k_proj.bias": "model-00001-of-00004.safetensors",
314
+ "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
315
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
316
+ "model.layers.7.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
317
+ "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
318
+ "model.layers.7.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
319
+ "model.layers.7.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
320
+ "model.layers.8.input_layernorm.weight": "model-00003-of-00004.safetensors",
321
+ "model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
322
+ "model.layers.8.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
323
+ "model.layers.8.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
324
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
325
+ "model.layers.8.self_attn.k_proj.bias": "model-00002-of-00004.safetensors",
326
+ "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
327
+ "model.layers.8.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
328
+ "model.layers.8.self_attn.q_proj.bias": "model-00003-of-00004.safetensors",
329
+ "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
330
+ "model.layers.8.self_attn.v_proj.bias": "model-00002-of-00004.safetensors",
331
+ "model.layers.8.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
332
+ "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
333
+ "model.layers.9.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
334
+ "model.layers.9.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
335
+ "model.layers.9.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
336
+ "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
337
+ "model.layers.9.self_attn.k_proj.bias": "model-00003-of-00004.safetensors",
338
+ "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
339
+ "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
340
+ "model.layers.9.self_attn.q_proj.bias": "model-00001-of-00004.safetensors",
341
+ "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
342
+ "model.layers.9.self_attn.v_proj.bias": "model-00003-of-00004.safetensors",
343
+ "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
344
+ "model.norm.weight": "model-00003-of-00004.safetensors"
345
+ }
346
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": {
25
+ "content": "<|endoftext|>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
3
+ size 11421896
tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ }
181
+ },
182
+ "additional_special_tokens": [
183
+ "<|im_start|>",
184
+ "<|im_end|>",
185
+ "<|object_ref_start|>",
186
+ "<|object_ref_end|>",
187
+ "<|box_start|>",
188
+ "<|box_end|>",
189
+ "<|quad_start|>",
190
+ "<|quad_end|>",
191
+ "<|vision_start|>",
192
+ "<|vision_end|>",
193
+ "<|vision_pad|>",
194
+ "<|image_pad|>",
195
+ "<|video_pad|>"
196
+ ],
197
+ "bos_token": null,
198
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. 
You are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
199
+ "clean_up_tokenization_spaces": false,
200
+ "eos_token": "<|im_end|>",
201
+ "errors": "replace",
202
+ "extra_special_tokens": {},
203
+ "model_max_length": 131072,
204
+ "pad_token": "<|endoftext|>",
205
+ "split_special_tokens": false,
206
+ "tokenizer_class": "Qwen2Tokenizer",
207
+ "unk_token": null
208
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff