Commit 513fa87
0 parent(s)
Duplicate from dazdom63/Affine-Pen_5GvRk8Uf8m7epguAQqL2C1PVaVniP4Z6DWD7Pr2tZNkQgpbc
This view is limited to 50 files because it contains too many changes.
- .gitattributes +37 -0
- LICENSE +21 -0
- README.md +11 -0
- chat_template.jinja +176 -0
- config.json +66 -0
- generation_config.json +9 -0
- model-00001-of-00136.safetensors +3 -0
- model-00002-of-00136.safetensors +3 -0
- model-00003-of-00136.safetensors +3 -0
- model-00004-of-00136.safetensors +3 -0
- model-00005-of-00136.safetensors +3 -0
- model-00006-of-00136.safetensors +3 -0
- model-00007-of-00136.safetensors +3 -0
- model-00008-of-00136.safetensors +3 -0
- model-00009-of-00136.safetensors +3 -0
- model-00010-of-00136.safetensors +3 -0
- model-00011-of-00136.safetensors +3 -0
- model-00012-of-00136.safetensors +3 -0
- model-00013-of-00136.safetensors +3 -0
- model-00014-of-00136.safetensors +3 -0
- model-00015-of-00136.safetensors +3 -0
- model-00016-of-00136.safetensors +3 -0
- model-00017-of-00136.safetensors +3 -0
- model-00018-of-00136.safetensors +3 -0
- model-00019-of-00136.safetensors +3 -0
- model-00020-of-00136.safetensors +3 -0
- model-00021-of-00136.safetensors +3 -0
- model-00022-of-00136.safetensors +3 -0
- model-00023-of-00136.safetensors +3 -0
- model-00024-of-00136.safetensors +3 -0
- model-00025-of-00136.safetensors +3 -0
- model-00026-of-00136.safetensors +3 -0
- model-00027-of-00136.safetensors +3 -0
- model-00028-of-00136.safetensors +3 -0
- model-00029-of-00136.safetensors +3 -0
- model-00030-of-00136.safetensors +3 -0
- model-00031-of-00136.safetensors +3 -0
- model-00032-of-00136.safetensors +3 -0
- model-00033-of-00136.safetensors +3 -0
- model-00034-of-00136.safetensors +3 -0
- model-00035-of-00136.safetensors +3 -0
- model-00036-of-00136.safetensors +3 -0
- model-00037-of-00136.safetensors +3 -0
- model-00038-of-00136.safetensors +3 -0
- model-00039-of-00136.safetensors +3 -0
- model-00040-of-00136.safetensors +3 -0
- model-00041-of-00136.safetensors +3 -0
- model-00042-of-00136.safetensors +3 -0
- model-00043-of-00136.safetensors +3 -0
- model-00044-of-00136.safetensors +3 -0
.gitattributes
ADDED
@@ -0,0 +1,37 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+*.pdf filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
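The `.gitattributes` rules above route matching files through Git LFS. As a rough illustration of which files in this commit would be affected, here is a minimal Python sketch. It assumes that matching the file's basename with `fnmatchcase` is a close enough stand-in for Git's own pattern rules, which is only true for the simple glob patterns; the directory pattern `saved_model/**/*` is deliberately left out. `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical names, not part of the repository.

```python
from fnmatch import fnmatchcase

# A subset of the simple glob patterns from the .gitattributes diff above.
# Assumption: fnmatch-style basename matching approximates Git's matching
# for these patterns; Git's real rules are richer (e.g. "**" handling).
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.tar.*", "*tfevents*", "*.png"]


def is_lfs_tracked(path: str) -> bool:
    """Return True if the file's basename matches any listed LFS pattern."""
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in LFS_PATTERNS)


for name in ["model-00001-of-00136.safetensors", "config.json", "chat_template.jinja"]:
    print(name, is_lfs_tracked(name))
```

Under these assumptions only the weight shards are stored as LFS pointers, while the small text files (`config.json`, `chat_template.jinja`, `README.md`) are committed directly.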
LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 DeepSeek
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
README.md
ADDED
@@ -0,0 +1,11 @@
+---
+license: mit
+library_name: transformers
+base_model:
+- deepseek-ai/DeepSeek-V3.2-Exp-Base
+base_model_relation: finetune
+---
+
+# Affine Challenger Model
+
+This is an experimental model for Affine (Bittensor Subnet 120).
chat_template.jinja
ADDED
@@ -0,0 +1,176 @@
+{# DeepSeek-V3.2 DSML chat template (compatible with encoding_dsv32.py) #}
+{%- if thinking is not defined %}{% set thinking = false %}{% endif -%}
+{%- if drop_thinking is not defined %}{% set drop_thinking = true %}{% endif -%}
+{%- set dsml_token = "|DSML|" -%}
+{%- set thinking_start_token = "<think>" -%}
+{%- set thinking_end_token = "</think>" -%}
+{%- set eos_token = "<|end▁of▁sentence|>" -%}
+
+{# Identify last user/developer index #}
+{%- set ns = namespace(last_user=-1) -%}
+{%- for message in messages -%}
+{%- if message["role"] in ["user", "developer"] -%}
+{%- set ns.last_user = loop.index0 -%}
+{%- endif -%}
+{%- endfor -%}
+
+{# Build system prompt (concatenate all system messages) #}
+{%- set sp = namespace(text="") -%}
+{%- for message in messages -%}
+{%- if message["role"] == "system" -%}
+{%- if sp.text -%}
+{%- set sp.text = sp.text + "\n\n" + (message["content"] or "") -%}
+{%- else -%}
+{%- set sp.text = (message["content"] or "") -%}
+{%- endif -%}
+{%- endif -%}
+{%- endfor -%}
+
+{# Tools block if provided globally #}
+{%- if tools is defined and tools is not none -%}
+{%- set tool_ns = namespace(text="") -%}
+{%- for tool in tools -%}
+{%- set t = tool["function"] if tool.get("function") else tool -%}
+{%- if loop.first -%}
+{%- set tool_ns.text = tool_ns.text + (t | tojson) -%}
+{%- else -%}
+{%- set tool_ns.text = tool_ns.text + "\n" + (t | tojson) -%}
+{%- endif -%}
+{%- endfor -%}
+{%- set tools_block -%}
+## Tools
+
+You have access to a set of tools you can use to answer the user's question.
+You can invoke functions by writing a "<{{ dsml_token }}function_calls>" block like the following as part of your reply to the user:
+<{{ dsml_token }}function_calls>
+<{{ dsml_token }}invoke name="$FUNCTION_NAME">
+<{{ dsml_token }}parameter name="$PARAMETER_NAME" string="true|false">$PARAMETER_VALUE</{{ dsml_token }}parameter>
+...
+</{{ dsml_token }}invoke>
+<{{ dsml_token }}invoke name="$FUNCTION_NAME2">
+...
+</{{ dsml_token }}invoke>
+</{{ dsml_token }}function_calls>
+
+String and scalar parameters should be specified as is without any escaping or quotes, while lists and objects should use JSON format. The "string" attribute should be set to "true" for string type parameters and "false" for other types (numbers, booleans, arrays, objects).
+
+If the thinking_mode is enabled, then after function results you should strongly consider outputting a thinking block. Here is an example:
+
+<{{ dsml_token }}function_calls>
+...
+</{{ dsml_token }}function_calls>
+
+<function_results>
+...
+</function_results>
+
+{{ thinking_start_token }}...thinking about results{{ thinking_end_token }}
+
+Here are the functions available in JSONSchema format:
+<functions>
+{{ tool_ns.text }}
+</functions>
+{%- endset -%}
+{%- if sp.text -%}
+{%- set sp.text = sp.text + "\n\n" + tools_block -%}
+{%- else -%}
+{%- set sp.text = "\n\n" + tools_block -%}
+{%- endif -%}
+{%- endif -%}
+
+{# Response format on system messages #}
+{%- for message in messages -%}
+{%- if message["role"] == "system" and message.get("response_format") -%}
+{%- set sp.text = sp.text + "\n\n## Response Format:\n\nYou MUST strictly adhere to the following schema to reply:\n" + (message["response_format"] | tojson) -%}
+{%- endif -%}
+{%- endfor -%}
+
+{# Emit BOS and system prompt #}
+{{- sp.text -}}
+
+{# Render messages #}
+{%- for message in messages -%}
+{%- set idx = loop.index0 -%}
+{%- set role = message["role"] -%}
+{%- set content = message.get("content", "") if message.get("content") is not none else "" -%}
+
+{%- if role == "developer" -%}
+{%- set dev_content = "" -%}
+{%- if message.get("response_format") -%}
+{%- set dev_content = dev_content + "\n\n## Response Format:\n\nYou MUST strictly adhere to the following schema to reply:\n" + (message["response_format"] | tojson) -%}
+{%- endif -%}
+{%- set dev_content = dev_content + "\n\n# The user's message is: " + content -%}
+{{ "<|User|>" + dev_content + "<|Assistant|>" }}{{ thinking_start_token if idx == ns.last_user and thinking else thinking_end_token }}
+
+{%- elif role == "user" -%}
+{{ "<|User|>" + content + "<|Assistant|>" }}{{ thinking_start_token if idx == ns.last_user and thinking else thinking_end_token }}
+
+{%- elif role == "assistant" -%}
+{%- if message.get("tool_calls") -%}
+{%- set first = true -%}
+{%- if idx > ns.last_user and thinking and (message.get("reasoning_content") or message.get("tool_calls")) -%}
+{{ (message.get("reasoning_content","") if not (thinking and drop_thinking and idx <= ns.last_user) else "") + thinking_end_token }}
+{%- else -%}
+{{ message.get("reasoning_content","") if not (thinking and drop_thinking and idx <= ns.last_user) else "" }}
+{%- endif -%}
+{%- set tc = namespace(first=true) -%}
+{%- for tool in message["tool_calls"] -%}
+{%- set func = tool["function"] -%}
+{%- set tool_name = func["name"] -%}
+{%- set raw_args = func["arguments"] -%}
+{%- set param_ns = namespace(text="", first=true) -%}
+{%- if raw_args is mapping -%}
+{%- set parsed = raw_args -%}
+{%- elif raw_args is string -%}
+{%- set parsed = raw_args | from_json -%}
+{%- else -%}
+{%- set parsed = {} -%}
+{%- endif -%}
+{%- for k,v in parsed.items() -%}
+{%- set is_str = "true" if v is string else "false" -%}
+{%- if v is string -%}
+{%- set val = v -%}
+{%- else -%}
+{%- set val = v | tojson -%}
+{%- endif -%}
+{%- set piece = "<" + dsml_token + "parameter name=\"" + k + "\" string=\"" + is_str + "\">" + val + "</" + dsml_token + "parameter>" -%}
+{%- if param_ns.first -%}
+{%- set param_ns.text = piece -%}
+{%- set param_ns.first = false -%}
+{%- else -%}
+{%- set param_ns.text = param_ns.text + "\n" + piece -%}
+{%- endif -%}
+{%- endfor -%}
+{%- set param_block = param_ns.text -%}
+{%- if tc.first -%}
+{%- if content -%}
+{{ content + "\n\n<" + dsml_token + "function_calls>\n" + "<" + dsml_token + "invoke name=\"" + tool_name + "\">\n" + param_block + "\n</" + dsml_token + "invoke>" }}
+{%- else -%}
+{{ "\n\n<" + dsml_token + "function_calls>\n" + "<" + dsml_token + "invoke name=\"" + tool_name + "\">\n" + param_block + "\n</" + dsml_token + "invoke>" }}
+{%- endif -%}
+{%- set tc.first = false -%}
+{%- else -%}
+{{ "\n" + "<" + dsml_token + "invoke name=\"" + tool_name + "\">\n" + param_block + "\n</" + dsml_token + "invoke>" }}
+{%- endif -%}
+{%- endfor -%}
+{{ "\n</" + dsml_token + "function_calls>" + eos_token }}
+{%- else -%}
+{%- if idx > ns.last_user and thinking and not (drop_thinking and idx <= ns.last_user) and message.get("reasoning_content") -%}
+{{ message["reasoning_content"] + thinking_end_token }}
+{%- endif -%}
+{{ content + eos_token }}
+{%- endif -%}
+
+{%- elif role == "tool" -%}
+{%- set is_first_tool = loop.index0 == 0 or messages[loop.index0 - 1]["role"] != "tool" -%}
+{%- set is_last_tool = loop.last or messages[loop.index0 + 1]["role"] != "tool" -%}
+{%- if is_first_tool -%}{{- "\n\n<function_results>" -}}{%- endif -%}
+{{- "\n<result>" + content + "</result>" -}}
+{%- if is_last_tool -%}
+{{- "\n</function_results>" -}}
+{{- ("\n\n" + thinking_start_token) if idx >= ns.last_user and thinking else ("\n\n" + thinking_end_token) -}}
+{%- endif -%}
+{%- endif -%}
+{%- endfor -%}
+
+{# No extra generation prompt needed; assistant prefix is handled inline #}
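The tool-call branch of the template above serializes each argument as a DSML `parameter` element: strings are emitted raw with `string="true"`, everything else is JSON-encoded with `string="false"`. The following Python sketch mirrors that rule outside Jinja. It is an illustration only: `encode_parameter` and `encode_tool_calls` are hypothetical helper names, `json.dumps` is assumed to be a close stand-in for the template's `tojson` filter (escaping and key ordering may differ), and the leading blank lines and end-of-sentence token that the template adds around the block are omitted.

```python
import json

DSML_TOKEN = "|DSML|"  # same literal as dsml_token in the template


def encode_parameter(name, value):
    """Render one argument the way the template's inner loop does."""
    is_str = isinstance(value, str)
    # Strings pass through unescaped; other types are JSON-encoded.
    text = value if is_str else json.dumps(value)
    flag = "true" if is_str else "false"
    return (
        f'<{DSML_TOKEN}parameter name="{name}" string="{flag}">'
        f"{text}</{DSML_TOKEN}parameter>"
    )


def encode_tool_calls(calls):
    """Render a list of {"name", "arguments"} dicts as one function_calls block."""
    blocks = []
    for call in calls:
        args = call["arguments"]
        if isinstance(args, str):  # arguments may arrive as a JSON string
            args = json.loads(args)
        params = "\n".join(encode_parameter(k, v) for k, v in args.items())
        blocks.append(
            f'<{DSML_TOKEN}invoke name="{call["name"]}">\n{params}\n</{DSML_TOKEN}invoke>'
        )
    body = "\n".join(blocks)
    return f"<{DSML_TOKEN}function_calls>\n{body}\n</{DSML_TOKEN}function_calls>"


print(encode_tool_calls([
    {"name": "get_weather", "arguments": '{"city": "Paris", "days": 3}'},
]))
```

The example tool name `get_weather` and its arguments are made up for the demonstration; they do not come from the commit.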
config.json
ADDED
@@ -0,0 +1,66 @@
+{
+  "architectures": [
+    "DeepseekV32ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "eos_token_id": 1,
+  "ep_size": 1,
+  "first_k_dense_replace": 3,
+  "hidden_act": "silu",
+  "hidden_size": 7168,
+  "index_head_dim": 128,
+  "index_n_heads": 64,
+  "index_topk": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 18432,
+  "kv_lora_rank": 512,
+  "max_position_embeddings": 163840,
+  "model_type": "deepseek_v32",
+  "moe_intermediate_size": 2048,
+  "moe_layer_freq": 1,
+  "n_group": 8,
+  "n_routed_experts": 256,
+  "n_shared_experts": 1,
+  "norm_topk_prob": true,
+  "num_attention_heads": 128,
+  "num_experts_per_tok": 8,
+  "num_hidden_layers": 61,
+  "num_key_value_heads": 128,
+  "num_nextn_predict_layers": 1,
+  "q_lora_rank": 1536,
+  "qk_nope_head_dim": 128,
+  "qk_rope_head_dim": 64,
+  "quantization_config": {
+    "activation_scheme": "dynamic",
+    "fmt": "e4m3",
+    "quant_method": "fp8",
+    "scale_fmt": "ue8m0",
+    "weight_block_size": [
+      128,
+      128
+    ]
+  },
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": {
+    "beta_fast": 32,
+    "beta_slow": 1,
+    "factor": 40,
+    "mscale": 1.0,
+    "mscale_all_dim": 1.0,
+    "original_max_position_embeddings": 4096,
+    "type": "yarn"
+  },
+  "rope_theta": 10000,
+  "routed_scaling_factor": 2.5,
+  "scoring_func": "sigmoid",
+  "tie_word_embeddings": false,
+  "topk_group": 4,
+  "topk_method": "noaux_tc",
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.44.2",
+  "use_cache": true,
+  "v_head_dim": 128,
+  "vocab_size": 129280
+}
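Several values in `config.json` are related by simple arithmetic, which the sketch below checks. The numbers are copied from the diff above; the interpretation of each field is an assumption based on how DeepSeek-V3-style configs are commonly read (YaRN scaling the original context by `factor`, routing restricted to `topk_group` of `n_group` expert groups, the first `first_k_dense_replace` layers being dense), not something the commit itself states.

```python
# Values copied from the config.json diff above.
config = {
    "max_position_embeddings": 163840,
    "rope_scaling": {"factor": 40, "original_max_position_embeddings": 4096},
    "n_routed_experts": 256,
    "n_group": 8,
    "topk_group": 4,
    "num_experts_per_tok": 8,
    "num_hidden_layers": 61,
    "first_k_dense_replace": 3,
    "qk_nope_head_dim": 128,
    "qk_rope_head_dim": 64,
}

rope = config["rope_scaling"]

# Assumed reading: YaRN stretches the original context window by `factor`.
extended_context = rope["factor"] * rope["original_max_position_embeddings"]

# Assumed reading: experts are split evenly into groups, and each token
# may only pick experts from the `topk_group` best-scoring groups.
experts_per_group = config["n_routed_experts"] // config["n_group"]
candidate_experts = config["topk_group"] * experts_per_group
active_fraction = config["num_experts_per_tok"] / config["n_routed_experts"]

# Assumed reading: the first k layers use a dense MLP, the rest use MoE.
moe_layers = config["num_hidden_layers"] - config["first_k_dense_replace"]

# Per-head query/key width: non-rotary part plus rotary part.
qk_head_dim = config["qk_nope_head_dim"] + config["qk_rope_head_dim"]

print("extended context:", extended_context)
print("experts per group:", experts_per_group, "candidates:", candidate_experts)
print("routed experts active per token:", f"{active_fraction:.3%}")
print("MoE layers:", moe_layers, "qk head dim:", qk_head_dim)
```

The first check is the most robust one: 40 × 4096 equals the stated `max_position_embeddings` of 163840, so the two fields are at least mutually consistent.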
generation_config.json
ADDED
@@ -0,0 +1,9 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "eos_token_id": 1,
+  "do_sample": true,
+  "temperature": 0.05,
+  "top_p": 0.95,
+  "transformers_version": "4.46.3"
+}
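With `do_sample` enabled, `temperature` 0.05 and `top_p` 0.95, generation is nominally stochastic but in practice close to greedy. The sketch below shows why on a toy example. It is a plain-Python illustration of temperature scaling followed by nucleus (top-p) filtering, not the code path the `transformers` library uses; `sampling_distribution` and the example logits are hypothetical.

```python
import math


def sampling_distribution(logits, temperature=0.05, top_p=0.95):
    """Return {token_index: probability} after temperature and top-p filtering."""
    # Temperature scaling, then a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Nucleus filtering: keep the most likely tokens until their
    # cumulative probability reaches top_p, then renormalize.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


toy_logits = [2.0, 1.0, 0.0]
print("T=0.05:", sampling_distribution(toy_logits))
print("T=1.00:", sampling_distribution(toy_logits, temperature=1.0))
```

At temperature 0.05 a logit gap of 1.0 becomes a gap of 20 after scaling, so only the top token survives the top-p cut; at temperature 1.0 all three toy tokens remain candidates.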
model-00001-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cfdb245eaf9457b60fafd0958965097ae3379491781e61bde5761092dc90b705
+size 5086362008
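Each `.safetensors` entry in this diff is a three-line Git LFS pointer (`version`, `oid`, `size`) rather than the weights themselves. A minimal Python sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and it assumes the simple "key, space, value" line layout seen in the pointers in this commit.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    # Each line is "<key> <value>"; split only on the first space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


# Pointer for model-00001-of-00136.safetensors, copied from the diff above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:cfdb245eaf9457b60fafd0958965097ae3379491781e61bde5761092dc90b705
size 5086362008
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["digest"][:12], f'{pointer["size"] / 1e9:.2f} GB')
```

The `size` field is what lets a client estimate the download before fetching: every shard shown in this view is roughly 5.1 GB.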
model-00002-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:aa127856b7c5ab1adc868214b5423b2d352eb4cccc1a1bd28c3c028044afbb84
+size 5095312328
model-00003-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:eb0d781b41f6e1e0951afb00b0cc0fdf16a37774d424132aaf0541006b610524
+size 5062274344
model-00004-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1358fe8afadf5b85a6f45d9aee4ea6f834a127c557ac81c7b66eda7ff9e6ad45
+size 5098720776
model-00005-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dc6b65dc8f8128722ef7ccd3abc4d3f0d0c1001b719bbaadf3510240445f94b2
+size 5095312776
model-00006-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ced274a9134954e42ebf4f6008470fc28a8f174209f768dc8589297f2d073133
+size 5095050280
model-00007-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fb428b04ca4f50704e82a95df30e28bedb1cb3ee91c0456b4d8df8395f24b525
+size 5095312656
model-00008-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b5ba32ba9054d7b65763bb489081fc189ef1710fb0d8a2290cfd31983af1c52b
+size 5095050464
model-00009-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:05531b531406a91fe2f3753ee73fb02fc4e4062976d67b455e50e71d29350b06
+size 5095312472
model-00010-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:672ed0aa984aa014d27ac471a247b6ff050c8f477e840915aa20a6514cb41483
+size 5095050632
model-00011-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fe263d8065ffadc7c86a8c05c5d39f9f655bba097a5b5137f41e624b35965a32
+size 5095312296
model-00012-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bab643eac6e341ebdadedac98b7386230d3cd94eb2e5ecc4220be6169c8719bc
+size 5095312768
model-00013-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fa9c2ef6499487f34328240b7bc1cbc77472b6ff85364e3b024d5689c5881f54
+size 5095050184
model-00014-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:eb6b1034d7bbaabcd43b9c928e90dfb9ad9783d7e4b694f077e44d403c1abb0e
+size 5095312768
model-00015-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3aec3b9676472562d75c89c8cd6699d33cfb79042663c518e6da9c5cf55e8286
+size 5095050312
model-00016-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7a5d130a8dccbfaa9a5649985f567f47a452e6bbd51c73995799f53695905887
+size 5095312624
model-00017-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:66cc995bec931011967ad5f10834e2e2fe2178c745cda0b7420679dc314df001
+size 5095050768
model-00018-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7f9cdf35e48471555f1d6db6b9f39d603db513bd8b90d36ab835e1557c449dc4
+size 5095313136
model-00019-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1bf5b1ddd18e0f666146631d6fb011dd5cdd43b29d80966e634baddb4584c53c
+size 5095051352
model-00020-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5fa2dd726efe1024204fa874f53eb16e86df2e4c2755bc8c24c455d558f4bdf3
+size 5095312952
model-00021-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:79f1ed38fd046af5e0448ca2a9ed4d0b245174eb802a6be2bb90fc420d2e3a35
+size 5095313464
model-00022-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7510bffa6bdff0ed5f4a21de2f450b63b64003902d52987cf175cfc6c8a34bf3
+size 5095050856
model-00023-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:771e88639a29511935536e75df07e7c5c9eb0468179db21dda489981aee26685
+size 5095313464
model-00024-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6327f83fab38dbe539d70da975eec7e94f1220a300e579d3f3adbfdd6966655d
+size 5095051032
model-00025-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:875715dbbd4fb8c2717f628d8e70d588746ce7b833a36c67a89caf424c34e04a
+size 5095313280
model-00026-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e32da83d0235821fc91a2d2e03c444485d562934f2df57e95e213030e0a77cdc
+size 5095051216
model-00027-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:295e1ea457a7fb9993ffe826c60243478245371a76a7fe9057f84fa7f5913553
+size 5095313104
model-00028-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e6614057028ab2d7f707ebf910b69c09eeffc6b7ae474b49a6167cc9017da5b5
+size 5095051392
model-00029-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:929cd091fd7f8436430afc8dacef85d2d9b448d8754b7db8db97775aebdc5483
+size 5095312912
model-00030-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4d44ee7b25347c797e198f31d35263deea93ed652acae563a63e03315684f0d0
+size 5095313464
model-00031-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d964492fe1333a9825b2b02a74faf3c321cc9c5fe9407b9a7bf855a1d6873966
+size 5095050888
model-00032-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e7f295ef28a0b28aa29b5cffec4b0c5cfb4d27a36e650c5dddbe6fa7232f32ea
+size 5095313432
model-00033-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d4e035261cfc86a3113f53221009e1e012786dac7bf9e006941cd66889578a95
+size 5095051072
model-00034-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7782f0302e00f62d4b212750c9a10e154c9c6e90237a76532e71de6615f32ea0
+size 5095313248
model-00035-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6959448b54abd2c57882c36869f5e064458ea945a54aa9dd0143fc3867f28c49
+size 5095051248
model-00036-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f3f12c180389f413b62b211850a918894d9e82ea078d5ec018cf24d5ff323e11
+size 5095313064
model-00037-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:578cf01f901d37e2fb10366ed9051caeb093c0d9b37e24850faaa21e74533e20
+size 5095051456
model-00038-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6ac54bfe8bf1433233f1bc8705e3b8e416de2b0f61361cb3b32edba314ff2fc4
+size 5095312848
model-00039-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b2b6aa6c2865308cc664c301f9826bf07c44f8c1a4688a5b5490a07731fe5d78
+size 5095313464
model-00040-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3835a041b24849a78557b89dfb8b8d4815c2f9719b94f1386f3a3dad2b7f84a8
+size 5095050920
model-00041-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1a9c5c50475086b360cfb69b9be0257e74d3e8e1ee9a1cf8fd96d4f93822ee10
+size 5095313400
model-00042-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9a37f9cc63015bb2f0712e4d401311cb67edf0ababf1fcd61c908998aeeb7713
+size 5095051104
model-00043-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b07b3d7089ed2236ce18751adf2f851fab134776715d01243201e3bc5e2eeeae
+size 5095313216
model-00044-of-00136.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fad1f6640469796841f55f91e66cfd1e55e3e1eb6896c461b4e087d6f2820c38
+size 5095051288