CoruNethron fangfangfang123 committed on
Commit
d52bfd1
·
0 Parent(s):

Duplicate from jdopensource/JoyAI-LLM-Flash

Co-authored-by: zhenfang wang <fangfangfang123@users.noreply.huggingface.co>

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +36 -0
  2. LICENSE +28 -0
  3. README.md +380 -0
  4. chat_template.jinja +103 -0
  5. config.json +49 -0
  6. configuration.json +1 -0
  7. configuration_deepseek.py +247 -0
  8. docs/deploy_guidance.md +42 -0
  9. figures/joyai-logo.png +3 -0
  10. model-1-of-40.safetensors +3 -0
  11. model-10-of-40.safetensors +3 -0
  12. model-11-of-40.safetensors +3 -0
  13. model-12-of-40.safetensors +3 -0
  14. model-13-of-40.safetensors +3 -0
  15. model-14-of-40.safetensors +3 -0
  16. model-15-of-40.safetensors +3 -0
  17. model-16-of-40.safetensors +3 -0
  18. model-17-of-40.safetensors +3 -0
  19. model-18-of-40.safetensors +3 -0
  20. model-19-of-40.safetensors +3 -0
  21. model-2-of-40.safetensors +3 -0
  22. model-20-of-40.safetensors +3 -0
  23. model-21-of-40.safetensors +3 -0
  24. model-22-of-40.safetensors +3 -0
  25. model-23-of-40.safetensors +3 -0
  26. model-24-of-40.safetensors +3 -0
  27. model-25-of-40.safetensors +3 -0
  28. model-26-of-40.safetensors +3 -0
  29. model-27-of-40.safetensors +3 -0
  30. model-28-of-40.safetensors +3 -0
  31. model-29-of-40.safetensors +3 -0
  32. model-3-of-40.safetensors +3 -0
  33. model-30-of-40.safetensors +3 -0
  34. model-31-of-40.safetensors +3 -0
  35. model-32-of-40.safetensors +3 -0
  36. model-33-of-40.safetensors +3 -0
  37. model-34-of-40.safetensors +3 -0
  38. model-35-of-40.safetensors +3 -0
  39. model-36-of-40.safetensors +3 -0
  40. model-37-of-40.safetensors +3 -0
  41. model-38-of-40.safetensors +3 -0
  42. model-39-of-40.safetensors +3 -0
  43. model-4-of-40.safetensors +3 -0
  44. model-40-of-40.safetensors +3 -0
  45. model-5-of-40.safetensors +3 -0
  46. model-6-of-40.safetensors +3 -0
  47. model-7-of-40.safetensors +3 -0
  48. model-8-of-40.safetensors +3 -0
  49. model-9-of-40.safetensors +3 -0
  50. model-non-layer.safetensors +3 -0
.gitattributes ADDED
@@ -0,0 +1,36 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ figures/joyai-logo.png filter=lfs diff=lfs merge=lfs -text
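As a rough illustration of what these attribute lines do, the sketch below checks which file names from this commit would match an LFS pattern. It approximates gitattributes matching with Python's `fnmatch` on the base name, which is an assumption: Git's own pattern rules differ in some cases (for example `**`). The helper name `is_lfs_tracked` and the shortened pattern list are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the glob patterns listed in the .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.pt", "*.safetensors", "*.tar.*", "*tfevents*"]
# Paths that .gitattributes lists verbatim rather than as a glob.
LFS_EXACT_PATHS = {"figures/joyai-logo.png"}


def is_lfs_tracked(path: str) -> bool:
    """Return True if the path matches one of the LFS rules (approximation)."""
    if path in LFS_EXACT_PATHS:
        return True
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


for candidate in ["model-1-of-40.safetensors", "figures/joyai-logo.png", "README.md"]:
    print(candidate, is_lfs_tracked(candidate))
```

This is consistent with the file list, where the weight shards and the logo appear as three-line LFS pointer files (`+3 -0`).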
LICENSE ADDED
@@ -0,0 +1,28 @@
1
+ Modified MIT License
2
+
3
+ Copyright (c) 2026 JD AI
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the “Software”), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
22
+
23
+ We offer you a license similar to the MIT License. In the event that the Software
24
+ (or any derivative works thereof) is used for any of your commercial products or
25
+ services that either have more than 100 million monthly active users or generate
26
+ more than 20 million US dollars (or equivalent in other currencies) in monthly
27
+ revenue, you are required to clearly display "JoyAI-LLM" on the user interface
28
+ of such product or service.
README.md ADDED
@@ -0,0 +1,380 @@
1
+ ---
2
+ language:
3
+ - zh
4
+ - en
5
+ pipeline_tag: text-generation
6
+ ---
7
+ <div align="center">
8
+ <picture>
9
+ <img src="figures/joyai-logo.png" width="30%" alt="JoyAI-LLM Flash">
10
+ </picture>
11
+ </div>
12
+ <hr>
13
+
14
+ <div align="center" style="line-height: 1;">
15
+ <a href="https://huggingface.co/jdopensource" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-JD-ffc107?color=ffc107&logoColor=white"/></a>
16
+ <a href="https://huggingface.co/jdopensource/JoyAI-LLM-Flash/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Modified_MIT-f5de53?&color=f5de53"/></a>
17
+ </div>
18
+
19
+
20
+
21
+
22
+ ## 1. Model Introduction
23
+
24
+ JoyAI-LLM Flash is a state-of-the-art medium-sized instruct language model with 3 billion activated parameters and 48 billion total parameters. It was pretrained on 20 trillion text tokens using the Muon optimizer, followed by large-scale supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning (RL) across diverse environments. JoyAI-LLM Flash achieves strong performance on frontier knowledge, reasoning, coding, and agentic tasks.
25
+
26
+ ### Key Features
27
+
28
+ - Fiber Bundle RL: Introduces fiber bundle theory into reinforcement learning, proposing a novel optimization framework, FiberPO. This method is specifically designed to handle the challenges of large-scale and heterogeneous agent training, improving stability and robustness under complex data distributions.
29
+ - Training-Inference Collaboration: Applies the Muon optimizer with dense multi-token prediction (MTP) and develops novel optimization techniques to resolve instabilities while scaling up, delivering 1.3× to 1.7× the throughput of the non-MTP version.
30
+ - Agentic Intelligence: Designed for tool use, reasoning, and autonomous problem-solving.
31
+
32
+ ## 2. Model Summary
33
+
34
+ | | |
35
+ | :-----------------------------------------: | :----------------------: |
36
+ | **Architecture** | Mixture-of-Experts (MoE) |
37
+ | **Total Parameters** | 48B |
38
+ | **Activated Parameters** | 3B |
39
+ | **Number of Layers** (Dense layer included) | 40 |
40
+ | **Number of Dense Layers** | 1 |
41
+ | **Attention Hidden Dimension** | 2048 |
42
+ | **MoE Hidden Dimension** (per Expert) | 768 |
43
+ | **Number of Attention Heads** | 32 |
44
+ | **Number of Experts** | 256 |
45
+ | **Selected Experts per Token** | 8 |
46
+ | **Number of Shared Experts** | 1 |
47
+ | **Vocabulary Size** | 129K |
48
+ | **Context Length** | 128K |
49
+ | **Attention Mechanism** | MLA |
50
+ | **Activation Function** | SwiGLU |
51
+
52
+
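The 48B total and 3B activated figures can be sanity-checked from the dimensions in the table. The sketch below is a back-of-the-envelope estimate, not the authors' accounting: it takes the MLA ranks and head dimensions from the `config.json` added in this commit, assumes DeepSeek-V3-style projection shapes, and ignores norms, biases, and the MTP layer. All function names are hypothetical.

```python
# Values copied from the Model Summary table and config.json in this commit.
HIDDEN = 2048
NUM_LAYERS = 40
NUM_DENSE_LAYERS = 1
DENSE_INTERMEDIATE = 7168
MOE_INTERMEDIATE = 768
ROUTED_EXPERTS = 256
SHARED_EXPERTS = 1
EXPERTS_PER_TOKEN = 8
NUM_HEADS = 32
VOCAB = 129280
Q_LORA_RANK, KV_LORA_RANK = 1536, 512
QK_NOPE_DIM, QK_ROPE_DIM, V_DIM = 128, 64, 128


def swiglu_mlp(hidden: int, intermediate: int) -> int:
    # Gate, up, and down projections.
    return 3 * hidden * intermediate


def mla_attention() -> int:
    # Assumed DeepSeek-V3-style low-rank query and key/value projections.
    q = HIDDEN * Q_LORA_RANK + Q_LORA_RANK * NUM_HEADS * (QK_NOPE_DIM + QK_ROPE_DIM)
    kv = HIDDEN * (KV_LORA_RANK + QK_ROPE_DIM) + KV_LORA_RANK * NUM_HEADS * (QK_NOPE_DIM + V_DIM)
    out = NUM_HEADS * V_DIM * HIDDEN
    return q + kv + out


def estimate() -> tuple[int, int]:
    moe_layers = NUM_LAYERS - NUM_DENSE_LAYERS
    expert = swiglu_mlp(HIDDEN, MOE_INTERMEDIATE)
    router = HIDDEN * ROUTED_EXPERTS
    # Parameters every token passes through.
    always_on = (
        NUM_LAYERS * mla_attention()
        + NUM_DENSE_LAYERS * swiglu_mlp(HIDDEN, DENSE_INTERMEDIATE)
        + moe_layers * (router + SHARED_EXPERTS * expert)
        + 2 * VOCAB * HIDDEN  # untied input and output embeddings
    )
    total = always_on + moe_layers * ROUTED_EXPERTS * expert
    active = always_on + moe_layers * EXPERTS_PER_TOKEN * expert
    return total, active


total, active = estimate()
print(f"total ~ {total / 1e9:.1f}B, active per token ~ {active / 1e9:.1f}B")
```

Under these assumptions the estimate lands close to the stated 48B total and 3B activated parameters; almost all of the total comes from the 256 routed experts in the 39 MoE layers.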
53
+
54
+ ## 3. Evaluation Results
55
+
56
+ <table>
57
+ <thead>
58
+ <tr>
59
+ <th align="center">Benchmark</th>
60
+ <th align="center"><sup>JoyAI-LLM Flash</sup></th>
61
+ <th align="center"><sup>Qwen3-30B-A3B-Instruct-2507</sup></th>
62
+ <th align="center"><sup>GLM-4.7-Flash<br>(Non-thinking)</sup></th>
63
+ </tr>
64
+ </thead>
65
+ <tbody>
66
+
67
+
68
+ <tr>
69
+ <td align="center" colspan="4"><strong>Knowledge &amp; Alignment</strong></td>
70
+ </tr>
71
+ <tr>
72
+ <td align="center" style="vertical-align: middle">MMLU</td>
73
+ <td align="center" style="vertical-align: middle"><strong>89.50</strong></td>
74
+ <td align="center" style="vertical-align: middle">86.87</td>
75
+ <td align="center" style="vertical-align: middle">80.53</td>
76
+ </tr>
77
+ <tr>
78
+ <td align="center" style="vertical-align: middle">MMLU-Pro</td>
79
+ <td align="center" style="vertical-align: middle"><strong>81.02</strong></td>
80
+ <td align="center" style="vertical-align: middle">73.88</td>
81
+ <td align="center" style="vertical-align: middle">63.62</td>
82
+ </tr>
83
+ <tr>
84
+ <td align="center" style="vertical-align: middle">CMMLU</td>
85
+ <td align="center" style="vertical-align: middle"><strong>87.03</strong></td>
86
+ <td align="center" style="vertical-align: middle">85.88</td>
87
+ <td align="center" style="vertical-align: middle">75.85</td>
88
+ </tr>
89
+ <tr>
90
+ <td align="center" style="vertical-align: middle">GPQA-Diamond</td>
91
+ <td align="center" style="vertical-align: middle"><strong>74.43</strong></td>
92
+ <td align="center" style="vertical-align: middle">68.69</td>
93
+ <td align="center" style="vertical-align: middle">39.90</td>
94
+ </tr>
95
+ <tr>
96
+ <td align="center" style="vertical-align: middle">SuperGPQA</td>
97
+ <td align="center" style="vertical-align: middle"><strong>55.00</strong></td>
98
+ <td align="center" style="vertical-align: middle">52.00</td>
99
+ <td align="center" style="vertical-align: middle">32.00</td>
100
+ </tr>
101
+ <tr>
102
+ <td align="center" style="vertical-align: middle">LiveBench</td>
103
+ <td align="center" style="vertical-align: middle"><strong>72.90</strong></td>
104
+ <td align="center" style="vertical-align: middle">59.70</td>
105
+ <td align="center" style="vertical-align: middle">43.10</td>
106
+ </tr>
107
+ <tr>
108
+ <td align="center" style="vertical-align: middle">IFEval</td>
109
+ <td align="center" style="vertical-align: middle"><strong>86.69</strong></td>
110
+ <td align="center" style="vertical-align: middle">83.18</td>
111
+ <td align="center" style="vertical-align: middle">82.44</td>
112
+ </tr>
113
+ <tr>
114
+ <td align="center" style="vertical-align: middle">AlignBench</td>
115
+ <td align="center" style="vertical-align: middle"><strong>8.24</strong></td>
116
+ <td align="center" style="vertical-align: middle">8.07</td>
117
+ <td align="center" style="vertical-align: middle">6.85</td>
118
+ </tr>
119
+ <tr>
120
+ <td align="center" style="vertical-align: middle">HellaSwag</td>
121
+ <td align="center" style="vertical-align: middle"><strong>91.79</strong></td>
122
+ <td align="center" style="vertical-align: middle">89.90</td>
123
+ <td align="center" style="vertical-align: middle">60.84</td>
124
+ </tr>
125
+
126
+ <tr>
127
+ <td align="center" colspan="4"><strong>Coding</strong></td>
128
+ </tr>
129
+ <tr>
130
+ <td align="center" style="vertical-align: middle">HumanEval</td>
131
+ <td align="center" style="vertical-align: middle"><strong>96.34</strong></td>
132
+ <td align="center" style="vertical-align: middle">95.12</td>
133
+ <td align="center" style="vertical-align: middle">74.39</td>
134
+ </tr>
135
+ <tr>
136
+ <td align="center" style="vertical-align: middle">LiveCodeBench</td>
137
+ <td align="center" style="vertical-align: middle"><strong>65.60</strong></td>
138
+ <td align="center" style="vertical-align: middle">39.71</td>
139
+ <td align="center" style="vertical-align: middle">27.43</td>
140
+ </tr>
141
+ <tr>
142
+ <td align="center" style="vertical-align: middle">SciCode</td>
143
+ <td align="center" style="vertical-align: middle"><strong>3.08/22.92</strong></td>
144
+ <td align="center" style="vertical-align: middle"><strong>3.08/22.92</strong></td>
145
+ <td align="center" style="vertical-align: middle">3.08/15.11</td>
146
+ </tr>
147
+ <tr>
148
+ <td align="center" colspan="4"><strong>Mathematics</strong></td>
149
+ </tr>
150
+ <tr>
151
+ <td align="center" style="vertical-align: middle">GSM8K</td>
152
+ <td align="center" style="vertical-align: middle"><strong>95.83</strong></td>
153
+ <td align="center" style="vertical-align: middle">79.83</td>
154
+ <td align="center" style="vertical-align: middle">81.88</td>
155
+ </tr>
156
+ <tr>
157
+ <td align="center" style="vertical-align: middle">AIME2025</td>
158
+ <td align="center" style="vertical-align: middle"><strong>65.83</strong></td>
159
+ <td align="center" style="vertical-align: middle">62.08</td>
160
+ <td align="center" style="vertical-align: middle">24.17</td>
161
+ </tr>
162
+ <tr>
163
+ <td align="center" style="vertical-align: middle">MATH 500</td>
164
+ <td align="center" style="vertical-align: middle"><strong>97.10</strong></td>
165
+ <td align="center" style="vertical-align: middle">89.80</td>
166
+ <td align="center" style="vertical-align: middle">90.90</td>
167
+ </tr>
168
+
169
+ <tr>
170
+ <td align="center" colspan="4"><strong>Agentic</strong></td>
171
+ </tr>
172
+ <tr>
173
+ <td align="center" style="vertical-align: middle">SWE-bench Verified</td>
174
+ <td align="center" style="vertical-align: middle"><strong>60.60</strong></td>
175
+ <td align="center" style="vertical-align: middle">24.44</td>
176
+ <td align="center" style="vertical-align: middle">51.60</td>
177
+ </tr>
178
+ <tr>
179
+ <td align="center" style="vertical-align: middle">Tau2-Retail</td>
180
+ <td align="center" style="vertical-align: middle"><strong>67.55</strong></td>
181
+ <td align="center" style="vertical-align: middle">53.51</td>
182
+ <td align="center" style="vertical-align: middle">62.28</td>
183
+ </tr>
184
+ <tr>
185
+ <td align="center" style="vertical-align: middle">Tau2-Airline</td>
186
+ <td align="center" style="vertical-align: middle"><strong>54.00</strong></td>
187
+ <td align="center" style="vertical-align: middle">32.00</td>
188
+ <td align="center" style="vertical-align: middle">52.00</td>
189
+ </tr>
190
+ <tr>
191
+ <td align="center" style="vertical-align: middle">Tau2-Telecom</td>
192
+ <td align="center" style="vertical-align: middle">79.83</td>
193
+ <td align="center" style="vertical-align: middle">4.39</td>
194
+ <td align="center" style="vertical-align: middle"><strong>88.60</strong></td>
195
+ </tr>
196
+
197
+ <tr>
198
+ <td align="center" colspan="4"><strong>Long Context</strong></td>
199
+ </tr>
200
+ <tr>
201
+ <td align="center" style="vertical-align: middle">RULER</td>
202
+ <td align="center" style="vertical-align: middle"><strong>95.60</strong></td>
203
+ <td align="center" style="vertical-align: middle">89.66</td>
204
+ <td align="center" style="vertical-align: middle">56.12</td>
205
+ </tr>
206
+ </tbody>
207
+ </table>
208
+
209
+
210
+ ## 4. Deployment
211
+
212
+ > [!Note]
213
+ > You can access the JoyAI-LLM Flash API at https://docs.jdcloud.com/cn/jdaip/chat, which is compatible with the OpenAI and Anthropic APIs.
214
+ > We currently recommend running JoyAI-LLM Flash on the following inference engines:
215
+
216
+ * vLLM
217
+ * SGLang
218
+
219
+ The minimum version requirement for `transformers` is `4.57.1`.
220
+
221
+ Deployment examples can be found in the [Model Deployment Guide](docs/deploy_guidance.md).
222
+
223
+
224
+
225
+ ## 5. Model Usage
226
+
227
+ The usage demos below demonstrate how to call our official API.
228
+
229
+ For third-party APIs deployed with vLLM or SGLang, please note that:
230
+
231
+ > [!Note] Recommended sampling parameters: `temperature=0.6`, `top_p=1.0`
232
+
233
+ ### Chat Completion
234
+
235
+ This simple chat completion script shows how to call the JoyAI-LLM Flash API.
236
+
237
+ ```python
+ from openai import OpenAI
+
+ # Replace IP and PORT with the address of your inference server.
+ client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")
+
+
+ def simple_chat(client: OpenAI):
+     messages = [
+         {
+             "role": "user",
+             "content": [
+                 {
+                     "type": "text",
+                     "text": "which one is bigger, 9.11 or 9.9? think carefully.",
+                 }
+             ],
+         },
+     ]
+     # Use the first model the server reports.
+     model_name = client.models.list().data[0].id
+     response = client.chat.completions.create(
+         model=model_name, messages=messages, stream=False, max_tokens=4096
+     )
+     print(f"response: {response.choices[0].message.content}")
+
+
+ if __name__ == "__main__":
+     simple_chat(client)
+ ```
265
+
266
+
267
+ ### Tool Call Completion
268
+
269
+ This simple tool call completion script shows how to call the JoyAI-LLM Flash API.
270
+
271
+ ```python
+ import json
+
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")
+
+
+ def my_calculator(expression: str) -> str:
+     # eval() keeps the demo short; do not use it on untrusted input.
+     return str(eval(expression))
+
+
+ def rewrite(text: str) -> str:
+     # Placeholder implementation: returns the text unchanged.
+     return str(text)
+
+
+ # Map each tool name to the local function that implements it.
+ TOOL_FUNCTIONS = {"my_calculator": my_calculator, "rewrite": rewrite}
+
+
+ def simple_tool_call(client: OpenAI):
+     messages = [
+         {
+             "role": "user",
+             "content": [
+                 {
+                     "type": "text",
+                     "text": "use my functions to compute the results for the equations: 6+1",
+                 },
+             ],
+         },
+     ]
+     tools = [
+         {
+             "type": "function",
+             "function": {
+                 "name": "my_calculator",
+                 "description": "A calculator that can evaluate a mathematical equation and compute its results.",
+                 "parameters": {
+                     "type": "object",
+                     "properties": {
+                         "expression": {
+                             "type": "string",
+                             "description": "The mathematical expression to evaluate.",
+                         },
+                     },
+                     "required": ["expression"],
+                 },
+             },
+         },
+         {
+             "type": "function",
+             "function": {
+                 "name": "rewrite",
+                 "description": "Rewrite a given text for improved clarity",
+                 "parameters": {
+                     "type": "object",
+                     "properties": {
+                         "text": {
+                             "type": "string",
+                             "description": "The input text to rewrite",
+                         }
+                     },
+                     "required": ["text"],
+                 },
+             },
+         },
+     ]
+     model_name = client.models.list().data[0].id
+     response = client.chat.completions.create(
+         model=model_name,
+         messages=messages,
+         temperature=1.0,
+         max_tokens=1024,
+         tools=tools,
+         tool_choice="auto",
+     )
+     message = response.choices[0].message
+     tool_calls = message.tool_calls
+     if not tool_calls:
+         # The model answered directly without calling a tool.
+         print(message.content)
+         return
+
+     # Run every requested tool so results stay aligned with tool_calls.
+     results = []
+     for tool_call in tool_calls:
+         function = TOOL_FUNCTIONS[tool_call.function.name]
+         function_args = json.loads(tool_call.function.arguments)
+         results.append(function(**function_args))
+     messages.append({"role": "assistant", "tool_calls": tool_calls})
+     for tool_call, result in zip(tool_calls, results):
+         messages.append(
+             {
+                 "role": "tool",
+                 "tool_call_id": tool_call.id,
+                 "name": tool_call.function.name,
+                 "content": result,
+             }
+         )
+     # Send the tool results back so the model can write the final answer.
+     response = client.chat.completions.create(
+         model=model_name,
+         messages=messages,
+         temperature=1.0,
+         max_tokens=1024,
+     )
+     print(response.choices[0].message.content)
+
+
+ if __name__ == "__main__":
+     simple_tool_call(client)
+ ```
375
+
376
+ ---
377
+
378
+ ## 6. License
379
+
380
+ Both the code repository and the model weights are released under the [Modified MIT License](LICENSE).
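The sampling parameters recommended in the README's Model Usage section (`temperature=0.6`, `top_p=1.0`) can be applied by passing them with each request. The sketch below only builds the keyword arguments for an OpenAI-compatible chat completion call; the helper name `build_chat_request` and the model id `joyai-llm-flash` are hypothetical placeholders, not names confirmed by the repository.

```python
# Recommended values quoted from the README's Model Usage section.
RECOMMENDED_SAMPLING = {"temperature": 0.6, "top_p": 1.0}


def build_chat_request(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Build keyword arguments for an OpenAI-compatible chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        **RECOMMENDED_SAMPLING,
    }


# "joyai-llm-flash" is a placeholder; use the id reported by your server.
request = build_chat_request("joyai-llm-flash", "Hello")
print(request)
```

With a client created as in the README examples, the request could then be sent with `client.chat.completions.create(**request)`.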
chat_template.jinja ADDED
@@ -0,0 +1,103 @@
1
+ {%- macro render_extra_keys(json_dict, handled_keys) -%}
2
+ {%- if json_dict is mapping -%}
3
+ {%- for json_key in json_dict if json_key not in handled_keys -%}
4
+ {%- if json_dict[json_key] is mapping or (json_dict[json_key] is sequence and json_dict[json_key] is not string) -%}
5
+ {{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson | safe) ~ '</' ~ json_key ~ '>' -}}
6
+ {%- else -%}
7
+ {{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' -}}
8
+ {%- endif -%}
9
+ {%- endfor -%}
10
+ {%- endif -%}
11
+ {%- endmacro -%}
12
+
13
+ {%- if not add_generation_prompt is defined -%}{%- set add_generation_prompt = false -%}{%- endif -%}
14
+
15
+ {%- set ns = namespace(system_prompt='', is_first_sp=true, is_last_user=false) -%}
16
+ {%- set default_system = "You are JoyAI , a large language model trained by JD(京东)that can interact with a computer to solve tasks. Answer as concisely as possible." -%}
17
+ {%- set ns.system_prompt = default_system -%}
18
+
19
+ {%- for message in messages -%}
20
+ {%- if message['role'] == 'system' -%}
21
+ {%- if ns.is_first_sp -%}
22
+ {%- set ns.system_prompt = message['content'] -%}
23
+ {%- set ns.is_first_sp = false -%}
24
+ {%- else -%}
25
+ {%- set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] -%}
26
+ {%- endif -%}
27
+ {%- endif -%}
28
+ {%- endfor -%}
29
+
30
+ {{- bos_token -}}{{- ns.system_prompt -}}
31
+ {%- if tools is iterable and tools | length > 0 -%}
32
+ {{- "\n\n# Tools\n\nYou have access to the following functions:\n\n" }}
33
+ {{- "<tools>" }}
34
+ {%- for tool in tools %}
35
+ {%- if tool.function is defined %}
36
+ {%- set tool = tool.function %}
37
+ {%- endif %}
38
+ {{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
39
+ {%- if tool.description is defined %}
40
+ {{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
41
+ {%- endif %}
42
+ {{- '\n<parameters>' }}
43
+ {%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
44
+ {%- for param_name, param_fields in tool.parameters.properties|items %}
45
+ {{- '\n<parameter>' }}
46
+ {{- '\n<name>' ~ param_name ~ '</name>' }}
47
+ {%- if param_fields.type is defined %}
48
+ {{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
49
+ {%- endif %}
50
+ {%- if param_fields.description is defined %}
51
+ {{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
52
+ {%- endif %}
53
+ {%- set handled_keys = ['name', 'type', 'description'] %}
54
+ {{- render_extra_keys(param_fields, handled_keys) }}
55
+ {{- '\n</parameter>' }}
56
+ {%- endfor %}
57
+ {%- endif %}
58
+ {% set handled_keys = ['type', 'properties'] %}
59
+ {{- render_extra_keys(tool.parameters, handled_keys) }}
60
+ {{- '\n</parameters>' }}
61
+ {%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
62
+ {{- render_extra_keys(tool, handled_keys) }}
63
+ {{- '\n</function>' }}
64
+ {%- endfor %}
65
+ {{- "\n</tools>" }}
66
+ {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
67
+ {%- endif %}
68
+ {%- for message in messages -%}
69
+ {%- if message['role'] == 'user' -%}
70
+ {%- set ns.is_last_user = true -%}
71
+ {{- '<|User|>' + message['content'] -}}
72
+ {%- elif message['role'] == 'assistant' -%}
73
+ {%- if ns.is_last_user -%}
74
+ {{ '<|Assistant|>' }}
75
+ {%- endif -%}
76
+ {%- set ns.is_last_user = false -%}
77
+ {%- set content = message.get('content') | default('', true) -%}
78
+ {{ '<|end_of_thought|>' + content }}
79
+ {%- if message['tool_calls'] is defined and message['tool_calls'] is not none -%}
80
+ {%- for tool in message['tool_calls'] -%}
81
+ {%- if tool.function is defined %}{% set tool = tool.function %}{% endif -%}
82
+ {{- '\n<tool_call>\n<function=' + tool.name + '>\n' -}}
83
+ {%- if tool.arguments is defined -%}
84
+ {%- if tool.arguments is string -%}{%- set args_data = tool.arguments | from_json -%}{%- else -%}{%- set args_data = tool.arguments -%}{%- endif -%}
85
+ {%- for args_name, args_value in args_data.items() -%}
86
+ {{- '<parameter=' + args_name + '>\n' -}}
87
+ {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string -%}
88
+ {{- args_value -}}{{- '\n</parameter>\n' -}}
89
+ {%- endfor -%}
90
+ {%- endif -%}
91
+ {{- '</function>\n</tool_call>' -}}
92
+ {%- endfor -%}
93
+ {%- endif -%}
94
+ {{ '<|end▁of▁sentence|>' }}
95
+ {%- elif message['role'] == 'tool' -%}
96
+ {%- set ns.is_last_user = true -%}
97
+ {{ '\n<tool_response>\n' + message['content'] + '\n</tool_response>' }}
98
+ {%- endif -%}
99
+ {%- endfor -%}
100
+
101
+ {%- if add_generation_prompt -%}
102
+ {{ '<|Assistant|>' }}{{ '<|end_of_thought|>' }}
103
+ {%- endif -%}
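The template above instructs the model to emit tool calls as `<tool_call>` blocks containing `<function=...>` and `<parameter=...>` tags. Inference engines normally parse this format themselves, so the following is only a minimal sketch of how such output could be read back, assuming the model follows the format exactly. The helper `parse_tool_calls` and its regular expressions are hypothetical and not part of the repository.

```python
import re

# One <tool_call> block wraps exactly one <function=name>...</function> block.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>\n]+)>(.*?)</function>\s*</tool_call>",
    re.DOTALL,
)
# Parameter values sit on their own lines and may span several lines.
PARAM_RE = re.compile(r"<parameter=([^>\n]+)>\n?(.*?)\n?</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract tool calls from model output; values are kept as raw strings."""
    calls = []
    for name, body in TOOL_CALL_RE.findall(text):
        arguments = {key.strip(): value for key, value in PARAM_RE.findall(body)}
        calls.append({"name": name.strip(), "arguments": arguments})
    return calls


sample = (
    "Let me compute that.\n"
    "<tool_call>\n<function=my_calculator>\n"
    "<parameter=expression>\n6+1\n</parameter>\n"
    "</function>\n</tool_call>"
)
print(parse_tool_calls(sample))
```

Values are returned as strings because the template renders scalars with `| string`; a caller that needs typed arguments would have to convert them using the tool's parameter schema.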
config.json ADDED
@@ -0,0 +1,49 @@
1
+ {
2
+ "architectures": [
3
+ "DeepseekV3ForCausalLM"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration_deepseek.DeepseekV3Config",
9
+ "AutoModel": "modeling_deepseek.DeepseekV3Model",
10
+ "AutoModelForCausalLM": "modeling_deepseek.DeepseekV3ForCausalLM"
11
+ },
12
+ "bos_token_id": 0,
13
+ "eos_token_id": 1,
14
+ "ep_size": 1,
15
+ "first_k_dense_replace": 1,
16
+ "hidden_act": "silu",
17
+ "hidden_size": 2048,
18
+ "initializer_range": 0.02,
19
+ "intermediate_size": 7168,
20
+ "kv_lora_rank": 512,
21
+ "max_position_embeddings": 131072,
22
+ "model_type": "joyai_llm_flash",
23
+ "moe_intermediate_size": 768,
24
+ "moe_layer_freq": 1,
25
+ "n_group": 1,
26
+ "n_routed_experts": 256,
27
+ "n_shared_experts": 1,
28
+ "norm_topk_prob": true,
29
+ "num_attention_heads": 32,
30
+ "num_experts_per_tok": 8,
31
+ "num_hidden_layers": 40,
32
+ "num_key_value_heads": 32,
33
+ "num_nextn_predict_layers": 1,
34
+ "q_lora_rank": 1536,
35
+ "qk_nope_head_dim": 128,
36
+ "qk_rope_head_dim": 64,
37
+ "rms_norm_eps": 1e-06,
38
+ "rope_theta": 32000000,
39
+ "routed_scaling_factor": 2.5,
40
+ "scoring_func": "sigmoid",
41
+ "tie_word_embeddings": false,
42
+ "topk_group": 1,
43
+ "topk_method": "noaux_tc",
44
+ "torch_dtype": "bfloat16",
45
+ "transformers_version": "4.44.2",
46
+ "use_cache": true,
47
+ "v_head_dim": 128,
48
+ "vocab_size": 129280
49
+ }
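The routing fields in this config (`scoring_func: "sigmoid"`, `num_experts_per_tok: 8`, `norm_topk_prob: true`, `routed_scaling_factor: 2.5`, `n_group: 1`) suggest how a token's expert weights are formed. The sketch below illustrates that computation in plain Python under stated assumptions: it follows DeepSeek-V3-style routing, skips group-limited selection (trivial with a single group), and omits the per-expert selection bias that `noaux_tc` implementations typically add. The function name `route` is hypothetical.

```python
import math
import random

# Values copied from config.json above.
NUM_EXPERTS = 256
TOP_K = 8
ROUTED_SCALING_FACTOR = 2.5


def route(logits: list[float]) -> dict[int, float]:
    """Return {expert_index: weight} for the top-k experts of one token."""
    # scoring_func = "sigmoid": each expert is scored independently.
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:TOP_K]
    # norm_topk_prob = true: renormalize the selected scores to sum to 1,
    # then apply routed_scaling_factor.
    norm = sum(scores[i] for i in top)
    return {i: ROUTED_SCALING_FACTOR * scores[i] / norm for i in top}


random.seed(0)
weights = route([random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)])
print(len(weights), round(sum(weights.values()), 6))
```

After normalization and scaling, the eight selected weights sum to `routed_scaling_factor`; the shared expert is applied to every token in addition to these routed experts.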
configuration.json ADDED
@@ -0,0 +1 @@
1
+ {"framework":"Pytorch","task":"text-generation"}
configuration_deepseek.py ADDED
@@ -0,0 +1,247 @@
1
+ # coding=utf-8
2
+ # Copyright 2025 bzantium and the HuggingFace Inc. team. All rights reserved.
3
+ #
4
+ # This code is based on the DeepSeekV3 implementations from the DeepSeek AI team. (https://huggingface.co/deepseek-ai/DeepSeek-V3)
5
+
6
+ # Licensed under the Apache License, Version 2.0 (the "License");
7
+ # you may not use this file except in compliance with the License.
8
+ # You may obtain a copy of the License at
9
+ #
10
+ # http://www.apache.org/licenses/LICENSE-2.0
11
+ #
12
+ # Unless required by applicable law or agreed to in writing, software
13
+ # distributed under the License is distributed on an "AS IS" BASIS,
14
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15
+ # See the License for the specific language governing permissions and
16
+ # limitations under the License.
17
+ """DeepSeekV3 model configuration"""
18
+
19
+ from transformers.configuration_utils import PretrainedConfig
20
+ from transformers.modeling_rope_utils import rope_config_validation
21
+
22
+
23
+ DEEPSEEK_PRETRAINED_CONFIG_ARCHIVE_MAP = {}
24
+
25
+
26
+ class DeepseekV3Config(PretrainedConfig):
27
+ r"""
28
+ This is the configuration class to store the configuration of a [`DeepseekV3Model`]. It is used to instantiate a DeepSeek
29
+ model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
30
+ defaults will yield a similar configuration to that of the DeepSeek-V3.
31
+ e.g. [bzantium/tiny-deepseek-v3](https://huggingface.co/bzantium/tiny-deepseek-v3)
32
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
33
+ documentation from [`PretrainedConfig`] for more information.
34
+
35
+
36
+ Args:
37
+ vocab_size (`int`, *optional*, defaults to 129280):
38
+ Vocabulary size of the DeepSeek model. Defines the number of different tokens that can be represented by the
39
+ `inputs_ids` passed when calling [`DeepseekV3Model`]
40
+ hidden_size (`int`, *optional*, defaults to 7168):
41
+ Dimension of the hidden representations.
42
+ intermediate_size (`int`, *optional*, defaults to 18432):
43
+ Dimension of the MLP representations.
44
+ moe_intermediate_size (`int`, *optional*, defaults to 2048):
45
+ Dimension of the MoE representations.
46
+ num_hidden_layers (`int`, *optional*, defaults to 61):
47
+ Number of hidden layers in the Transformer decoder.
48
+ num_attention_heads (`int`, *optional*, defaults to 128):
49
+ Number of attention heads for each attention layer in the Transformer decoder.
50
+ num_key_value_heads (`int`, *optional*, defaults to 128):
51
+ This is the number of key_value heads that should be used to implement Grouped Query Attention. If
52
+ `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
53
+ `num_key_value_heads=1`, the model will use Multi Query Attention (MQA); otherwise GQA is used. When
54
+ converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
55
+ by meanpooling all the original heads within that group. For more details checkout [this
56
+ paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to
57
+ `num_attention_heads`.
58
+ n_shared_experts (`int`, *optional*, defaults to 1):
59
+ Number of shared experts.
60
+ n_routed_experts (`int`, *optional*, defaults to 256):
61
+ Number of routed experts.
62
+ routed_scaling_factor (`float`, *optional*, defaults to 2.5):
63
+ Scaling factor or routed experts.
64
+ kv_lora_rank (`int`, *optional*, defaults to 512):
65
+ Rank of the LoRA matrices for key and value projections.
66
+ q_lora_rank (`int`, *optional*, defaults to 1536):
67
+ Rank of the LoRA matrices for query projections.
68
+ qk_rope_head_dim (`int`, *optional*, defaults to 64):
69
+ Dimension of the query/key heads that use rotary position embeddings.
70
+ v_head_dim (`int`, *optional*, defaults to 128):
71
+ Dimension of the value heads.
72
+ qk_nope_head_dim (`int`, *optional*, defaults to 128):
73
+ Dimension of the query/key heads that don't use rotary position embeddings.
74
+ n_group (`int`, *optional*, defaults to 8):
75
+ Number of groups for routed experts.
76
+ topk_group (`int`, *optional*, defaults to 4):
77
+ Number of selected groups for each token(for each token, ensuring the selected experts is only within `topk_group` groups).
78
+ num_experts_per_tok (`int`, *optional*, defaults to 8):
79
+ Number of selected experts, None means dense model.
80
+ first_k_dense_replace (`int`, *optional*, defaults to 3):
81
+ Number of dense layers in shallow layers(embed->dense->dense->...->dense->moe->moe...->lm_head).
82
+ \--k dense layers--/
83
+ norm_topk_prob (`bool`, *optional*, defaults to `True`):
84
+ Whether to normalize the weights of the routed experts.
85
+ hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
86
+ The non-linear activation function (function or string) in the decoder.
87
+ max_position_embeddings (`int`, *optional*, defaults to 4096):
88
+ The maximum sequence length that this model might ever be used with.
89
+ initializer_range (`float`, *optional*, defaults to 0.02):
90
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
91
+ rms_norm_eps (`float`, *optional*, defaults to 1e-06):
92
+ The epsilon used by the rms normalization layers.
93
+ use_cache (`bool`, *optional*, defaults to `True`):
94
+ Whether or not the model should return the last key/values attentions (not used by all models). Only
95
+ relevant if `config.is_decoder=True`.
96
+ pad_token_id (`int`, *optional*):
97
+ Padding token id.
98
+ bos_token_id (`int`, *optional*, defaults to 0):
99
+ Beginning of stream token id.
100
+ eos_token_id (`int`, *optional*, defaults to 1):
101
+ End of stream token id.
102
+ pretraining_tp (`int`, *optional*, defaults to 1):
103
+ Experimental feature. Tensor parallelism rank used during pretraining. Please refer to [this
104
+ document](https://huggingface.co/docs/transformers/parallelism) to understand more about it. This value is
105
+ necessary to ensure exact reproducibility of the pretraining results. Please refer to [this
106
+ issue](https://github.com/pytorch/pytorch/issues/76232).
107
+ tie_word_embeddings (`bool`, *optional*, defaults to `False`):
108
+ Whether to tie weight embeddings
109
+ rope_theta (`float`, *optional*, defaults to 10000.0):
110
+ The base period of the RoPE embeddings.
111
+ rope_scaling (`Dict`, *optional*):
112
+ Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
113
+ strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
114
+ `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
115
+ `max_position_embeddings` to the expected new maximum.
116
+ rope_interleave (`bool`, *optional*, defaults to `True`):
117
+ Whether to interleave the rotary position embeddings.
118
+ attention_bias (`bool`, defaults to `False`, *optional*, defaults to `False`):
119
+ Whether to use a bias in the query, key, value and output projection layers during self-attention.
120
+ attention_dropout (`float`, *optional*, defaults to 0.0):
121
+ The dropout ratio for the attention probabilities.
122
+
123
+ ```python
124
+ >>> from transformers import DeepseekV3Model, DeepseekV3Config
125
+
126
+ >>> # Initializing a Deepseek-V3 style configuration
127
+ >>> configuration = DeepseekV3Config()
128
+
129
+ >>> # Accessing the model configuration
130
+ >>> configuration = model.config
131
+ ```"""
+
+     model_type = "deepseek_v3"
+     keys_to_ignore_at_inference = ["past_key_values"]
+     base_model_tp_plan = {  # TODO: only replicate attention layers when > first_k_dense_replace
+         "layers.*.mlp.experts.*.gate_proj": "local_colwise",
+         "layers.*.mlp.experts.*.up_proj": "local_colwise",
+         "layers.*.mlp.experts.*.down_proj": "local_rowwise",
+         "layers.*.mlp.experts.*": "local",  # each expert is wrapped in a module list
+         "layers.*.mlp.shared_experts.gate_proj": "local_colwise",
+         "layers.*.mlp.shared_experts.up_proj": "local_colwise",
+         "layers.*.mlp.shared_experts.down_proj": "local_rowwise",
+         "layers.*.mlp.shared_experts": "local",
+         "layers.*.mlp.gate_proj": "local_colwise",
+         "layers.*.mlp.up_proj": "local_colwise",
+         "layers.*.mlp.down_proj": "local_rowwise",
+         "layers.*.mlp": "gather",  # This is the only moment where results are gathered
+     }
+     base_model_pp_plan = {
+         "embed_tokens": (["input_ids"], ["inputs_embeds"]),
+         "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
+         "norm": (["hidden_states"], ["hidden_states"]),
+     }
+
+     def __init__(
+         self,
+         vocab_size=129280,
+         hidden_size=7168,
+         intermediate_size=18432,
+         moe_intermediate_size=2048,
+         num_hidden_layers=61,
+         num_attention_heads=128,
+         num_key_value_heads=128,
+         n_shared_experts=1,
+         n_routed_experts=256,
+         routed_scaling_factor=2.5,
+         kv_lora_rank=512,
+         q_lora_rank=1536,
+         qk_rope_head_dim=64,
+         v_head_dim=128,
+         qk_nope_head_dim=128,
+         n_group=8,
+         topk_group=4,
+         num_experts_per_tok=8,
+         first_k_dense_replace=3,
+         norm_topk_prob=True,
+         hidden_act="silu",
+         max_position_embeddings=4096,
+         initializer_range=0.02,
+         rms_norm_eps=1e-6,
+         use_cache=True,
+         pad_token_id=None,
+         bos_token_id=0,
+         eos_token_id=1,
+         pretraining_tp=1,
+         tie_word_embeddings=False,
+         rope_theta=10000.0,
+         rope_scaling=None,
+         rope_interleave=True,
+         attention_bias=False,
+         attention_dropout=0.0,
+         **kwargs,
+     ):
+         self.vocab_size = vocab_size
+         self.max_position_embeddings = max_position_embeddings
+         self.hidden_size = hidden_size
+         self.intermediate_size = intermediate_size
+         self.moe_intermediate_size = moe_intermediate_size
+         self.num_hidden_layers = num_hidden_layers
+         self.num_attention_heads = num_attention_heads
+         self.n_shared_experts = n_shared_experts
+         self.n_routed_experts = n_routed_experts
+         self.routed_scaling_factor = routed_scaling_factor
+         self.kv_lora_rank = kv_lora_rank
+         self.q_lora_rank = q_lora_rank
+         self.qk_rope_head_dim = qk_rope_head_dim
+         self.v_head_dim = v_head_dim
+         self.qk_nope_head_dim = qk_nope_head_dim
+         self.qk_head_dim = qk_nope_head_dim + qk_rope_head_dim
+         self.head_dim = qk_rope_head_dim
+         self.n_group = n_group
+         self.topk_group = topk_group
+         self.num_experts_per_tok = num_experts_per_tok
+         self.first_k_dense_replace = first_k_dense_replace
+         self.norm_topk_prob = norm_topk_prob
+         self.rope_interleave = rope_interleave
+
+         # for backward compatibility
+         if num_key_value_heads is None:
+             num_key_value_heads = num_attention_heads
+
+         self.num_key_value_heads = num_key_value_heads
+         self.hidden_act = hidden_act
+         self.initializer_range = initializer_range
+         self.rms_norm_eps = rms_norm_eps
+         self.pretraining_tp = pretraining_tp
+         self.use_cache = use_cache
+         self.rope_theta = rope_theta
+         self.rope_scaling = rope_scaling
+         self.attention_bias = attention_bias
+         self.attention_dropout = attention_dropout
+         # Validate the correctness of rotary position embeddings parameters
+         # BC: if there is a 'type' field, copy it to 'rope_type'.
+         if self.rope_scaling is not None and "type" in self.rope_scaling:
+             self.rope_scaling["rope_type"] = self.rope_scaling["type"]
+         rope_config_validation(self)
+
+         super().__init__(
+             pad_token_id=pad_token_id,
+             bos_token_id=bos_token_id,
+             eos_token_id=eos_token_id,
+             tie_word_embeddings=tie_word_embeddings,
+             **kwargs,
+         )
+
+
+ __all__ = ["DeepseekV3Config"]
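
To illustrate how the documented defaults fit together, the sketch below derives the dense/MoE layer split implied by `first_k_dense_replace` and the number of routed experts a token can still reach once `topk_group` of the `n_group` groups have been selected. The names `MoEDefaults`, `layer_kinds`, and `routing_budget` are hypothetical and not part of `transformers`; the arithmetic only restates the docstring and assumes routed experts are divided evenly across groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEDefaults:
    # Default values copied from the DeepseekV3Config signature.
    num_hidden_layers: int = 61
    first_k_dense_replace: int = 3
    n_routed_experts: int = 256
    n_group: int = 8
    topk_group: int = 4
    num_experts_per_tok: int = 8
    qk_nope_head_dim: int = 128
    qk_rope_head_dim: int = 64


def layer_kinds(cfg):
    """Return 'dense' for the first k layers and 'moe' for the rest."""
    return [
        "dense" if i < cfg.first_k_dense_replace else "moe"
        for i in range(cfg.num_hidden_layers)
    ]


def routing_budget(cfg):
    """Experts per group, and experts still eligible after group selection.

    Assumption: experts are split evenly across groups (hypothetical helper).
    """
    if cfg.n_routed_experts % cfg.n_group != 0:
        raise ValueError("n_routed_experts must be divisible by n_group")
    per_group = cfg.n_routed_experts // cfg.n_group
    eligible = per_group * cfg.topk_group
    return per_group, eligible


cfg = MoEDefaults()
kinds = layer_kinds(cfg)
per_group, eligible = routing_budget(cfg)
# Same sum that __init__ stores as self.qk_head_dim.
qk_head_dim = cfg.qk_nope_head_dim + cfg.qk_rope_head_dim

print(kinds.count("dense"), kinds.count("moe"))  # 3 58
print(per_group, eligible, cfg.num_experts_per_tok)  # 32 128 8
print(qk_head_dim)  # 192
```

These figures describe the class defaults only; the repository's `config.json` may override any of them.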
docs/deploy_guidance.md ADDED
@@ -0,0 +1,42 @@
+ # JoyAI-LLM Flash Deployment Guide
+
+ > [!Note]
+ > This guide provides example deployment commands for JoyAI-LLM Flash; they may not be the optimal configuration. Inference engines evolve quickly, so refer to their official documentation for the latest updates and best performance.
+
+ > Support for JoyAI-LLM Flash’s dense MTP architecture is currently being integrated into vLLM and SGLang. Until these PRs are merged into a stable release, please use the nightly Docker image for access to these features.
+
+ ## vLLM Deployment
+
+ The following example serves the model on a single H200 node with TP8 via vLLM:
+
+ 1. Pull the Docker image.
+ ```bash
+ docker pull jdopensource/joyai-llm-vllm:v0.13.0-joyai_llm_flash
+ ```
+ 2. Launch the JoyAI-LLM Flash model with dense MTP.
+ ```bash
+ vllm serve ${MODEL_PATH} --tp 8 --trust-remote-code \
+   --tool-call-parser qwen3_coder --enable-auto-tool-choice \
+   --speculative-config $'{"method": "mtp", "num_speculative_tokens": 3}'
+ ```
+ **Key notes:**
+ - `--tool-call-parser qwen3_coder`: Required to enable tool calling.
+
+ ## SGLang Deployment
+
+ Similarly, the following example runs the model with TP8 on a single H200 node via SGLang:
+
+ 1. Pull the Docker image.
+ ```bash
+ docker pull jdopensource/joyai-llm-sglang:v0.5.8-joyai_llm_flash
+ ```
+ 2. Launch the JoyAI-LLM Flash model with dense MTP.
+
+ ```bash
+ python3 -m sglang.launch_server --model-path ${MODEL_PATH} --tp-size 8 --trust-remote-code \
+   --tool-call-parser qwen3_coder \
+   --speculative-algorithm EAGLE --speculative-draft-model-path ${MTP_MODEL_PATH} \
+   --speculative-num-steps 2 --speculative-eagle-topk 2 --speculative-num-draft-tokens 3
+ ```
+ **Key notes:**
+ - `--tool-call-parser qwen3_coder`: Required to enable tool calling.
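
Once either server is running, tool calling can be exercised through the OpenAI-compatible chat completions endpoint that both vLLM and SGLang expose. The sketch below builds such a request using only the Python standard library. The base URL, the served model name, and the `get_weather` tool are assumptions for illustration (vLLM listens on port 8000 and SGLang on port 30000 by default unless `--port` is given), so adjust them to your deployment.

```python
import json
import urllib.request


def build_tool_call_request(model, user_message):
    """Build an OpenAI-style chat completion payload with one example tool."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "tools": [weather_tool],
        "tool_choice": "auto",
    }


def send_chat_request(base_url, payload, timeout=60):
    """POST the payload to <base_url>/v1/chat/completions and parse the JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# The model name is an assumption; use the name your server actually reports.
payload = build_tool_call_request("JoyAI-LLM-Flash", "What is the weather in Beijing?")
print(json.dumps(payload, indent=2))

# With a server running at an assumed address:
# reply = send_chat_request("http://localhost:8000", payload)
# print(reply["choices"][0]["message"].get("tool_calls"))
```

If the tool-call parser is active, the assistant message in the reply is expected to carry a `tool_calls` list rather than plain text when the model decides to call the tool.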
figures/joyai-logo.png ADDED

Git LFS Details

  • SHA256: 4ea9d6a20a7707ca8dc427d6dcb5db6e2489f7730d5bffea26d8db20b1c54365
  • Pointer size: 131 Bytes
  • Size of remote file: 250 kB
model-1-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:00342c45cd62e28fe183e2529ce61e2cecf0d8ea5451b2a8fe4137ae5e50e901
+ size 140785016
model-10-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:25566dca71af5a8ad118ecff34fe03acda6d2483b9127cd26211be2082d5d6c8
+ size 2479205264
model-11-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:106ca52367ca7d80b9f99553e3ad01b820ec12f28be1d2fc57233f2ce33a5199
+ size 2479206048
model-12-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc1a396967e8b7cfafe9399c031a2e869319b4bf40750a58f91b81037a36f0f2
+ size 2479206048
model-13-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b813cd997285699eb10401e7b80c87773900b6c9ca48a305f228824dde553fe3
+ size 2479206048
model-14-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b78f08465829cee2cf4d9a06912e33306c76a1251ac0f6637a2a505bef3376f6
+ size 2479206048
model-15-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:415653c237c5189bcf725a587b7e7db5c15c858d9748c11fa9ee9a97c77ebf3e
+ size 2479206048
model-16-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:edc2ec545b27884512149627d4c68c6e80a47429ab3f125e14f889aeb5c4555c
+ size 2479206048
model-17-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:849209aeddc1e424dfd58c9b7bc85bb7fc26ecd79e35d0c3e0c1246ce7ea6cd8
+ size 2479206048
model-18-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37a1c9d67e32905eee90a59028f843b3e2df75b7fdf310347467525ba8af0778
+ size 2479206048
model-19-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:71940345d604e462c154b15eba80ce6013a9b47daaae628133c7a7b561ec0947
+ size 2479206048
model-2-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aab17dcbeecf180ed13a2cd84fa619eb9b84dc05075252c464154ada4078a771
+ size 2479205264
model-20-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6d562b929014355f66d821e31b8b6c3a69c298e464a6809bb5d84dbad34be49d
+ size 2479206048
model-21-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:36db259d412d9278343b91f5c0844c8a3c7992ceee11ebfe562494fb3934a012
+ size 2479206048
model-22-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2a63e252bf2374910cf4c8a26d1c2ffe0ea88348dae23fed3cbe99c82443213b
+ size 2479206048
model-23-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:490edf4128ec42e523730148c73c29445b8eea7a2d81d2e3296bc957b5d5dad7
+ size 2479206048
model-24-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c82fda4cc8b9570898f355b1ca8f640262e269710d0efd914d387825b82e90f4
+ size 2479206048
model-25-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:371774aa4bdfbefc9c62e4dd7a289e2cc15fe8ac8dd01f193bc3951c395149d1
+ size 2479206048
model-26-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba82a2dee480d47bf4ec07bf15078affa6350b0d4414ed0152c9b6a34fa4cfa9
+ size 2479206048
model-27-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0061eca323f71053806bdd9d78e098fc28e8442b3cd27581407f56b2507e3a80
+ size 2479206048
model-28-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e81ab12d6b7ab57a024daba11ccd75eaf4b3579b68aa7fc5e043a7c216df845a
+ size 2479206048
model-29-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5fc269095eba7b9ffdbd51986cc5d805d73f32333b759a502f09ff55e797789a
+ size 2479206048
model-3-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:734577bcc726ead7e0a9e8cad31edddb3d4ee2cc34fa5acecd37daea8332fc58
+ size 2479205264
model-30-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:423416965094dc7d45422a5ca9d285cab7b12836bddd7a4c4f704ce996da3a00
+ size 2479206048
model-31-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e405c1eebce9eaa75db83d8db6a108bd8d28b5e1e76903790dd20ae13297999e
+ size 2479206048
model-32-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f1b25c9b64915ba066ab35a007928d48ed3ae75fabbcc45a269b3638268c999
+ size 2479206048
model-33-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f927f6e3246b676dc31fb4f227de92073ee54ea790042f7077d3d5ea5732d90f
+ size 2479206048
model-34-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3582ea2db95a41584f8ceb30209fa8415e7fd254dc43be695f9b67dd28931ef2
+ size 2479206048
model-35-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:feeffd5917defddd4543b8747bd33da9ee2cabfa3ff57634ebf1718e9ed46563
+ size 2479206048
model-36-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:526404ba27ab66b8dcc455e1ca2234fab4843a890d7f6513e37eeebbfe2aaaf7
+ size 2479206048
model-37-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d8a43b4d41be94f6f691c254f79d8994611e9229fedd23dfcd4fcc39e0853de
+ size 2479206048
model-38-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:51467503269ccea044ec96c7f94aff0efa523d586a8736dda70414fca29c5a03
+ size 2479206048
model-39-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:deb76b58136b1144e79b0de587d991aed8c691fa4efcf1e54d118ec9b489c22d
+ size 2479206048
model-4-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb292fd660dfc54cf9afe89938d431bf58d8634576c8cc52bc874d4b68fadbd9
+ size 2479205264
model-40-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9b09ac0b99a72696162f673ae20ad74b8ae5231fdf04f663d84c4ba981e83cf8
+ size 2479206048
model-5-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9c7bc3ad5832be99e39cdbfdc3f843d5c0d6edb9de0c0d8d34524d049b1d945
+ size 2479205264
model-6-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eccfb66a683dd836293360c8c8441c466209708fd0220b04f7511814d75aed7c
+ size 2479205264
model-7-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:32cfe5bfb346384efb9531263d540f85d5040d2359c492f10f8a59bc5944bba4
+ size 2479205264
model-8-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:42c68787a8622f322c03190dda75801e30f65aa26997df147d9f7dec4fd264b4
+ size 2479205264
model-9-of-40.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1bf720fd443166d2e2a2f1a524035959febadfa079c7971c5a777642183008d6
+ size 2479205264
model-non-layer.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fd760d732c11c23778a0dbf2280b62431d77d1f4ebc4f01f111cf716786981f0
+ size 1059066184
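
Each `.safetensors` entry in this commit is a Git LFS pointer: three `key value` lines giving the spec version, the SHA-256 of the real file, and its size in bytes. After downloading a shard, those two values can be checked locally. The following is a minimal sketch of that check; the function names are hypothetical, and the demonstration hashes a small in-memory payload rather than a real shard.

```python
import hashlib
import io


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version, SHA-256 digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def verify_stream(stream, pointer, chunk_size=1 << 20):
    """Return True if the stream's byte count and SHA-256 match the pointer."""
    hasher = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
        total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["sha256"]


# Pointer for model-1-of-40.safetensors, copied from this commit.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:00342c45cd62e28fe183e2529ce61e2cecf0d8ea5451b2a8fe4137ae5e50e901
size 140785016
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["size"])

# Self-check on a small in-memory payload (a stand-in for a downloaded shard).
demo = b"demo payload"
demo_pointer = {"sha256": hashlib.sha256(demo).hexdigest(), "size": len(demo)}
print(verify_stream(io.BytesIO(demo), demo_pointer))
```

For a real shard, open the file with `open(path, "rb")` and pass the file object to `verify_stream`; reading in chunks keeps memory use flat even for the multi-gigabyte shards listed here.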