Commit ea661d7
Duplicate from jdopensource/JoyAI-LLM-Flash
Co-authored-by: zhenfang wang <fangfangfang123@users.noreply.huggingface.co>
This view is limited to 50 files because it contains too many changes.
- .gitattributes +36 -0
- LICENSE +28 -0
- README.md +380 -0
- chat_template.jinja +103 -0
- config.json +49 -0
- configuration.json +1 -0
- configuration_deepseek.py +247 -0
- docs/deploy_guidance.md +42 -0
- figures/joyai-logo.png +3 -0
- model-1-of-40.safetensors +3 -0
- model-10-of-40.safetensors +3 -0
- model-11-of-40.safetensors +3 -0
- model-12-of-40.safetensors +3 -0
- model-13-of-40.safetensors +3 -0
- model-14-of-40.safetensors +3 -0
- model-15-of-40.safetensors +3 -0
- model-16-of-40.safetensors +3 -0
- model-17-of-40.safetensors +3 -0
- model-18-of-40.safetensors +3 -0
- model-19-of-40.safetensors +3 -0
- model-2-of-40.safetensors +3 -0
- model-20-of-40.safetensors +3 -0
- model-21-of-40.safetensors +3 -0
- model-22-of-40.safetensors +3 -0
- model-23-of-40.safetensors +3 -0
- model-24-of-40.safetensors +3 -0
- model-25-of-40.safetensors +3 -0
- model-26-of-40.safetensors +3 -0
- model-27-of-40.safetensors +3 -0
- model-28-of-40.safetensors +3 -0
- model-29-of-40.safetensors +3 -0
- model-3-of-40.safetensors +3 -0
- model-30-of-40.safetensors +3 -0
- model-31-of-40.safetensors +3 -0
- model-32-of-40.safetensors +3 -0
- model-33-of-40.safetensors +3 -0
- model-34-of-40.safetensors +3 -0
- model-35-of-40.safetensors +3 -0
- model-36-of-40.safetensors +3 -0
- model-37-of-40.safetensors +3 -0
- model-38-of-40.safetensors +3 -0
- model-39-of-40.safetensors +3 -0
- model-4-of-40.safetensors +3 -0
- model-40-of-40.safetensors +3 -0
- model-5-of-40.safetensors +3 -0
- model-6-of-40.safetensors +3 -0
- model-7-of-40.safetensors +3 -0
- model-8-of-40.safetensors +3 -0
- model-9-of-40.safetensors +3 -0
- model-non-layer.safetensors +3 -0
.gitattributes
ADDED
|
@@ -0,0 +1,36 @@
| 1 |
+
*.7z filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 3 |
+
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
+
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
+
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
+
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
+
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
+
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
+
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
+
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
+
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
+
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
+
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
+
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
+
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
+
*.rar filter=lfs diff=lfs merge=lfs -text
|
| 25 |
+
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
+
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
+
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
+
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
+
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
+
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
+
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
+
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
+
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
figures/joyai-logo.png filter=lfs diff=lfs merge=lfs -text
|
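The patterns above decide which files Git stores through LFS. The sketch below approximates that matching in Python for a handful of the listed patterns. It is an illustration, not Git's own matcher: `LFS_PATTERNS` is a hand-picked subset, `is_lfs_tracked` is a hypothetical helper name, and `fnmatch` only approximates gitattributes pattern semantics.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A few of the patterns listed in the .gitattributes above (not the full set).
LFS_PATTERNS = [
    "*.safetensors",
    "*.bin",
    "*.pt",
    "*.zip",
    "*tfevents*",
    "figures/joyai-logo.png",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: patterns without '/' are matched against the
    file name, patterns with '/' against the whole repository path."""
    name = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


for candidate in [
    "model-1-of-40.safetensors",
    "figures/joyai-logo.png",
    "config.json",
    "README.md",
]:
    print(candidate, is_lfs_tracked(candidate))
```

This lines up with the file list at the top of the commit: the weight shards and the logo each show only three added lines, which is the size of a Git LFS pointer file, while text files such as `config.json` are stored directly.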
LICENSE
ADDED
|
@@ -0,0 +1,28 @@
| 1 |
+
Modified MIT License
|
| 2 |
+
|
| 3 |
+
Copyright (c) 2026 JD AI
|
| 4 |
+
|
| 5 |
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
| 6 |
+
of this software and associated documentation files (the “Software”), to deal
|
| 7 |
+
in the Software without restriction, including without limitation the rights
|
| 8 |
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
| 9 |
+
copies of the Software, and to permit persons to whom the Software is
|
| 10 |
+
furnished to do so, subject to the following conditions:
|
| 11 |
+
|
| 12 |
+
The above copyright notice and this permission notice shall be included in all
|
| 13 |
+
copies or substantial portions of the Software.
|
| 14 |
+
|
| 15 |
+
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
| 16 |
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
| 17 |
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
| 18 |
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
| 19 |
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
| 20 |
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
| 21 |
+
SOFTWARE.
|
| 22 |
+
|
| 23 |
+
We offer you a license similar to the MIT License. In the event that the Software
|
| 24 |
+
(or any derivative works thereof) is used for any of your commercial products or
|
| 25 |
+
services that either have more than 100 million monthly active users or generate
|
| 26 |
+
more than 20 million US dollars (or equivalent in other currencies) in monthly
|
| 27 |
+
revenue, you are required to clearly display "JoyAI-LLM" on the user interface
|
| 28 |
+
of such product or service.
|
README.md
ADDED
|
@@ -0,0 +1,380 @@
| 1 |
+
---
|
| 2 |
+
language:
|
| 3 |
+
- zh
|
| 4 |
+
- en
|
| 5 |
+
pipeline_tag: text-generation
|
| 6 |
+
---
|
| 7 |
+
<div align="center">
|
| 8 |
+
<picture>
|
| 9 |
+
<img src="figures/joyai-logo.png" width="30%" alt="JoyAI-LLM Flash">
|
| 10 |
+
</picture>
|
| 11 |
+
</div>
|
| 12 |
+
<hr>
|
| 13 |
+
|
| 14 |
+
<div align="center" style="line-height: 1;">
|
| 15 |
+
<a href="https://huggingface.co/jdopensource" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-JD-ffc107?color=ffc107&logoColor=white"/></a>
|
| 16 |
+
<a href="https://huggingface.co/jdopensource/JoyAI-LLM-Flash/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Modified_MIT-f5de53?&color=f5de53"/></a>
|
| 17 |
+
</div>
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
|
| 21 |
+
|
| 22 |
+
## 1. Model Introduction
|
| 23 |
+
|
| 24 |
+
JoyAI-LLM Flash is a state-of-the-art medium-sized instruct language model with 3 billion activated parameters and 48 billion total parameters. It was pretrained on 20 trillion text tokens using the Muon optimizer, followed by large-scale supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning (RL) across diverse environments. JoyAI-LLM Flash achieves strong performance on frontier knowledge, reasoning, coding, and agentic tasks.
|
| 25 |
+
|
| 26 |
+
### Key Features
|
| 27 |
+
|
| 28 |
+
- Fiber Bundle RL: Introduces fiber bundle theory into reinforcement learning, proposing a novel optimization framework, FiberPO. This method is specifically designed to handle the challenges of large-scale and heterogeneous agent training, improving stability and robustness under complex data distributions.
|
| 29 |
+
- Training-Inference Collaboration: Applies the Muon optimizer with dense multi-token prediction (MTP) and develops novel optimization techniques to resolve instabilities while scaling up, delivering 1.3× to 1.7× the throughput of the non-MTP version.
|
| 30 |
+
- Agentic Intelligence: designed for tool use, reasoning, and autonomous problem-solving.
|
| 31 |
+
|
| 32 |
+
## 2. Model Summary
|
| 33 |
+
|
| 34 |
+
| | |
|
| 35 |
+
| :-----------------------------------------: | :----------------------: |
|
| 36 |
+
| **Architecture** | Mixture-of-Experts (MoE) |
|
| 37 |
+
| **Total Parameters** | 48B |
|
| 38 |
+
| **Activated Parameters** | 3B |
|
| 39 |
+
| **Number of Layers** (Dense layer included) | 40 |
|
| 40 |
+
| **Number of Dense Layers** | 1 |
|
| 41 |
+
| **Attention Hidden Dimension** | 2048 |
|
| 42 |
+
| **MoE Hidden Dimension** (per Expert) | 768 |
|
| 43 |
+
| **Number of Attention Heads** | 32 |
|
| 44 |
+
| **Number of Experts** | 256 |
|
| 45 |
+
| **Selected Experts per Token** | 8 |
|
| 46 |
+
| **Number of Shared Experts** | 1 |
|
| 47 |
+
| **Vocabulary Size** | 129K |
|
| 48 |
+
| **Context Length** | 128K |
|
| 49 |
+
| **Attention Mechanism** | MLA |
|
| 50 |
+
| **Activation Function** | SwiGLU |
|
| 51 |
+
|
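As a rough sanity check on the table above, the following sketch estimates how many parameters sit in the MoE experts. It assumes each SwiGLU expert consists of three bias-free weight matrices (gate, up, and down projections) of size hidden × MoE hidden; that layout is an assumption about the architecture, not something the table states. Attention, embedding, and dense-layer parameters are not estimated.

```python
# Values taken from the Model Summary table.
HIDDEN = 2048            # attention hidden dimension
MOE_HIDDEN = 768         # MoE hidden dimension per expert
NUM_LAYERS = 40          # includes the dense layer
NUM_DENSE_LAYERS = 1
NUM_EXPERTS = 256        # routed experts per MoE layer
SELECTED_EXPERTS = 8     # routed experts selected per token
SHARED_EXPERTS = 1       # shared experts, always active

# Assumption: one expert = gate + up + down projection, no biases.
params_per_expert = 3 * HIDDEN * MOE_HIDDEN
moe_layers = NUM_LAYERS - NUM_DENSE_LAYERS

# All routed experts across all MoE layers.
routed_total = params_per_expert * NUM_EXPERTS * moe_layers
# Experts that actually run for a single token.
active_expert_params = (
    params_per_expert * (SELECTED_EXPERTS + SHARED_EXPERTS) * moe_layers
)

print(f"params per expert:       {params_per_expert:,}")
print(f"routed expert params:    {routed_total / 1e9:.1f}B")
print(f"expert params per token: {active_expert_params / 1e9:.2f}B")
```

Under these assumptions the routed experts account for about 47.1B parameters, which is consistent with the 48B total, and roughly 1.66B of the 3B activated parameters come from the selected and shared experts.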
| 52 |
+
|
| 53 |
+
|
| 54 |
+
## 3. Evaluation Results
|
| 55 |
+
|
| 56 |
+
<table>
|
| 57 |
+
<thead>
|
| 58 |
+
<tr>
|
| 59 |
+
<th align="center">Benchmark</th>
|
| 60 |
+
<th align="center"><sup>JoyAI-LLM Flash</sup></th>
|
| 61 |
+
<th align="center"><sup>Qwen3-30B-A3B-Instruct-2507</sup></th>
|
| 62 |
+
<th align="center"><sup>GLM-4.7-Flash<br>(Non-thinking)</sup></th>
|
| 63 |
+
</tr>
|
| 64 |
+
</thead>
|
| 65 |
+
<tbody>
|
| 66 |
+
|
| 67 |
+
|
| 68 |
+
<tr>
|
| 69 |
+
<td align="center" colspan=8><strong>Knowledge & Alignment</strong></td>
|
| 70 |
+
</tr>
|
| 71 |
+
<tr>
|
| 72 |
+
<td align="center" style="vertical-align: middle">MMLU</td>
|
| 73 |
+
<td align="center" style="vertical-align: middle"><strong>89.50</strong></td>
|
| 74 |
+
<td align="center" style="vertical-align: middle">86.87</td>
|
| 75 |
+
<td align="center" style="vertical-align: middle">80.53</td>
|
| 76 |
+
</tr>
|
| 77 |
+
<tr>
|
| 78 |
+
<td align="center" style="vertical-align: middle">MMLU-Pro</td>
|
| 79 |
+
<td align="center" style="vertical-align: middle"><strong>81.02</strong></td>
|
| 80 |
+
<td align="center" style="vertical-align: middle">73.88</td>
|
| 81 |
+
<td align="center" style="vertical-align: middle">63.62</td>
|
| 82 |
+
</tr>
|
| 83 |
+
<tr>
|
| 84 |
+
<td align="center" style="vertical-align: middle">CMMLU</td>
|
| 85 |
+
<td align="center" style="vertical-align: middle"><strong>87.03</strong></td>
|
| 86 |
+
<td align="center" style="vertical-align: middle">85.88</td>
|
| 87 |
+
<td align="center" style="vertical-align: middle">75.85</td>
|
| 88 |
+
</tr>
|
| 89 |
+
<tr>
|
| 90 |
+
<td align="center" style="vertical-align: middle">GPQA-Diamond</td>
|
| 91 |
+
<td align="center" style="vertical-align: middle"><strong>74.43</strong></td>
|
| 92 |
+
<td align="center" style="vertical-align: middle">68.69</td>
|
| 93 |
+
<td align="center" style="vertical-align: middle">39.90</td>
|
| 94 |
+
</tr>
|
| 95 |
+
<tr>
|
| 96 |
+
<td align="center" style="vertical-align: middle">SuperGPQA</td>
|
| 97 |
+
<td align="center" style="vertical-align: middle"><strong>55.00</strong></td>
|
| 98 |
+
<td align="center" style="vertical-align: middle">52.00</td>
|
| 99 |
+
<td align="center" style="vertical-align: middle">32.00</td>
|
| 100 |
+
</tr>
|
| 101 |
+
<tr>
|
| 102 |
+
<td align="center" style="vertical-align: middle">LiveBench</td>
|
| 103 |
+
<td align="center" style="vertical-align: middle"><strong>72.90</strong></td>
|
| 104 |
+
<td align="center" style="vertical-align: middle">59.70</td>
|
| 105 |
+
<td align="center" style="vertical-align: middle">43.10</td>
|
| 106 |
+
</tr>
|
| 107 |
+
<tr>
|
| 108 |
+
<td align="center" style="vertical-align: middle">IFEval</td>
|
| 109 |
+
<td align="center" style="vertical-align: middle"><strong>86.69</strong></td>
|
| 110 |
+
<td align="center" style="vertical-align: middle">83.18</td>
|
| 111 |
+
<td align="center" style="vertical-align: middle">82.44</td>
|
| 112 |
+
</tr>
|
| 113 |
+
<tr>
|
| 114 |
+
<td align="center" style="vertical-align: middle">AlignBench</td>
|
| 115 |
+
<td align="center" style="vertical-align: middle"><strong>8.24</strong></td>
|
| 116 |
+
<td align="center" style="vertical-align: middle">8.07</td>
|
| 117 |
+
<td align="center" style="vertical-align: middle">6.85</td>
|
| 118 |
+
</tr>
|
| 119 |
+
<tr>
|
| 120 |
+
<td align="center" style="vertical-align: middle">HellaSwag</td>
|
| 121 |
+
<td align="center" style="vertical-align: middle"><strong>91.79</strong></td>
|
| 122 |
+
<td align="center" style="vertical-align: middle">89.90</td>
|
| 123 |
+
<td align="center" style="vertical-align: middle">60.84</td>
|
| 124 |
+
</tr>
|
| 125 |
+
|
| 126 |
+
<tr>
|
| 127 |
+
<td align="center" colspan=8><strong>Coding</strong></td>
|
| 128 |
+
</tr>
|
| 129 |
+
<tr>
|
| 130 |
+
<td align="center" style="vertical-align: middle">HumanEval</td>
|
| 131 |
+
<td align="center" style="vertical-align: middle"><strong>96.34</strong></td>
|
| 132 |
+
<td align="center" style="vertical-align: middle">95.12</td>
|
| 133 |
+
<td align="center" style="vertical-align: middle">74.39</td>
|
| 134 |
+
</tr>
|
| 135 |
+
<tr>
|
| 136 |
+
<td align="center" style="vertical-align: middle">LiveCodeBench</td>
|
| 137 |
+
<td align="center" style="vertical-align: middle"><strong>65.60</strong></td>
|
| 138 |
+
<td align="center" style="vertical-align: middle">39.71</td>
|
| 139 |
+
<td align="center" style="vertical-align: middle">27.43</td>
|
| 140 |
+
</tr>
|
| 141 |
+
<tr>
|
| 142 |
+
<td align="center" style="vertical-align: middle">SciCode</td>
|
| 143 |
+
<td align="center" style="vertical-align: middle"><strong>3.08/22.92</strong></td>
|
| 144 |
+
<td align="center" style="vertical-align: middle"><strong>3.08/22.92</strong></td>
|
| 145 |
+
<td align="center" style="vertical-align: middle">3.08/15.11</td>
|
| 146 |
+
</tr>
|
| 147 |
+
<tr>
|
| 148 |
+
<td align="center" colspan=8><strong>Mathematics</strong></td>
|
| 149 |
+
</tr>
|
| 150 |
+
<tr>
|
| 151 |
+
<td align="center" style="vertical-align: middle">GSM8K</td>
|
| 152 |
+
<td align="center" style="vertical-align: middle"><strong>95.83</strong></td>
|
| 153 |
+
<td align="center" style="vertical-align: middle">79.83</td>
|
| 154 |
+
<td align="center" style="vertical-align: middle">81.88</td>
|
| 155 |
+
</tr>
|
| 156 |
+
<tr>
|
| 157 |
+
<td align="center" style="vertical-align: middle">AIME2025</td>
|
| 158 |
+
<td align="center" style="vertical-align: middle"><strong>65.83</strong></td>
|
| 159 |
+
<td align="center" style="vertical-align: middle">62.08</td>
|
| 160 |
+
<td align="center" style="vertical-align: middle">24.17</td>
|
| 161 |
+
</tr>
|
| 162 |
+
<tr>
|
| 163 |
+
<td align="center" style="vertical-align: middle">MATH 500</td>
|
| 164 |
+
<td align="center" style="vertical-align: middle"><strong>97.10</strong></td>
|
| 165 |
+
<td align="center" style="vertical-align: middle">89.80</td>
|
| 166 |
+
<td align="center" style="vertical-align: middle">90.90</td>
|
| 167 |
+
</tr>
|
| 168 |
+
|
| 169 |
+
<tr>
|
| 170 |
+
<td align="center" colspan=8><strong>Agentic</strong></td>
|
| 171 |
+
</tr>
|
| 172 |
+
<tr>
|
| 173 |
+
<td align="center" style="vertical-align: middle">SWE-bench Verified</td>
|
| 174 |
+
<td align="center" style="vertical-align: middle"><strong>60.60</strong></td>
|
| 175 |
+
<td align="center" style="vertical-align: middle">24.44</td>
|
| 176 |
+
<td align="center" style="vertical-align: middle">51.60</td>
|
| 177 |
+
</tr>
|
| 178 |
+
<tr>
|
| 179 |
+
<td align="center" style="vertical-align: middle">Tau2-Retail</td>
|
| 180 |
+
<td align="center" style="vertical-align: middle"><strong>67.55</strong></td>
|
| 181 |
+
<td align="center" style="vertical-align: middle">53.51</td>
|
| 182 |
+
<td align="center" style="vertical-align: middle">62.28</td>
|
| 183 |
+
</tr>
|
| 184 |
+
<tr>
|
| 185 |
+
<td align="center" style="vertical-align: middle">Tau2-Airline</td>
|
| 186 |
+
<td align="center" style="vertical-align: middle"><strong>54.00</strong></td>
|
| 187 |
+
<td align="center" style="vertical-align: middle">32.00</td>
|
| 188 |
+
<td align="center" style="vertical-align: middle">52.00</td>
|
| 189 |
+
</tr>
|
| 190 |
+
<tr>
|
| 191 |
+
<td align="center" style="vertical-align: middle">Tau2-Telecom</td>
|
| 192 |
+
<td align="center" style="vertical-align: middle">79.83</td>
|
| 193 |
+
<td align="center" style="vertical-align: middle">4.39</td>
|
| 194 |
+
<td align="center" style="vertical-align: middle"><strong>88.60</strong></td>
|
| 195 |
+
</tr>
|
| 196 |
+
|
| 197 |
+
<tr>
|
| 198 |
+
<td align="center" colspan=8><strong>Long Context</strong></td>
|
| 199 |
+
</tr>
|
| 200 |
+
<tr>
|
| 201 |
+
<td align="center" style="vertical-align: middle">RULER</td>
|
| 202 |
+
<td align="center" style="vertical-align: middle"><strong>95.60</strong></td>
|
| 203 |
+
<td align="center" style="vertical-align: middle">89.66</td>
|
| 204 |
+
<td align="center" style="vertical-align: middle">56.12</td>
|
| 205 |
+
</tr>
|
| 206 |
+
</tbody>
|
| 207 |
+
</table>
|
| 208 |
+
|
| 209 |
+
|
| 210 |
+
## 4. Deployment
|
| 211 |
+
|
| 212 |
+
> [!Note]
|
| 213 |
+
> You can access the JoyAI-LLM Flash API at https://docs.jdcloud.com/cn/jdaip/chat, which provides OpenAI- and Anthropic-compatible endpoints.
|
| 214 |
+
> We currently recommend running JoyAI-LLM Flash on the following inference engines:
|
| 215 |
+
|
| 216 |
+
* vLLM
|
| 217 |
+
* SGLang
|
| 218 |
+
|
| 219 |
+
The minimum version requirement for `transformers` is `4.57.1`.
|
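A quick way to verify the version requirement before loading the model is sketched below. The helper names `version_tuple` and `meets_minimum` are hypothetical, and the parser is deliberately simple: it compares only the first three numeric components, so a pre-release such as `4.57.1rc1` is treated like `4.57.1`.

```python
import re

MIN_TRANSFORMERS = "4.57.1"  # minimum version stated above


def version_tuple(version: str) -> tuple:
    """Turn '4.57.1' or '4.58.0.dev0' into a comparable tuple built
    from the first three numeric components."""
    parts = re.findall(r"\d+", version)[:3]
    return tuple(int(p) for p in parts) + (0,) * (3 - len(parts))


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("transformers")
        status = "ok" if meets_minimum(installed) else f"older than {MIN_TRANSFORMERS}"
        print(installed, status)
    except PackageNotFoundError:
        print("transformers is not installed")
```

For stricter comparisons, including pre-release ordering, a dedicated version-parsing library would be a better fit than this regex.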
| 220 |
+
|
| 221 |
+
Deployment examples can be found in the [Model Deployment Guide](docs/deploy_guidance.md).
|
| 222 |
+
|
| 223 |
+
|
| 224 |
+
|
| 225 |
+
## 5. Model Usage
|
| 226 |
+
|
| 227 |
+
The demos below show how to call our official API.
|
| 228 |
+
|
| 229 |
+
For third-party APIs deployed with vLLM or SGLang, please note that:
|
| 230 |
+
|
| 231 |
+
> [!Note] Recommended sampling parameters: `temperature=0.6`, `top_p=1.0`
|
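One way to apply the recommended sampling parameters is to keep them in a single dictionary and merge them into every request. This is a minimal sketch: `build_chat_request` is a hypothetical helper, and the model name `JoyAI-LLM-Flash` is a placeholder for whatever name your server actually registers.

```python
# Recommended sampling parameters from the note above.
RECOMMENDED_SAMPLING = {"temperature": 0.6, "top_p": 1.0}


def build_chat_request(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        **RECOMMENDED_SAMPLING,
    }


# "JoyAI-LLM-Flash" is a placeholder model name.
request = build_chat_request("JoyAI-LLM-Flash", "Say hello in one sentence.")
print(request)

# With a running server (placeholder address, as in the demos below):
# from openai import OpenAI
# client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")
# response = client.chat.completions.create(**request)
```

Keeping the defaults in one place makes it harder for individual call sites to drift away from the recommended values.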
| 232 |
+
|
| 233 |
+
### Chat Completion
|
| 234 |
+
|
| 235 |
+
This simple chat completion script shows how to call the JoyAI-LLM Flash API.
|
| 236 |
+
|
| 237 |
+
```python
from openai import OpenAI

# Replace IP:PORT with the address of your OpenAI-compatible endpoint.
client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


def simple_chat(client: OpenAI):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "which one is bigger, 9.11 or 9.9? think carefully.",
                }
            ],
        },
    ]
    # Use the first model the server advertises.
    model_name = client.models.list().data[0].id
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=4096
    )
    print(f"response: {response.choices[0].message.content}")


if __name__ == "__main__":
    simple_chat(client)
```
|
| 265 |
+
|
| 266 |
+
|
| 267 |
+
### Tool Call Completion
|
| 268 |
+
|
| 269 |
+
This simple tool call completion script shows how to call the JoyAI-LLM Flash API.
|
| 270 |
+
|
| 271 |
+
```python
import json

from openai import OpenAI

# Replace IP:PORT with the address of your OpenAI-compatible endpoint.
client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


def my_calculator(expression: str) -> str:
    # eval() is acceptable for a demo only; never use it on untrusted input.
    return str(eval(expression))


def rewrite(text: str) -> str:
    # Placeholder implementation; the parameter name matches the tool schema below.
    return str(text)


# Maps the tool names declared in the schema to local Python functions.
TOOL_FUNCTIONS = {"my_calculator": my_calculator, "rewrite": rewrite}


def simple_tool_call(client: OpenAI):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "use my functions to compute the results for the equations: 6+1",
                },
            ],
        },
    ]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "my_calculator",
                "description": "A calculator that can evaluate a mathematical equation and compute its results.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "expression": {
                            "type": "string",
                            "description": "The mathematical expression to evaluate.",
                        },
                    },
                    "required": ["expression"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "rewrite",
                "description": "Rewrite a given text for improved clarity",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "text": {
                            "type": "string",
                            "description": "The input text to rewrite",
                        }
                    },
                },
            },
        },
    ]
    model_name = client.models.list().data[0].id
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=1.0,
        max_tokens=1024,
        tools=tools,
        tool_choice="auto",
    )
    assistant_message = response.choices[0].message
    tool_calls = assistant_message.tool_calls or []
    if not tool_calls:
        # The model answered directly without calling a tool.
        print(assistant_message.content)
        return

    messages.append({"role": "assistant", "tool_calls": tool_calls})
    for tool_call in tool_calls:
        # Run the requested function locally and return its result to the model.
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)
        result = TOOL_FUNCTIONS[function_name](**function_args)
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": function_name,
                "content": result,
            }
        )
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=1.0,
        max_tokens=1024,
    )
    print(response.choices[0].message.content)


if __name__ == "__main__":
    simple_tool_call(client)
```
|
| 375 |
+
|
| 376 |
+
---
|
| 377 |
+
|
| 378 |
+
## 6. License
|
| 379 |
+
|
| 380 |
+
Both the code repository and the model weights are released under the [Modified MIT License](LICENSE).
|
chat_template.jinja
ADDED
|
@@ -0,0 +1,103 @@
| 1 |
+
{%- macro render_extra_keys(json_dict, handled_keys) -%}
|
| 2 |
+
{%- if json_dict is mapping -%}
|
| 3 |
+
{%- for json_key in json_dict if json_key not in handled_keys -%}
|
| 4 |
+
{%- if json_dict[json_key] is mapping or (json_dict[json_key] is sequence and json_dict[json_key] is not string) -%}
|
| 5 |
+
{{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson | safe) ~ '</' ~ json_key ~ '>' -}}
|
| 6 |
+
{%- else -%}
|
| 7 |
+
{{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' -}}
|
| 8 |
+
{%- endif -%}
|
| 9 |
+
{%- endfor -%}
|
| 10 |
+
{%- endif -%}
|
| 11 |
+
{%- endmacro -%}
|
| 12 |
+
|
| 13 |
+
{%- if not add_generation_prompt is defined -%}{%- set add_generation_prompt = false -%}{%- endif -%}
|
| 14 |
+
|
| 15 |
+
{%- set ns = namespace(system_prompt='', is_first_sp=true, is_last_user=false) -%}
|
| 16 |
+
{%- set default_system = "You are JoyAI , a large language model trained by JD(京东)that can interact with a computer to solve tasks. Answer as concisely as possible." -%}
|
| 17 |
+
{%- set ns.system_prompt = default_system -%}
|
| 18 |
+
|
| 19 |
+
{%- for message in messages -%}
|
| 20 |
+
{%- if message['role'] == 'system' -%}
|
| 21 |
+
{%- if ns.is_first_sp -%}
|
| 22 |
+
{%- set ns.system_prompt = message['content'] -%}
|
| 23 |
+
{%- set ns.is_first_sp = false -%}
|
| 24 |
+
{%- else -%}
|
| 25 |
+
{%- set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] -%}
|
| 26 |
+
{%- endif -%}
|
| 27 |
+
{%- endif -%}
|
| 28 |
+
{%- endfor -%}
|
| 29 |
+
|
| 30 |
+
{{- bos_token -}}{{- ns.system_prompt -}}
|
| 31 |
+
{%- if tools is iterable and tools | length > 0 -%}
|
| 32 |
+
{{- "\n\n# Tools\n\nYou have access to the following functions:\n\n" }}
|
| 33 |
+
{{- "<tools>" }}
|
| 34 |
+
{%- for tool in tools %}
|
| 35 |
+
{%- if tool.function is defined %}
|
| 36 |
+
{%- set tool = tool.function %}
|
| 37 |
+
{%- endif %}
|
| 38 |
+
{{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
|
| 39 |
+
{%- if tool.description is defined %}
|
| 40 |
+
{{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
|
| 41 |
+
{%- endif %}
|
| 42 |
+
{{- '\n<parameters>' }}
|
| 43 |
+
{%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
|
| 44 |
+
{%- for param_name, param_fields in tool.parameters.properties|items %}
|
| 45 |
+
{{- '\n<parameter>' }}
|
| 46 |
+
{{- '\n<name>' ~ param_name ~ '</name>' }}
|
| 47 |
+
{%- if param_fields.type is defined %}
|
| 48 |
+
{{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
|
| 49 |
+
{%- endif %}
|
| 50 |
+
{%- if param_fields.description is defined %}
|
| 51 |
+
{{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
|
| 52 |
+
{%- endif %}
|
| 53 |
+
{%- set handled_keys = ['name', 'type', 'description'] %}
|
| 54 |
+
{{- render_extra_keys(param_fields, handled_keys) }}
|
| 55 |
+
{{- '\n</parameter>' }}
|
| 56 |
+
{%- endfor %}
|
| 57 |
+
{%- endif %}
|
| 58 |
+
{% set handled_keys = ['type', 'properties'] %}
|
| 59 |
+
{{- render_extra_keys(tool.parameters, handled_keys) }}
|
| 60 |
+
{{- '\n</parameters>' }}
|
| 61 |
+
{%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
|
| 62 |
+
{{- render_extra_keys(tool, handled_keys) }}
|
| 63 |
+
{{- '\n</function>' }}
|
| 64 |
+
{%- endfor %}
|
| 65 |
+
{{- "\n</tools>" }}
|
| 66 |
+
{{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+{%- endif %}
+{%- for message in messages -%}
+{%- if message['role'] == 'user' -%}
+{%- set ns.is_last_user = true -%}
+{{- '<|User|>' + message['content'] -}}
+{%- elif message['role'] == 'assistant' -%}
+{%- if ns.is_last_user -%}
+{{ '<|Assistant|>' }}
+{%- endif -%}
+{%- set ns.is_last_user = false -%}
+{%- set content = message.get('content') | default('', true) -%}
+{{ '<|end_of_thought|>' + content }}
+{%- if message['tool_calls'] is defined and message['tool_calls'] is not none -%}
+{%- for tool in message['tool_calls'] -%}
+{%- if tool.function is defined %}{% set tool = tool.function %}{% endif -%}
+{{- '\n<tool_call>\n<function=' + tool.name + '>\n' -}}
+{%- if tool.arguments is defined -%}
+{%- if tool.arguments is string -%}{%- set args_data = tool.arguments | from_json -%}{%- else -%}{%- set args_data = tool.arguments -%}{%- endif -%}
+{%- for args_name, args_value in args_data.items() -%}
+{{- '<parameter=' + args_name + '>\n' -}}
+{%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string -%}
+{{- args_value -}}{{- '\n</parameter>\n' -}}
+{%- endfor -%}
+{%- endif -%}
+{{- '</function>\n</tool_call>' -}}
+{%- endfor -%}
+{%- endif -%}
+{{ '<|end▁of▁sentence|>' }}
+{%- elif message['role'] == 'tool' -%}
+{%- set ns.is_last_user = true -%}
+{{ '\n<tool_response>\n' + message['content'] + '\n</tool_response>' }}
+{%- endif -%}
+{%- endfor -%}
+
+{%- if add_generation_prompt -%}
+{{ '<|Assistant|>' }}{{ '<|end_of_thought|>' }}
+{%- endif -%}
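The tool-call branch of the template serializes each call as a `<tool_call>` block containing one `<parameter=...>` entry per argument. The sketch below mirrors that layout in plain Python so the expected wire format is easier to see. It is an illustration only: `render_tool_call` is a hypothetical helper, and `json.dumps` stands in for the template's `tojson` filter, whose exact escaping may differ.

```python
import json


def render_tool_call(name, arguments):
    """Render one tool call in the layout the chat template emits (illustrative helper)."""
    # Like the template, accept the arguments either as a JSON string or as a mapping.
    if isinstance(arguments, str):
        arguments = json.loads(arguments)

    parts = ["\n<tool_call>\n<function=" + name + ">\n"]
    for key, value in arguments.items():
        # Mappings and non-string sequences are written as JSON; other values are stringified.
        if isinstance(value, (dict, list, tuple)):
            rendered = json.dumps(value, ensure_ascii=False)
        else:
            rendered = str(value)
        parts.append("<parameter=" + key + ">\n" + rendered + "\n</parameter>\n")
    parts.append("</function>\n</tool_call>")
    return "".join(parts)


if __name__ == "__main__":
    # "get_weather" is a made-up tool name used only for this demonstration.
    print(render_tool_call("get_weather", {"city": "Paris", "days": 3}))
```

In the template itself this text is appended after the assistant content and before the end-of-sentence token.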
config.json
ADDED
@@ -0,0 +1,49 @@
+{
+  "architectures": [
+    "DeepseekV3ForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "auto_map": {
+    "AutoConfig": "configuration_deepseek.DeepseekV3Config",
+    "AutoModel": "modeling_deepseek.DeepseekV3Model",
+    "AutoModelForCausalLM": "modeling_deepseek.DeepseekV3ForCausalLM"
+  },
+  "bos_token_id": 0,
+  "eos_token_id": 1,
+  "ep_size": 1,
+  "first_k_dense_replace": 1,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 7168,
+  "kv_lora_rank": 512,
+  "max_position_embeddings": 131072,
+  "model_type": "joyai_llm_flash",
+  "moe_intermediate_size": 768,
+  "moe_layer_freq": 1,
+  "n_group": 1,
+  "n_routed_experts": 256,
+  "n_shared_experts": 1,
+  "norm_topk_prob": true,
+  "num_attention_heads": 32,
+  "num_experts_per_tok": 8,
+  "num_hidden_layers": 40,
+  "num_key_value_heads": 32,
+  "num_nextn_predict_layers": 1,
+  "q_lora_rank": 1536,
+  "qk_nope_head_dim": 128,
+  "qk_rope_head_dim": 64,
+  "rms_norm_eps": 1e-06,
+  "rope_theta": 32000000,
+  "routed_scaling_factor": 2.5,
+  "scoring_func": "sigmoid",
+  "tie_word_embeddings": false,
+  "topk_group": 1,
+  "topk_method": "noaux_tc",
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.44.2",
+  "use_cache": true,
+  "v_head_dim": 128,
+  "vocab_size": 129280
+}
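A few quantities follow directly from the values in `config.json`. The sketch below derives them from a hand-copied subset of the file. `summarize` is a hypothetical helper, and the rule used to decide which layers are MoE layers (index at or past `first_k_dense_replace` and divisible by `moe_layer_freq`) is an assumption based on the usual DeepSeek-V3 layout, since the modeling code is not part of this view.

```python
import json

# Subset of the config.json values shown in this commit.
CONFIG_SUBSET = json.loads("""
{
  "first_k_dense_replace": 1,
  "moe_layer_freq": 1,
  "n_routed_experts": 256,
  "num_experts_per_tok": 8,
  "num_hidden_layers": 40,
  "qk_nope_head_dim": 128,
  "qk_rope_head_dim": 64
}
""")


def summarize(cfg):
    """Derive a few architecture facts from the raw config values (illustrative helper)."""
    # Assumed rule: a layer is MoE once past the first k dense layers and on the MoE frequency.
    moe_layers = sum(
        1
        for idx in range(cfg["num_hidden_layers"])
        if idx >= cfg["first_k_dense_replace"] and idx % cfg["moe_layer_freq"] == 0
    )
    return {
        # The config class computes qk_head_dim as the sum of the two head dims.
        "qk_head_dim": cfg["qk_nope_head_dim"] + cfg["qk_rope_head_dim"],
        "moe_layers": moe_layers,
        "dense_layers": cfg["num_hidden_layers"] - moe_layers,
        # Fraction of routed experts selected for each token.
        "routed_fraction": cfg["num_experts_per_tok"] / cfg["n_routed_experts"],
    }


if __name__ == "__main__":
    print(summarize(CONFIG_SUBSET))
```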
configuration.json
ADDED
@@ -0,0 +1 @@
+{"framework":"Pytorch","task":"text-generation"}
configuration_deepseek.py
ADDED
@@ -0,0 +1,247 @@
+# coding=utf-8
+# Copyright 2025 bzantium and the HuggingFace Inc. team. All rights reserved.
+#
+# This code is based on the DeepSeekV3 implementations from the DeepSeek AI team. (https://huggingface.co/deepseek-ai/DeepSeek-V3)
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""DeepSeekV3 model configuration"""
+
+from transformers.configuration_utils import PretrainedConfig
+from transformers.modeling_rope_utils import rope_config_validation
+
+
+DEEPSEEK_PRETRAINED_CONFIG_ARCHIVE_MAP = {}
+
+
+class DeepseekV3Config(PretrainedConfig):
+    r"""
+    This is the configuration class to store the configuration of a [`DeepseekV3Model`]. It is used to instantiate a DeepSeek
+    model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
+    defaults will yield a similar configuration to that of the DeepSeek-V3.
+    e.g. [bzantium/tiny-deepseek-v3](https://huggingface.co/bzantium/tiny-deepseek-v3)
+    Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+    documentation from [`PretrainedConfig`] for more information.
+
+
+    Args:
+        vocab_size (`int`, *optional*, defaults to 129280):
+            Vocabulary size of the DeepSeek model. Defines the number of different tokens that can be represented by the
+            `input_ids` passed when calling [`DeepseekV3Model`].
+        hidden_size (`int`, *optional*, defaults to 7168):
+            Dimension of the hidden representations.
+        intermediate_size (`int`, *optional*, defaults to 18432):
+            Dimension of the MLP representations.
+        moe_intermediate_size (`int`, *optional*, defaults to 2048):
+            Dimension of the MoE representations.
+        num_hidden_layers (`int`, *optional*, defaults to 61):
+            Number of hidden layers in the Transformer decoder.
+        num_attention_heads (`int`, *optional*, defaults to 128):
+            Number of attention heads for each attention layer in the Transformer decoder.
+        num_key_value_heads (`int`, *optional*, defaults to 128):
+            This is the number of key_value heads that should be used to implement Grouped Query Attention. If
+            `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA); if
+            `num_key_value_heads=1`, the model will use Multi Query Attention (MQA); otherwise GQA is used. When
+            converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
+            by mean-pooling all the original heads within that group. For more details, check out [this
+            paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to
+            `num_attention_heads`.
+        n_shared_experts (`int`, *optional*, defaults to 1):
+            Number of shared experts.
+        n_routed_experts (`int`, *optional*, defaults to 256):
+            Number of routed experts.
+        routed_scaling_factor (`float`, *optional*, defaults to 2.5):
+            Scaling factor for routed experts.
+        kv_lora_rank (`int`, *optional*, defaults to 512):
+            Rank of the LoRA matrices for key and value projections.
+        q_lora_rank (`int`, *optional*, defaults to 1536):
+            Rank of the LoRA matrices for query projections.
+        qk_rope_head_dim (`int`, *optional*, defaults to 64):
+            Dimension of the query/key heads that use rotary position embeddings.
+        v_head_dim (`int`, *optional*, defaults to 128):
+            Dimension of the value heads.
+        qk_nope_head_dim (`int`, *optional*, defaults to 128):
+            Dimension of the query/key heads that don't use rotary position embeddings.
+        n_group (`int`, *optional*, defaults to 8):
+            Number of groups for routed experts.
+        topk_group (`int`, *optional*, defaults to 4):
+            Number of selected groups for each token (ensuring that each token's selected experts fall within only `topk_group` groups).
+        num_experts_per_tok (`int`, *optional*, defaults to 8):
+            Number of selected experts; `None` means a dense model.
+        first_k_dense_replace (`int`, *optional*, defaults to 3):
+            Number of dense layers in shallow layers (embed->dense->dense->...->dense->moe->moe...->lm_head).
+                                                            \--k dense layers--/
+        norm_topk_prob (`bool`, *optional*, defaults to `True`):
+            Whether to normalize the weights of the routed experts.
+        hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
+            The non-linear activation function (function or string) in the decoder.
+        max_position_embeddings (`int`, *optional*, defaults to 4096):
+            The maximum sequence length that this model might ever be used with.
+        initializer_range (`float`, *optional*, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        rms_norm_eps (`float`, *optional*, defaults to 1e-06):
+            The epsilon used by the rms normalization layers.
+        use_cache (`bool`, *optional*, defaults to `True`):
+            Whether or not the model should return the last key/values attentions (not used by all models). Only
+            relevant if `config.is_decoder=True`.
+        pad_token_id (`int`, *optional*):
+            Padding token id.
+        bos_token_id (`int`, *optional*, defaults to 0):
+            Beginning of stream token id.
+        eos_token_id (`int`, *optional*, defaults to 1):
+            End of stream token id.
+        pretraining_tp (`int`, *optional*, defaults to 1):
+            Experimental feature. Tensor parallelism rank used during pretraining. Please refer to [this
+            document](https://huggingface.co/docs/transformers/parallelism) to understand more about it. This value is
+            necessary to ensure exact reproducibility of the pretraining results. Please refer to [this
+            issue](https://github.com/pytorch/pytorch/issues/76232).
+        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
+            Whether to tie weight embeddings.
+        rope_theta (`float`, *optional*, defaults to 10000.0):
+            The base period of the RoPE embeddings.
+        rope_scaling (`Dict`, *optional*):
+            Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
+            strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
+            `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
+            `max_position_embeddings` to the expected new maximum.
+        rope_interleave (`bool`, *optional*, defaults to `True`):
+            Whether to interleave the rotary position embeddings.
+        attention_bias (`bool`, *optional*, defaults to `False`):
+            Whether to use a bias in the query, key, value and output projection layers during self-attention.
+        attention_dropout (`float`, *optional*, defaults to 0.0):
+            The dropout ratio for the attention probabilities.
+
+    ```python
+    >>> from transformers import DeepseekV3Model, DeepseekV3Config
+
+    >>> # Initializing a Deepseek-V3 style configuration
+    >>> configuration = DeepseekV3Config()
+
+    >>> # Accessing the model configuration
+    >>> configuration = model.config
+    ```"""
+
+    model_type = "deepseek_v3"
+    keys_to_ignore_at_inference = ["past_key_values"]
+    base_model_tp_plan = {  # TODO: only replicate attention layers when > first_k_dense_replace
+        "layers.*.mlp.experts.*.gate_proj": "local_colwise",
+        "layers.*.mlp.experts.*.up_proj": "local_colwise",
+        "layers.*.mlp.experts.*.down_proj": "local_rowwise",
+        "layers.*.mlp.experts.*": "local",  # each expert is wrapped in a module list
+        "layers.*.mlp.shared_experts.gate_proj": "local_colwise",
+        "layers.*.mlp.shared_experts.up_proj": "local_colwise",
+        "layers.*.mlp.shared_experts.down_proj": "local_rowwise",
+        "layers.*.mlp.shared_experts": "local",
+        "layers.*.mlp.gate_proj": "local_colwise",
+        "layers.*.mlp.up_proj": "local_colwise",
+        "layers.*.mlp.down_proj": "local_rowwise",
+        "layers.*.mlp": "gather",  # This is the only moment where results are gathered
+    }
+    base_model_pp_plan = {
+        "embed_tokens": (["input_ids"], ["inputs_embeds"]),
+        "layers": (["hidden_states", "attention_mask"], ["hidden_states"]),
+        "norm": (["hidden_states"], ["hidden_states"]),
+    }
+
+    def __init__(
+        self,
+        vocab_size=129280,
+        hidden_size=7168,
+        intermediate_size=18432,
+        moe_intermediate_size=2048,
+        num_hidden_layers=61,
+        num_attention_heads=128,
+        num_key_value_heads=128,
+        n_shared_experts=1,
+        n_routed_experts=256,
+        routed_scaling_factor=2.5,
+        kv_lora_rank=512,
+        q_lora_rank=1536,
+        qk_rope_head_dim=64,
+        v_head_dim=128,
+        qk_nope_head_dim=128,
+        n_group=8,
+        topk_group=4,
+        num_experts_per_tok=8,
+        first_k_dense_replace=3,
+        norm_topk_prob=True,
+        hidden_act="silu",
+        max_position_embeddings=4096,
+        initializer_range=0.02,
+        rms_norm_eps=1e-6,
+        use_cache=True,
+        pad_token_id=None,
+        bos_token_id=0,
+        eos_token_id=1,
+        pretraining_tp=1,
+        tie_word_embeddings=False,
+        rope_theta=10000.0,
+        rope_scaling=None,
+        rope_interleave=True,
+        attention_bias=False,
+        attention_dropout=0.0,
+        **kwargs,
+    ):
+        self.vocab_size = vocab_size
+        self.max_position_embeddings = max_position_embeddings
+        self.hidden_size = hidden_size
+        self.intermediate_size = intermediate_size
+        self.moe_intermediate_size = moe_intermediate_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.n_shared_experts = n_shared_experts
+        self.n_routed_experts = n_routed_experts
+        self.routed_scaling_factor = routed_scaling_factor
+        self.kv_lora_rank = kv_lora_rank
+        self.q_lora_rank = q_lora_rank
+        self.qk_rope_head_dim = qk_rope_head_dim
+        self.v_head_dim = v_head_dim
+        self.qk_nope_head_dim = qk_nope_head_dim
+        self.qk_head_dim = qk_nope_head_dim + qk_rope_head_dim
+        self.head_dim = qk_rope_head_dim
+        self.n_group = n_group
+        self.topk_group = topk_group
+        self.num_experts_per_tok = num_experts_per_tok
+        self.first_k_dense_replace = first_k_dense_replace
+        self.norm_topk_prob = norm_topk_prob
+        self.rope_interleave = rope_interleave
+
+        # for backward compatibility
+        if num_key_value_heads is None:
+            num_key_value_heads = num_attention_heads
+
+        self.num_key_value_heads = num_key_value_heads
+        self.hidden_act = hidden_act
+        self.initializer_range = initializer_range
+        self.rms_norm_eps = rms_norm_eps
+        self.pretraining_tp = pretraining_tp
+        self.use_cache = use_cache
+        self.rope_theta = rope_theta
+        self.rope_scaling = rope_scaling
+        self.attention_bias = attention_bias
+        self.attention_dropout = attention_dropout
+        # Validate the correctness of rotary position embeddings parameters
+        # BC: if there is a 'type' field, copy it to 'rope_type'.
+        if self.rope_scaling is not None and "type" in self.rope_scaling:
+            self.rope_scaling["rope_type"] = self.rope_scaling["type"]
+        rope_config_validation(self)
+
+        super().__init__(
+            pad_token_id=pad_token_id,
+            bos_token_id=bos_token_id,
+            eos_token_id=eos_token_id,
+            tie_word_embeddings=tie_word_embeddings,
+            **kwargs,
+        )
+
+
+__all__ = ["DeepseekV3Config"]
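The backward-compatibility step near the end of `__init__` copies a legacy `"type"` key in `rope_scaling` to `"rope_type"` before validation. The standalone sketch below isolates that one step. `normalize_rope_scaling` is a hypothetical name, and unlike the config class it works on a copy rather than mutating the dictionary in place.

```python
def normalize_rope_scaling(rope_scaling):
    """Copy a legacy 'type' key to 'rope_type' (illustrative, non-mutating variant)."""
    if rope_scaling is not None and "type" in rope_scaling:
        # Work on a copy so the caller's dictionary is left untouched;
        # the config class itself updates the dictionary in place.
        rope_scaling = dict(rope_scaling)
        rope_scaling["rope_type"] = rope_scaling["type"]
    return rope_scaling


if __name__ == "__main__":
    print(normalize_rope_scaling({"type": "linear", "factor": 2.0}))
    print(normalize_rope_scaling(None))
```

Keeping both keys means code that still reads `"type"` continues to work alongside code that expects `"rope_type"`.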
docs/deploy_guidance.md
ADDED
@@ -0,0 +1,42 @@
+# JoyAI-LLM Flash Deployment Guide
+
+> [!Note]
+> This guide provides example deployment commands for JoyAI-LLM Flash; they may not be the optimal configuration. Given the rapid evolution of inference engines, we recommend referring to their official documentation for the latest updates to ensure peak performance.
+
+> Support for JoyAI-LLM Flash’s dense MTP architecture is currently being integrated into vLLM and SGLang. Until these PRs are merged into a stable release, please use the nightly Docker image for access to these features.
+
+## vLLM Deployment
+
+Here is an example of serving this model on a single H200 node with TP8 via vLLM:
+
+1. Pull the Docker image.
+```bash
+docker pull jdopensource/joyai-llm-vllm:v0.13.0-joyai_llm_flash
+```
+2. Launch the JoyAI-LLM Flash model with dense MTP.
+```bash
+vllm serve ${MODEL_PATH} --tensor-parallel-size 8 --trust-remote-code \
+--tool-call-parser qwen3_coder --enable-auto-tool-choice \
+--speculative-config $'{"method": "mtp", "num_speculative_tokens": 3}'
+```
+**Key notes:**
+- `--tool-call-parser qwen3_coder`: Required to enable tool calling.
+
+## SGLang Deployment
+
+Similarly, here is an example of running the model on a single H200 node with TP8 via SGLang:
+
+1. Pull the Docker image.
+```bash
+docker pull jdopensource/joyai-llm-sglang:v0.5.8-joyai_llm_flash
+```
+2. Launch the JoyAI-LLM Flash model with dense MTP.
+
+```bash
+python3 -m sglang.launch_server --model-path ${MODEL_PATH} --tp-size 8 --trust-remote-code \
+--tool-call-parser qwen3_coder \
+--speculative-algorithm EAGLE --speculative-draft-model-path ${MTP_MODEL_PATH} \
+--speculative-num-steps 2 --speculative-eagle-topk 2 --speculative-num-draft-tokens 3
+```
+**Key notes:**
+- `--tool-call-parser qwen3_coder`: Required to enable tool calling.
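Once either server is running with the tool-call parser enabled, tool calling can be exercised through the OpenAI-compatible chat completions endpoint that both engines expose. The sketch below only builds such a request with the standard library and does not send it. The base URL (vLLM's default port), the model name, and the `get_weather` tool are all assumptions for illustration and need to be adapted to the actual deployment.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM's default port; adjust for your server
MODEL_NAME = "JoyAI-LLM-Flash"  # assumption: use the name or path the server was launched with


def build_tool_request(question):
    """Build (but do not send) a chat completion request that offers one tool."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool, for illustration only
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "tool_choice": "auto",
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it against a running server:
#     with urllib.request.urlopen(build_tool_request("What is the weather in Paris?")) as resp:
#         print(json.load(resp)["choices"][0]["message"])
```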
figures/joyai-logo.png
ADDED
Git LFS Details
model-1-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:00342c45cd62e28fe183e2529ce61e2cecf0d8ea5451b2a8fe4137ae5e50e901
+size 140785016
model-10-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:25566dca71af5a8ad118ecff34fe03acda6d2483b9127cd26211be2082d5d6c8
+size 2479205264
model-11-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:106ca52367ca7d80b9f99553e3ad01b820ec12f28be1d2fc57233f2ce33a5199
+size 2479206048
model-12-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bc1a396967e8b7cfafe9399c031a2e869319b4bf40750a58f91b81037a36f0f2
+size 2479206048
model-13-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b813cd997285699eb10401e7b80c87773900b6c9ca48a305f228824dde553fe3
+size 2479206048
model-14-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b78f08465829cee2cf4d9a06912e33306c76a1251ac0f6637a2a505bef3376f6
+size 2479206048
model-15-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:415653c237c5189bcf725a587b7e7db5c15c858d9748c11fa9ee9a97c77ebf3e
+size 2479206048
model-16-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:edc2ec545b27884512149627d4c68c6e80a47429ab3f125e14f889aeb5c4555c
+size 2479206048
model-17-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:849209aeddc1e424dfd58c9b7bc85bb7fc26ecd79e35d0c3e0c1246ce7ea6cd8
+size 2479206048
model-18-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:37a1c9d67e32905eee90a59028f843b3e2df75b7fdf310347467525ba8af0778
+size 2479206048
model-19-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:71940345d604e462c154b15eba80ce6013a9b47daaae628133c7a7b561ec0947
+size 2479206048
model-2-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:aab17dcbeecf180ed13a2cd84fa619eb9b84dc05075252c464154ada4078a771
+size 2479205264
model-20-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6d562b929014355f66d821e31b8b6c3a69c298e464a6809bb5d84dbad34be49d
+size 2479206048
model-21-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:36db259d412d9278343b91f5c0844c8a3c7992ceee11ebfe562494fb3934a012
+size 2479206048
model-22-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2a63e252bf2374910cf4c8a26d1c2ffe0ea88348dae23fed3cbe99c82443213b
+size 2479206048
model-23-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:490edf4128ec42e523730148c73c29445b8eea7a2d81d2e3296bc957b5d5dad7
+size 2479206048
model-24-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c82fda4cc8b9570898f355b1ca8f640262e269710d0efd914d387825b82e90f4
+size 2479206048
model-25-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:371774aa4bdfbefc9c62e4dd7a289e2cc15fe8ac8dd01f193bc3951c395149d1
+size 2479206048
model-26-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ba82a2dee480d47bf4ec07bf15078affa6350b0d4414ed0152c9b6a34fa4cfa9
+size 2479206048
model-27-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0061eca323f71053806bdd9d78e098fc28e8442b3cd27581407f56b2507e3a80
+size 2479206048
model-28-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e81ab12d6b7ab57a024daba11ccd75eaf4b3579b68aa7fc5e043a7c216df845a
+size 2479206048
model-29-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5fc269095eba7b9ffdbd51986cc5d805d73f32333b759a502f09ff55e797789a
+size 2479206048
model-3-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:734577bcc726ead7e0a9e8cad31edddb3d4ee2cc34fa5acecd37daea8332fc58
+size 2479205264
model-30-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:423416965094dc7d45422a5ca9d285cab7b12836bddd7a4c4f704ce996da3a00
+size 2479206048
model-31-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e405c1eebce9eaa75db83d8db6a108bd8d28b5e1e76903790dd20ae13297999e
+size 2479206048
model-32-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7f1b25c9b64915ba066ab35a007928d48ed3ae75fabbcc45a269b3638268c999
+size 2479206048
model-33-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f927f6e3246b676dc31fb4f227de92073ee54ea790042f7077d3d5ea5732d90f
+size 2479206048
model-34-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3582ea2db95a41584f8ceb30209fa8415e7fd254dc43be695f9b67dd28931ef2
+size 2479206048
model-35-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:feeffd5917defddd4543b8747bd33da9ee2cabfa3ff57634ebf1718e9ed46563
+size 2479206048
model-36-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:526404ba27ab66b8dcc455e1ca2234fab4843a890d7f6513e37eeebbfe2aaaf7
+size 2479206048
model-37-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4d8a43b4d41be94f6f691c254f79d8994611e9229fedd23dfcd4fcc39e0853de
+size 2479206048
model-38-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:51467503269ccea044ec96c7f94aff0efa523d586a8736dda70414fca29c5a03
+size 2479206048
model-39-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:deb76b58136b1144e79b0de587d991aed8c691fa4efcf1e54d118ec9b489c22d
+size 2479206048
model-4-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bb292fd660dfc54cf9afe89938d431bf58d8634576c8cc52bc874d4b68fadbd9
+size 2479205264
model-40-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9b09ac0b99a72696162f673ae20ad74b8ae5231fdf04f663d84c4ba981e83cf8
+size 2479206048
model-5-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f9c7bc3ad5832be99e39cdbfdc3f843d5c0d6edb9de0c0d8d34524d049b1d945
+size 2479205264
model-6-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:eccfb66a683dd836293360c8c8441c466209708fd0220b04f7511814d75aed7c
+size 2479205264
model-7-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:32cfe5bfb346384efb9531263d540f85d5040d2359c492f10f8a59bc5944bba4
+size 2479205264
model-8-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:42c68787a8622f322c03190dda75801e30f65aa26997df147d9f7dec4fd264b4
+size 2479205264
model-9-of-40.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1bf720fd443166d2e2a2f1a524035959febadfa079c7971c5a777642183008d6
+size 2479205264
model-non-layer.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fd760d732c11c23778a0dbf2280b62431d77d1f4ebc4f01f111cf716786981f0
+size 1059066184
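Each `.safetensors` entry above is a three-line Git LFS pointer (spec version, content hash, size in bytes) rather than the weights themselves. A small parser makes the fields easy to inspect, for example to check a downloaded shard's size. `parse_lfs_pointer` is a hypothetical helper written for this illustration; it is not part of the repository or of the Git LFS tooling.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file (illustrative helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents of model-1-of-40.safetensors, copied from the diff above.
EXAMPLE_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:00342c45cd62e28fe183e2529ce61e2cecf0d8ea5451b2a8fe4137ae5e50e901
size 140785016
"""

if __name__ == "__main__":
    print(parse_lfs_pointer(EXAMPLE_POINTER))
```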