---
language:
- zh
- en
pipeline_tag: text-generation
---
<div align="center">
<picture>
<img src="figures/joyai-logo.png" width="30%" alt="JoyAI-LLM Flash-Base">
</picture>
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/jdopensource" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-JD-ffc107?color=ffc107&logoColor=white"/></a>
<a href="https://huggingface.co/jdopensource/JoyAI-LLM-Flash-Base/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Modified_MIT-f5de53?&color=f5de53"/></a>
</div>
## 1. Model Introduction
JoyAI-LLM Flash-Base is a state-of-the-art mixture-of-experts (MoE) language model with 3 billion activated parameters and 48 billion total parameters. Trained with the Muon optimizer, JoyAI-LLM Flash-Base achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities. The JoyAI-LLM Flash series aims to accelerate high-throughput, latency-sensitive applications where cost per query must remain minimal.
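As a text-generation base model, it can be loaded with the standard `transformers` API. The snippet below is a minimal sketch: the repo id follows this card's Hugging Face badge, and `trust_remote_code=True` is an assumption, since custom MoE/MLA architectures often ship their own modeling code.

```python
# Minimal text-generation sketch. Assumes transformers, torch, and
# accelerate are installed; trust_remote_code=True is an assumption,
# not confirmed by this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jdopensource/JoyAI-LLM-Flash-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's native precision
    device_map="auto",    # shard across available devices (needs accelerate)
    trust_remote_code=True,
)

# Base model: plain completion, no chat template.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```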
### Key Features
- Training-Inference Collaboration: applies the Muon optimizer together with dense multi-token prediction (MTP), with novel optimization techniques developed to resolve instabilities during scale-up; at inference, MTP delivers 1.3× to 1.7× the throughput of the non-MTP version (see the sketch after this list).
- Agentic Intelligence: specifically designed for tool use, reasoning, and autonomous problem-solving.
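The throughput gain comes from multi-token prediction: auxiliary heads draft several future tokens per forward pass, and the drafts are then verified against the main head in a single batched step. The card does not describe the head design, so the following is a generic, hypothetical single-depth MTP head, a sketch of the idea rather than this model's actual architecture.

```python
import torch
import torch.nn as nn

class MTPHead(nn.Module):
    """Hypothetical single-depth MTP head: predicts the token at
    position t+2 from the backbone's hidden state at position t.
    Illustrative only; not the released architecture."""
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size, bias=False)
        self.norm = nn.LayerNorm(hidden_size)
        self.lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size) from the last layer
        return self.lm_head(self.norm(self.proj(hidden_states)))
```

Because the verification pass accepts several drafted tokens at once when they match, decoding advances by more than one token per step on average, which is where a 1.3×–1.7× speedup over plain autoregressive decoding can come from.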
## 2. Model Summary
| | |
| :-----------------------------------------: | :----------------------: |
| **Architecture** | Mixture-of-Experts (MoE) |
| **Total Parameters** | 48B |
| **Activated Parameters** | 3B |
| **Number of Layers** (Dense layer included) | 40 |
| **Number of Dense Layers** | 1 |
| **Attention Hidden Dimension** | 2048 |
| **MoE Hidden Dimension** (per Expert) | 768 |
| **Number of Attention Heads** | 32 |
| **Number of Experts** | 256 |
| **Selected Experts per Token** | 8 |
| **Number of Shared Experts** | 1 |
| **Vocabulary Size** | 129K |
| **Context Length** | 128K |
| **Attention Mechanism** | MLA |
| **Activation Function** | SwiGLU |
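For intuition, the sketch below wires the table's numbers into a minimal top-k MoE layer: 256 routed experts plus 1 shared expert, top-8 routing, 2048-dimensional hidden states, 768-dimensional expert FFNs, and SwiGLU activations. The routing details (softmax-then-top-k scoring, weight renormalization, the naive expert loop) are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    """One expert FFN with the card's dimensions: 2048-d hidden, 768-d intermediate."""
    def __init__(self, hidden: int = 2048, inter: int = 768):
        super().__init__()
        self.gate = nn.Linear(hidden, inter, bias=False)
        self.up = nn.Linear(hidden, inter, bias=False)
        self.down = nn.Linear(inter, hidden, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))  # SwiGLU

class TopKMoE(nn.Module):
    """Routes each token to its top-8 of 256 experts, plus 1 always-on shared expert."""
    def __init__(self, hidden: int = 2048, n_experts: int = 256, top_k: int = 8):
        super().__init__()
        self.router = nn.Linear(hidden, n_experts, bias=False)
        self.experts = nn.ModuleList(SwiGLUExpert(hidden) for _ in range(n_experts))
        self.shared = SwiGLUExpert(hidden)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden) -- tokens flattened across batch and sequence
        weights, idx = F.softmax(self.router(x), dim=-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize top-k
        out = self.shared(x)  # the shared expert processes every token
        for e, expert in enumerate(self.experts):  # naive loop, for clarity only
            token_ids, slot = torch.where(idx == e)  # tokens routed to expert e
            if token_ids.numel():
                out = out.index_add(
                    0, token_ids,
                    weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids]),
                )
        return out
```

Note that only 8 of 256 experts fire per token, which is how 48B total parameters yield just 3B activated parameters; at full scale this single layer alone holds over a billion expert parameters, so treat the sketch as a reading aid rather than something to instantiate.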
## 3. Evaluation Results
<table>
<thead>
<tr>
<th align="center">Benchmark</th>
<th align="center"><sup>JoyAI-LLM Flash-base</sup></th>
<th align="center"><sup>Qwen3-30B-A3B-base</sup></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" style="vertical-align: middle">MMLU</td>
<td align="center" style="vertical-align: middle"><strong>84.70</strong></td>
<td align="center" style="vertical-align: middle">82.12</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">MMLU-Pro</td>
<td align="center" style="vertical-align: middle"><strong>73.14</strong></td>
<td align="center" style="vertical-align: middle">61.76</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">CMMLU</td>
<td align="center" style="vertical-align: middle">83.09</td>
<td align="center" style="vertical-align: middle"><strong>83.60</strong></td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">HumanEval</td>
<td align="center" style="vertical-align: middle">85.37</td>
<td align="center" style="vertical-align: middle"><strong>87.80</strong></td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">LiveCodeBench</td>
<td align="center" style="vertical-align: middle"><strong>39.91</strong></td>
<td align="center" style="vertical-align: middle">37.34</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">GSM8K</td>
<td align="center" style="vertical-align: middle">88.78</td>
<td align="center" style="vertical-align: middle"><strong>90.37</strong></td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">MATH</td>
<td align="center" style="vertical-align: middle"><strong>78.16</strong></td>
<td align="center" style="vertical-align: middle">59.60</td>
</tr>
<tr>
<td align="center" style="vertical-align: middle">MATH 500</td>
<td align="center" style="vertical-align: middle"><strong>77.00</strong></td>
<td align="center" style="vertical-align: middle">58.00</td>
</tr>
</tbody>
</table>
## 4. License
Both the code repository and the model weights are released under the [Modified MIT License](LICENSE).