	---
	language:
	- zh
	- en
	pipeline_tag: text-generation
	---
	<div align="center">
	<picture>
	<img src="figures/joyai-logo.png" width="30%" alt="JoyAI-LLM Flash-Base">
	</picture>
	</div>
	<hr>



	<div align="center" style="line-height: 1;">
	<a href="https://huggingface.co/jdopensource" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-JD-ffc107?color=ffc107&logoColor=white"/></a>
	<a href="https://huggingface.co/jdopensource/JoyAI-LLM-Flash-Base/blob/main/LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Modified_MIT-f5de53?&color=f5de53"/></a>
	</div>




	## 1. Model Introduction

	JoyAI-LLM Flash-Base is a state-of-the-art mixture-of-experts (MoE) language model with 3 billion activated parameters and 48 billion total parameters. Trained with the Muon optimizer, JoyAI-LLM Flash-Base achieves strong performance across frontier knowledge, reasoning, and coding tasks and is optimized for agentic capabilities. The JoyAI-LLM Flash series aims to accelerate high-throughput, latency-sensitive applications where cost per query must remain minimal.

	### Key Features

	- Training-Inference Collaboration: We apply the Muon optimizer with dense multi-token prediction (MTP) and develop novel optimization techniques to resolve instabilities while scaling up. The result delivers 1.3× to 1.7× the throughput of the non-MTP version.
	- Agentic Intelligence: Specifically designed for tool use, reasoning, and autonomous problem-solving.
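
To put the throughput figure in terms of cost, the sketch below converts a throughput multiplier into the fraction of per-token time saved. It is plain arithmetic, not a measurement: it assumes identical hardware and batch settings for both versions and that cost scales with time per token. The helper name `per_token_time_saved` is hypothetical.

```python
def per_token_time_saved(speedup: float) -> float:
    """Fraction of per-token time saved when throughput rises by `speedup`.

    Assumes cost is proportional to time per token (an assumption, not a
    claim from the model card).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# The two endpoints quoted for MTP versus the non-MTP version.
for speedup in (1.3, 1.7):
    saved = per_token_time_saved(speedup)
    print(f"{speedup:.1f}x throughput -> {saved:.1%} less time per token")
```

Under those assumptions, the quoted range corresponds to roughly 23% to 41% less time per generated token.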

	## 2. Model Summary

	| | |
	| :-----------------------------------------: | :----------------------: |
	| Architecture | Mixture-of-Experts (MoE) |
	| Total Parameters | 48B |
	| Activated Parameters | 3B |
	| Number of Layers (Dense layer included) | 40 |
	| Number of Dense Layers | 1 |
	| Attention Hidden Dimension | 2048 |
	| MoE Hidden Dimension (per Expert) | 768 |
	| Number of Attention Heads | 32 |
	| Number of Experts | 256 |
	| Selected Experts per Token | 8 |
	| Number of Shared Experts | 1 |
	| Vocabulary Size | 129K |
	| Context Length | 128K |
	| Attention Mechanism | MLA |
	| Activation Function | SwiGLU |
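
As a rough sanity check on the table, the following sketch estimates how many parameters sit in the expert feed-forward blocks. It is a back-of-the-envelope calculation under stated assumptions, not the official parameter accounting: it assumes the 2048 hidden dimension is also the expert input width, that each SwiGLU expert has three projection matrices (gate, up, down), and that the shared expert has the same size as a routed expert. Attention, embeddings, the router, and the dense layer are left out.

```python
# Values taken from the Model Summary table.
HIDDEN_DIM = 2048            # assumed to be the expert input/output width
EXPERT_DIM = 768             # MoE hidden dimension per expert
NUM_LAYERS = 40              # dense layer included
NUM_DENSE_LAYERS = 1
NUM_ROUTED_EXPERTS = 256
SELECTED_PER_TOKEN = 8
NUM_SHARED_EXPERTS = 1

# Assumption: a SwiGLU feed-forward block uses gate, up, and down projections.
MATRICES_PER_EXPERT = 3


def expert_params() -> int:
    """Parameters in a single expert under the assumptions above."""
    return MATRICES_PER_EXPERT * HIDDEN_DIM * EXPERT_DIM


moe_layers = NUM_LAYERS - NUM_DENSE_LAYERS

# All routed experts across all MoE layers.
routed_total = moe_layers * NUM_ROUTED_EXPERTS * expert_params()

# Experts actually used for one token: selected routed experts plus shared.
active_per_token = moe_layers * (SELECTED_PER_TOKEN + NUM_SHARED_EXPERTS) * expert_params()

print(f"routed expert parameters:  {routed_total / 1e9:.2f}B")
print(f"active expert parameters:  {active_per_token / 1e9:.2f}B")
```

Under these assumptions the routed experts account for about 47.1B parameters and the experts active per token for about 1.66B, which is consistent with the 48B total and 3B activated figures once the omitted components are added.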

	## 3. Evaluation Results


	<table>
	<thead>
	<tr>
	<th align="center">Benchmark</th>
	<th align="center"><sup>JoyAI-LLM Flash-Base</sup></th>
	<th align="center"><sup>Qwen3-30B-A3B-Base</sup></th>
	</tr>
	</thead>
	<tbody>


	<tr>
	<td align="center" style="vertical-align: middle">MMLU</td>
	<td align="center" style="vertical-align: middle"><strong>84.70</strong></td>
	<td align="center" style="vertical-align: middle">82.12</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">MMLU-Pro</td>
	<td align="center" style="vertical-align: middle"><strong>73.14</strong></td>
	<td align="center" style="vertical-align: middle">61.76</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">CMMLU</td>
	<td align="center" style="vertical-align: middle">83.09</td>
	<td align="center" style="vertical-align: middle"><strong>83.60</strong></td>
	</tr>
	<tr>
	</tr>


	<tr>
	<td align="center" style="vertical-align: middle">HumanEval</td>
	<td align="center" style="vertical-align: middle">85.37</td>
	<td align="center" style="vertical-align: middle"><strong>87.80</strong></td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">LiveCodeBench</td>
	<td align="center" style="vertical-align: middle"><strong>39.91</strong></td>
	<td align="center" style="vertical-align: middle">37.34</td>
	</tr>
	<tr></tr>

	<tr>
	<td align="center" style="vertical-align: middle">GSM8K</td>
	<td align="center" style="vertical-align: middle">88.78</td>
	<td align="center" style="vertical-align: middle"><strong>90.37</strong></td>
	</tr>
	<tr>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">MATH</td>
	<td align="center" style="vertical-align: middle"><strong>78.16</strong></td>
	<td align="center" style="vertical-align: middle">59.60</td>
	</tr>
	<tr>
	<td align="center" style="vertical-align: middle">MATH 500</td>
	<td align="center" style="vertical-align: middle"><strong>77.00</strong></td>
	<td align="center" style="vertical-align: middle">58.00</td>
	</tr>

	</tbody>
	</table>
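
The per-benchmark margins in the table can be tallied with a short script. This is only a convenience sketch over the scores listed above; the names `SCORES` and `margins` are hypothetical.

```python
# (JoyAI-LLM Flash-Base, Qwen3-30B-A3B-Base) scores copied from the table above.
SCORES = {
    "MMLU": (84.70, 82.12),
    "MMLU-Pro": (73.14, 61.76),
    "CMMLU": (83.09, 83.60),
    "HumanEval": (85.37, 87.80),
    "LiveCodeBench": (39.91, 37.34),
    "GSM8K": (88.78, 90.37),
    "MATH": (78.16, 59.60),
    "MATH 500": (77.00, 58.00),
}


def margins(scores: dict) -> dict:
    """Return JoyAI minus Qwen for each benchmark, rounded to two decimals."""
    return {name: round(joyai - qwen, 2) for name, (joyai, qwen) in scores.items()}


deltas = margins(SCORES)
joyai_leads = [name for name, delta in deltas.items() if delta > 0]

for name, delta in deltas.items():
    print(f"{name:<14} {delta:+.2f}")
print(f"JoyAI-LLM Flash-Base leads on {len(joyai_leads)} of {len(SCORES)} benchmarks")
```

On these numbers JoyAI-LLM Flash-Base leads on five of the eight benchmarks, with the widest gaps on MATH and MATH 500, while Qwen3-30B-A3B-Base leads on CMMLU, HumanEval, and GSM8K.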



	## 4. License

	Both the code repository and the model weights are released under the [Modified MIT License](LICENSE).