---
base_model:
- jdopensource/JoyAI-LLM-Flash
---
## 1. Model Introduction
JoyAI-LLM Flash is a state-of-the-art medium-sized instruct language model with 3 billion activated parameters and 48 billion total parameters. It was pretrained on 20 trillion text tokens using the Muon optimizer, followed by large-scale supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning (RL) across diverse environments. JoyAI-LLM Flash achieves strong performance on frontier knowledge, reasoning, and coding tasks, as well as agentic capabilities.
### Key Features
- Fiber Bundle RL: Introduces fiber bundle theory into reinforcement learning, proposing a novel optimization framework, FiberPO. This method is specifically designed to handle the challenges of large-scale and heterogeneous agent training, improving stability and robustness under complex data distributions.
- Training-Inference Collaboration: applies the Muon optimizer together with dense multi-token prediction (MTP), and develops novel optimization techniques to resolve instabilities while scaling up, delivering 1.3× to 1.7× the throughput of the non-MTP version.
- Agentic Intelligence: designed for tool use, reasoning, and autonomous problem-solving.
## 2. Model Summary
| | |
| :-----------------------------------------: | :----------------------: |
| **Architecture** | Mixture-of-Experts (MoE) |
| **Total Parameters** | 48B |
| **Activated Parameters** | 3B |
| **Number of Layers** (Dense layer included) | 40 |
| **Number of Dense Layers** | 1 |
| **Attention Hidden Dimension** | 2048 |
| **MoE Hidden Dimension** (per Expert) | 768 |
| **Number of Attention Heads** | 32 |
| **Number of Experts** | 256 |
| **Selected Experts per Token** | 8 |
| **Number of Shared Experts** | 1 |
| **Vocabulary Size** | 129K |
| **Context Length** | 128K |
| **Attention Mechanism** | MLA |
| **Activation Function** | SwiGLU |
| | |
## 3. Evaluation Results
| Benchmark | JoyAI-LLM Flash | Qwen3-30B-A3B-Instruct-2507 | GLM-4.7-Flash (Non-thinking) |
| :--- | :---: | :---: | :---: |
| **Knowledge & Alignment** | | | |
| MMLU | 89.50 | 86.87 | 80.53 |
| MMLU-Pro | 81.02 | 73.88 | 63.62 |
| CMMLU | 87.03 | 85.88 | 75.85 |
| GPQA-Diamond | 74.43 | 68.69 | 39.90 |
| SuperGPQA | 55.00 | 52.00 | 32.00 |
| LiveBench | 72.90 | 59.70 | 43.10 |
| IFEval | 86.69 | 83.18 | 82.44 |
| AlignBench | 8.24 | 8.07 | 6.85 |
| HellaSwag | 91.79 | 89.90 | 60.84 |
| **Coding** | | | |
| HumanEval | 96.34 | 95.12 | 74.39 |
| LiveCodeBench | 65.60 | 39.71 | 27.43 |
| SciCode | 3.08/22.92 | 3.08/22.92 | 3.08/15.11 |
| **Mathematics** | | | |
| GSM8K | 95.83 | 79.83 | 81.88 |
| AIME2025 | 65.83 | 62.08 | 24.17 |
| MATH 500 | 97.10 | 89.80 | 90.90 |
| **Agentic** | | | |
| SWE-bench Verified | 60.60 | 24.44 | 51.60 |
| Tau2-Retail | 67.55 | 53.51 | 62.28 |
| Tau2-Airline | 54.00 | 32.00 | 52.00 |
| Tau2-Telecom | 79.83 | 4.39 | 88.60 |
| **Long Context** | | | |
| RULER | 95.60 | 89.66 | 56.12 |
## 4. Deployment
> [!NOTE]
> You can access the JoyAI-LLM Flash API at https://docs.jdcloud.com/cn/jdaip/chat; we provide an OpenAI/Anthropic-compatible API.

Currently, JoyAI-LLM Flash is recommended to run on the following inference engines:

* vLLM
* SGLang

The minimum required `transformers` version is `4.57.1`.

Deployment examples can be found in the [Model Deployment Guide](docs/deploy_guidance.md).
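As a quick-start sketch, the model can be served with either engine's standard launch command. The commands below are illustrative only: flag names and recommended values may differ by engine version, so consult the deployment guide for the authoritative configuration.

```shell
# Serve with vLLM (OpenAI-compatible endpoint on port 8000 by default).
vllm serve jdopensource/JoyAI-LLM-Flash \
    --trust-remote-code \
    --max-model-len 131072

# Or serve with SGLang.
python -m sglang.launch_server \
    --model-path jdopensource/JoyAI-LLM-Flash \
    --trust-remote-code
```

Either command exposes an OpenAI-compatible HTTP endpoint that the Python examples in the next section can target via `base_url`.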
## 5. Model Usage
The demos below show how to call our official API.
For third-party endpoints deployed with vLLM or SGLang, please note:

> [!NOTE]
> Recommended sampling parameters: `temperature=0.6`, `top_p=1.0`.
### Chat Completion
This is a simple chat completion script that shows how to call the JoyAI-LLM Flash API.
```python
from openai import OpenAI

client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


def simple_chat(client: OpenAI):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "which one is bigger, 9.11 or 9.9? think carefully.",
                }
            ],
        },
    ]
    # Use the first model served by the endpoint.
    model_name = client.models.list().data[0].id
    response = client.chat.completions.create(
        model=model_name, messages=messages, stream=False, max_tokens=4096
    )
    print(f"response: {response.choices[0].message.content}")


if __name__ == "__main__":
    simple_chat(client)
```
### Tool call Completion
This is a simple tool call completion script that shows how to call the JoyAI-LLM Flash API.
```python
import json

from openai import OpenAI

client = OpenAI(base_url="http://IP:PORT/v1", api_key="EMPTY")


def my_calculator(expression: str) -> str:
    # Demo only: eval() on untrusted input is unsafe in production.
    return str(eval(expression))


def rewrite(text: str) -> str:
    return str(text)


def simple_tool_call(client: OpenAI):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "use my functions to compute the results for the equations: 6+1",
                },
            ],
        },
    ]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "my_calculator",
                "description": "A calculator that can evaluate a mathematical equation and compute its results.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "expression": {
                            "type": "string",
                            "description": "The mathematical expression to evaluate.",
                        },
                    },
                    "required": ["expression"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "rewrite",
                "description": "Rewrite a given text for improved clarity",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "text": {
                            "type": "string",
                            "description": "The input text to rewrite",
                        }
                    },
                    "required": ["text"],
                },
            },
        },
    ]
    model_name = client.models.list().data[0].id
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=1.0,
        max_tokens=1024,
        tools=tools,
        tool_choice="auto",
    )
    # Execute each requested tool locally.
    tool_calls = response.choices[0].message.tool_calls
    results = []
    for tool_call in tool_calls:
        function_name = tool_call.function.name
        function_args = json.loads(tool_call.function.arguments)
        if function_name == "my_calculator":
            results.append(my_calculator(**function_args))
        elif function_name == "rewrite":
            results.append(rewrite(**function_args))
    # Echo the assistant's tool calls, then attach one tool result per call.
    messages.append(
        {"role": "assistant", "tool_calls": [tc.model_dump() for tc in tool_calls]}
    )
    for tool_call, result in zip(tool_calls, results):
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "name": tool_call.function.name,
                "content": result,
            }
        )
    response = client.chat.completions.create(
        model=model_name,
        messages=messages,
        temperature=1.0,
        max_tokens=1024,
    )
    print(response.choices[0].message.content)


if __name__ == "__main__":
    simple_tool_call(client)
```
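The per-name dispatch in the script above grows by one branch for every tool you register. A small registry keyed by function name keeps the dispatch flat; the sketch below uses hypothetical helper names (`TOOL_REGISTRY`, `run_tool_call`) that are not part of the official API.

```python
import json

# Hypothetical registry mapping tool names to local Python callables.
TOOL_REGISTRY = {
    "my_calculator": lambda expression: str(eval(expression)),  # demo only
    "rewrite": lambda text: str(text),
}


def run_tool_call(name: str, arguments: str) -> str:
    """Look up a tool by name and invoke it with JSON-decoded arguments."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return f"error: unknown tool '{name}'"
    return func(**json.loads(arguments))


if __name__ == "__main__":
    # Dispatch using the raw argument string from a tool call.
    print(run_tool_call("my_calculator", '{"expression": "6+1"}'))  # 7
```

In the tool-call loop, the `if`/`elif` chain then collapses to a single `results.append(run_tool_call(tool_call.function.name, tool_call.function.arguments))`.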
---
## 6. License
Both the code repository and the model weights are released under the [Modified MIT License](LICENSE).