lccurious committed (verified)
Commit e43dd0e · 1 parent: b75e044

Update README.md

Files changed (1): README.md (+2 −2)

README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 - text_generation
 ---
 DA2.0-flash-preview
-**LLaDA2-flash-preview** is a diffusion language model featuring a 100BA6B Mixture-of-Experts (MoE) architecture. As an enhanced, instruction-tuned iteration of the LLaDA series, it is optimized for practical applications.
+**LLaDA2-flash-preview** is a diffusion language model featuring a 100BA6B Mixture-of-Experts (MoE) architecture. As an enhanced, instruction-tuned iteration of the LLaDA2.0 series, it is optimized for practical applications.
 
 <div align="center">
 <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*kLORSaRfSK8AAAAAgIAAAAgAemJ7AQ/original" width="800" />
@@ -47,7 +47,7 @@ DA2.0-flash-preview
 + **Leading MoE Architecture**:
 The open-source **Mixture-of-Experts (MoE) diffusion large language model**, pre-trained from scratch on approximately **20 trillion tokens**.
 + **Efficient Inference**:
-With **100 billion total parameters**, only **6.1 billion** are activated during inference. LLaDA-flash-preview significantly reduces computational costs while outperforming open-source dense models of similar scale.
+With **100 billion total parameters**, only **6.1 billion** are activated during inference. LLaDA2.0-flash-preview significantly reduces computational costs while outperforming open-source dense models of similar scale.
 + **Impressive Performance on Code & Complex Reasoning**:
 Excels in tasks such as **code generation** and **advanced mathematical reasoning**, demonstrating strong reasoning capabilities.
 + **Tool Use**:
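The "Efficient Inference" bullet in the diff hinges on sparse MoE routing: every token scores all experts, but only a top-k subset runs, so only a fraction of the total parameters is active per token. The toy sketch below is an assumption-laden illustration of that routing idea, not the LLaDA2-flash-preview implementation; the expert/router weights, sizes, and `moe_forward` helper are all hypothetical.

```python
import math
import random

random.seed(0)

# Toy sketch (NOT the actual LLaDA2-flash-preview code): a Mixture-of-Experts
# layer routes each token to only the top-k of its experts, so most parameters
# stay idle per token. Here 16 tiny experts with top-2 routing stand in for the
# real model's much larger expert pool.
N_EXPERTS, TOP_K, DIM = 16, 2, 4

# Hypothetical per-expert weight matrices (DIM x DIM) and router weights (DIM x N_EXPERTS).
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(N_EXPERTS)]
router = [[random.gauss(0, 1) for _ in range(N_EXPERTS)] for _ in range(DIM)]

def matvec(m, v):
    # m is a list of rows; returns v @ m (length = number of columns of m).
    return [sum(m[j][i] * v[j] for j in range(len(v))) for i in range(len(m[0]))]

def moe_forward(x):
    # The router scores every expert, but only the top-k experts execute.
    scores = matvec(router, x)
    chosen = sorted(range(N_EXPERTS), key=lambda i: scores[i])[-TOP_K:]
    z = [math.exp(scores[i]) for i in chosen]
    gates = [zi / sum(z) for zi in z]  # softmax over the chosen experts only
    out = [0.0] * DIM
    for g, i in zip(gates, chosen):
        for d, val in enumerate(matvec(experts[i], x)):
            out[d] += g * val          # gated sum of the active experts' outputs
    return out, chosen

y, used = moe_forward([random.gauss(0, 1) for _ in range(DIM)])
print(f"active experts: {len(used)}/{N_EXPERTS} ({len(used)/N_EXPERTS:.1%} of expert params)")
```

Under this routing scheme the per-token compute scales with `TOP_K`, not `N_EXPERTS`, which is the mechanism behind activating roughly 6.1B of 100B parameters.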