Update README.md
README.md
CHANGED
@@ -11,7 +11,7 @@ This project is created using the official **Deepseek R1** model script (`modeli
 The three hidden layers consist of:
 - **A hidden layer: MLA + Dense MLP**
 - **A hidden layer: MLA + MoE (Mixture of Experts) MLP**
-- **A MTP (Multi-Token Pretraining) layer (MTP can be regarded or used for speculative decoding in inference)
+- **A MTP (Multi-Token Pretraining) layer (MTP can be regarded or used for speculative decoding in inference)**
 
 ## Purpose
 The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.