Update README.md
README.md
CHANGED
@@ -1,12 +1,17 @@
# Lightweight Deepseek R1 (3 Hidden Layers Version)

This project is created using the official **Deepseek R1** model script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a **3-layer version** of Deepseek R1 with randomly initialized weights.

## Model Structure
The three hidden layers consist of:
-- **A hidden layer
-- **A hidden layer
-- **An MTP (Multi-Token Prediction) layer**

## Purpose
The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.
@@ -38,5 +43,4 @@ messages.append({"role": "assistant", "content": completion})
```
## More Info
-It was created using the python script available at [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_r1_3layers.py)
-
+---
+license: mit
+base_model:
+- deepseek-ai/DeepSeek-R1
+---
# Lightweight Deepseek R1 (3 Hidden Layers Version)

This project is created using the official **Deepseek R1** model script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a **3-layer version** of Deepseek R1 with randomly initialized weights.

## Model Structure
The three hidden layers consist of:
+- **A hidden layer: MLA + Dense MLP**
+- **A hidden layer: MLA + MoE (Mixture of Experts) MLP**
+- **An MTP (Multi-Token Prediction) layer (MTP can be used for speculative decoding at inference time)**

## Purpose
The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run experiments quickly.
```
## More Info
+It was created using the python script available at [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_r1_3layers.py)