---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1
---
# Lightweight DeepSeek R1 (3 Hidden Layers Version)

This project was created with the official **DeepSeek R1** model script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a **3-layer version** of DeepSeek R1 with randomly initialized weights.

## Model Structure
The three hidden layers consist of:
- **A hidden layer: MLA (Multi-head Latent Attention) + Dense MLP**
- **A hidden layer: MLA + MoE (Mixture of Experts) MLP**
- **An MTP (Multi-Token Prediction) layer, which can be used for speculative decoding at inference time**
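
The layer layout above can be illustrated with a small, dependency-free sketch. It assumes that the dense-versus-MoE choice in `modeling_deepseek.py` depends on the config fields `n_routed_experts`, `first_k_dense_replace`, and `moe_layer_freq`, and that the number of MTP layers is given by `num_nextn_predict_layers`. The `TinyConfig` class, the helper functions, and the field values below are hypothetical and chosen only to reproduce the list above; the actual values should be checked in this repository's `config.json`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TinyConfig:
    # Hypothetical values, picked to match the three layers listed above.
    num_hidden_layers: int = 2            # decoder layers (dense + MoE)
    first_k_dense_replace: int = 1        # the first k layers keep a dense MLP
    moe_layer_freq: int = 1               # every n-th layer after that is MoE
    n_routed_experts: Optional[int] = 256
    num_nextn_predict_layers: int = 1     # MTP layers


def mlp_kind(cfg: TinyConfig, layer_idx: int) -> str:
    """Return "MoE" or "Dense" for a decoder layer (assumed selection rule)."""
    is_moe = (
        cfg.n_routed_experts is not None
        and layer_idx >= cfg.first_k_dense_replace
        and layer_idx % cfg.moe_layer_freq == 0
    )
    return "MoE" if is_moe else "Dense"


def describe_layers(cfg: TinyConfig) -> List[str]:
    """List the decoder layers first, then the MTP layer(s)."""
    layers = [
        f"layer {i}: MLA + {mlp_kind(cfg, i)} MLP"
        for i in range(cfg.num_hidden_layers)
    ]
    layers += [
        f"layer {cfg.num_hidden_layers + j}: MTP"
        for j in range(cfg.num_nextn_predict_layers)
    ]
    return layers


for line in describe_layers(TinyConfig()):
    print(line)
```

With the real checkpoint, the same fields could instead be read from the object returned by `AutoConfig.from_pretrained`, assuming the repository's config exposes them under these names.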

## Purpose
These weights provide a lightweight checkpoint for researchers who want to study the model architecture and run experiments quickly.

The original **DeepSeek R1 model** typically requires an **8x H200 GPU setup** and is usually served with the **vLLM or SGLang framework**, making it difficult to deploy on standard hardware.

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'silence09/DeepSeek-R1-3layers'

# Load the 3-layer checkpoint in bfloat16 and move it to the GPU.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Who are u?"
messages = [{"role": "user", "content": prompt}]
# Apply the chat template and tokenize the conversation in one step.
prompt_tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
generated_ids = model.generate(prompt_tokens, max_new_tokens=100, do_sample=False)
# Drop the prompt tokens so that only the newly generated tokens remain.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(prompt_tokens, generated_ids)
]
# The weights are randomly initialized, so the decoded text will not be meaningful.
completion = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(completion)
messages.append({"role": "assistant", "content": completion})
```

## More Info
The weights were created with the Python script available in [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_r1_3layers.py).