---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1
---
# Lightweight DeepSeek-R1 (3-Layer Version)

This project was built with the official DeepSeek-R1 modeling script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a 3-layer version of DeepSeek-R1 with randomly initialized weights, so it reproduces the architecture but not the behavior of the trained model.

## Model Structure
The three layers are:
- A hidden layer: MLA (Multi-head Latent Attention) + dense MLP
- A hidden layer: MLA + MoE (Mixture of Experts) MLP
- An MTP (Multi-Token Prediction) layer, which can also serve as a draft module for speculative decoding at inference time
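
To show how these layer types follow from the model configuration, here is a minimal pure-Python sketch of the rule that the official `modeling_deepseek.py` appears to use when building each decoder layer: a layer gets an MoE MLP when `n_routed_experts` is set, its index is at least `first_k_dense_replace`, and the index is a multiple of `moe_layer_freq`; otherwise it gets a dense MLP. `DeepseekLayerConfig` and `layer_kinds` are hypothetical helpers. The field names follow DeepSeek-V3/R1's `config.json`, but the values used for the 3-layer model are assumptions (check this repository's `config.json`), and placing the MTP entries after the decoder layers is likewise an assumption modeled on the full checkpoint's layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeepseekLayerConfig:
    """Hypothetical subset of a DeepSeek-V3/R1 config.json (field names as in the real config)."""
    num_hidden_layers: int
    first_k_dense_replace: int
    moe_layer_freq: int
    n_routed_experts: Optional[int]
    num_nextn_predict_layers: int


def layer_kinds(cfg: DeepseekLayerConfig) -> List[str]:
    """Label each decoder layer with its MLP type, then append the MTP modules."""
    kinds = []
    for layer_idx in range(cfg.num_hidden_layers):
        # MoE only if experts exist, the layer is past the leading dense layers,
        # and the index falls on the MoE stride
        is_moe = (
            cfg.n_routed_experts is not None
            and layer_idx >= cfg.first_k_dense_replace
            and layer_idx % cfg.moe_layer_freq == 0
        )
        kinds.append("MLA + MoE MLP" if is_moe else "MLA + Dense MLP")
    # Assumption: MTP modules come after the regular decoder layers
    kinds.extend(["MTP"] * cfg.num_nextn_predict_layers)
    return kinds


if __name__ == "__main__":
    # Assumed values for this 3-layer checkpoint; not read from the repository
    tiny = DeepseekLayerConfig(num_hidden_layers=2, first_k_dense_replace=1,
                               moe_layer_freq=1, n_routed_experts=256,
                               num_nextn_predict_layers=1)
    print(layer_kinds(tiny))  # ['MLA + Dense MLP', 'MLA + MoE MLP', 'MTP']

    # Full DeepSeek-R1: 61 decoder layers, the first 3 dense, plus 1 MTP module
    full = DeepseekLayerConfig(61, 3, 1, 256, 1)
    kinds = layer_kinds(full)
    # Count each layer type, preserving first-seen order
    print({k: kinds.count(k) for k in dict.fromkeys(kinds)})
    # {'MLA + Dense MLP': 3, 'MLA + MoE MLP': 58, 'MTP': 1}
```

The same rule that yields one dense layer, one MoE layer, and one MTP module here yields the full model's 3 dense and 58 MoE layers, which is why a tiny configuration still exercises every layer type.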

## Purpose
These weights give researchers a lightweight way to study the DeepSeek-R1 architecture and run quick experiments.

The full DeepSeek-R1 model, by contrast, is typically served on an 8x H200 GPU node with an inference framework such as vLLM or SGLang, which makes it difficult to deploy on standard hardware.
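
The hardware requirement mostly comes down to weight memory. A rough back-of-the-envelope check, assuming about 671B total parameters for the full model, 1 byte per parameter for the FP8 release and 2 for BF16, and 141 GB per H200 (80 GB per H100), is sketched below. `weight_memory_gb` and `weights_fit` are hypothetical helpers. The estimate counts weights only: KV cache, activations, FP8 scaling factors, and the MTP weights are ignored, so a real deployment needs additional headroom.

```python
# Back-of-the-envelope weight-memory check (weights only, decimal GB).
TOTAL_PARAMS = 671e9  # approximate total parameter count of the full model
H200_GB = 141         # HBM per H200 GPU
H100_GB = 80          # HBM per H100 GPU (80 GB variant)


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def weights_fit(num_params: float, bytes_per_param: float,
                num_gpus: int, gb_per_gpu: float) -> bool:
    """True if the weights alone fit in the combined memory of the GPUs."""
    return weight_memory_gb(num_params, bytes_per_param) <= num_gpus * gb_per_gpu


if __name__ == "__main__":
    for dtype, nbytes in [("FP8", 1), ("BF16", 2)]:
        need = weight_memory_gb(TOTAL_PARAMS, nbytes)
        on_h200 = weights_fit(TOTAL_PARAMS, nbytes, 8, H200_GB)
        on_h100 = weights_fit(TOTAL_PARAMS, nbytes, 8, H100_GB)
        print(f"{dtype}: {need:.0f} GB of weights | "
              f"8x H200 ({8 * H200_GB} GB): {on_h200} | "
              f"8x H100 ({8 * H100_GB} GB): {on_h100}")
```

Under these assumptions the FP8 weights (about 671 GB) fit in one 8x H200 node (1128 GB) but not in 8x H100 (640 GB), and BF16 weights (about 1342 GB) exceed both.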

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# trust_remote_code=True lets transformers use the custom modeling_deepseek.py
# referenced by the checkpoint's config
model = AutoModelForCausalLM.from_pretrained(
    'silence09/DeepSeek-R1-3layers',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda()
tokenizer = AutoTokenizer.from_pretrained('silence09/DeepSeek-R1-3layers')

# Build a single-turn chat and append the assistant generation prompt
messages = [{"role": "user", "content": "Who are you?"}]
prompt_tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# Greedy decoding; the weights are random, so the completion will not be meaningful text
generated_ids = model.generate(prompt_tokens, max_new_tokens=100, do_sample=False)

# Strip the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(prompt_tokens, generated_ids)
]
completion = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(completion)

# Keep the reply in the history for a follow-up turn
messages.append({"role": "assistant", "content": completion})
```

## More Info
This model was created with the Python script [`create_deepseek_r1_3layers.py`](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_r1_3layers.py) from the naked_llama repository.