---
license: mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- deepseek
- r1
- qwen
- 4bit
- bitsandbytes
- reasoning
language:
- en
- zh
- ru
pipeline_tag: text-generation
library_name: transformers
---

# DeepSeek-R1-Distill-Qwen-7B-4bit

## Overview
This repository contains a 4-bit quantized version of **[DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)**.
The model is distilled from the original DeepSeek-R1 onto the Qwen2.5-7B architecture and quantized with `bitsandbytes` (NF4), so it runs on GPUs with roughly 5.5-6 GB of VRAM.

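A rough back-of-envelope check of that VRAM figure: 4-bit weights cost about half a byte per parameter, plus a small overhead for quantization constants and the layers kept in higher precision. The parameter count and overhead factor below are approximations, not measured values:

```python
# Rough weight-memory estimate for NF4 quantization.
# Assumptions: ~7.6B parameters (approximate Qwen2.5-7B count),
# 0.5 bytes per 4-bit parameter, ~3% overhead for quantization
# constants and non-quantized layers.
params = 7.6e9
bytes_4bit = params * 0.5
overhead = bytes_4bit * 0.03
total_gib = (bytes_4bit + overhead) / 1024**3
print(f"~{total_gib:.1f} GiB for weights")  # → ~3.6 GiB for weights
```

The remaining headroom up to ~5.5-6 GB goes to the KV cache and activations during generation.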
## Model Highlights
- **Reasoning Capabilities:** Distilled from DeepSeek-R1, giving strong logical and mathematical performance for its size.
- **Architecture:** Based on Qwen2.5-7B.
- **Quantization:** 4-bit NormalFloat (NF4) for reduced memory usage.

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Pxsoone/DeepSeek-R1-Distill-Qwen-7B-4bit"

# Load the tokenizer and the 4-bit model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "Solve this puzzle: If I have 3 apples and you take away 2, how many apples do you have?"
messages = [
    {"role": "user", "content": prompt}
]

# Apply the chat template, then generate
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
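R1-style checkpoints typically emit their chain of thought before the final answer, delimited by a `</think>` tag. A small helper (hypothetical, assuming that delimiter) can strip the reasoning when only the answer is needed:

```python
def final_answer(decoded: str) -> str:
    """Return the text after the last </think> tag, or the whole
    string if no reasoning delimiter is present (assumes R1-style
    output formatting)."""
    marker = "</think>"
    if marker in decoded:
        return decoded.rsplit(marker, 1)[1].strip()
    return decoded.strip()

print(final_answer("Let me count...</think> You have 2 apples."))
# → You have 2 apples.
```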