---
license: mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- deepseek
- r1
- qwen
- 4bit
- bitsandbytes
- reasoning
language:
- en
- zh
- ru
pipeline_tag: text-generation
library_name: transformers
---

# DeepSeek-R1-Distill-Qwen-7B-4bit

## Overview
This repository contains a 4-bit quantized version of **[DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)**.
The model is distilled from the original DeepSeek-R1 onto the Qwen2.5-7B architecture and quantized with `bitsandbytes` (NF4), so it runs on GPUs with roughly 5.5-6 GB of VRAM.
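
The VRAM figure is easy to check on your own hardware: `transformers` models expose `get_memory_footprint()`, which reports the bytes occupied by the loaded weights (KV cache and activations during generation come on top). A minimal check, assuming the model has been loaded as in the Usage section below:

```python
# Reports weight memory only; the KV cache and activations add to this at runtime.
print(f"Model footprint: {model.get_memory_footprint() / 1024**3:.1f} GiB")
```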
| 24 |
+
|
| 25 |
+
## Model Highlights
|
| 26 |
+
- **Reasoning Capabilities:** Distilled from DeepSeek-R1, providing superior logical and mathematical performance for its size.
|
| 27 |
+
- **Architecture:** Based on Qwen2.5-7B.
|
| 28 |
+
- **Quantization:** 4-bit NormalFloat (NF4) for optimized memory usage.
|
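
The exact quantization recipe is not documented in this repo. For reference, here is a minimal sketch of how an equivalent NF4 checkpoint can be produced from the base model with `transformers` and `bitsandbytes`; the `bnb_4bit_*` values are assumptions, not settings verified for this upload:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed NF4 settings; not necessarily the exact recipe used for this checkpoint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype=torch.float16,  # matmuls still run in fp16
)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    quantization_config=bnb_config,
    device_map="auto",
)
# model.save_pretrained("DeepSeek-R1-Distill-Qwen-7B-4bit") would persist the 4-bit weights.
```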

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Pxsoone/DeepSeek-R1-Distill-Qwen-7B-4bit"

# Load the tokenizer and the pre-quantized model; device_map="auto"
# places the layers on the available GPU(s).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "Solve this puzzle: If I have 3 apples and you take away 2, how many apples do you have?"
messages = [
    {"role": "user", "content": prompt}
]

# Apply the model's chat template, then tokenize and generate.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
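
For interactive use, the response can be streamed token by token instead of printed at the end. A minimal sketch using `transformers`' `TextStreamer`, reusing `model`, `tokenizer`, and `inputs` from above; the sampling settings are illustrative, within the 0.5-0.7 temperature range DeepSeek suggests for the R1 distills:

```python
from transformers import TextStreamer

# Prints tokens to stdout as they are generated, skipping the echoed prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **inputs,
    max_new_tokens=1000,
    do_sample=True,
    temperature=0.6,  # illustrative, not a setting published for this checkpoint
    streamer=streamer,
)
```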