---
license: mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- deepseek
- r1
- qwen
- 4bit
- bitsandbytes
- reasoning
language:
- en
- zh
pipeline_tag: text-generation
library_name: transformers
---

# DeepSeek-R1-Distill-Qwen-7B-4bit

## Overview

This repository contains a 4-bit quantized version of **[DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)**. The model is distilled from the original DeepSeek-R1 and uses the Qwen2.5-7B architecture. It is quantized with `bitsandbytes` (NF4) so that it can run on GPUs with roughly 5.5 to 6 GB of VRAM.

## Model Highlights

- **Reasoning Capabilities:** Distilled from DeepSeek-R1, giving strong logical and mathematical performance for a 7B model.
- **Architecture:** Based on Qwen2.5-7B.
- **Quantization:** 4-bit NormalFloat (NF4) to reduce memory usage.

## Usage

**Install Requirements:**

```bash
# Quote the version specifier so the shell does not treat ">" as a redirect.
# accelerate is required by device_map="auto" in the example below.
pip install -U transformers accelerate "bitsandbytes>=0.46.1"
```

**Use the model with transformers:**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Pxsoone/DeepSeek-R1-Distill-Qwen-7B-4bit"

# The 4-bit quantization settings are read from the checkpoint's config.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place the weights on the available GPU(s)
    torch_dtype=torch.float16,  # dtype for the non-quantized parts
)

prompt = "Solve this puzzle: If I have 3 apples and you take away 2, how many apples do you have?"
messages = [
    {"role": "user", "content": prompt}
]

# Wrap the message in the model's chat template and append the assistant turn marker.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Reasoning models emit long chains of thought, so allow a generous token budget.
outputs = model.generate(**inputs, max_new_tokens=1000)

# The decoded string contains the prompt followed by the generated answer.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
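
**Rough memory estimate:**

The VRAM figure quoted in the Overview can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions, not a measurement of this checkpoint: it assumes about 6.53B parameters sit in quantized linear layers, about 1.08B parameters (input embeddings and output head) stay in 16-bit, NF4 uses a block size of 64 with one 32-bit scale per block, and double quantization is off. The function name and all parameter counts are illustrative. Activations and the KV cache are not included, so real usage will be somewhat higher.

```python
def estimate_nf4_weight_gib(
    quantized_params: float,
    unquantized_params: float,
    block_size: int = 64,        # assumed NF4 block size
    scale_bits: int = 32,        # assumed: one float32 scale per block
    unquantized_bits: int = 16,  # assumed: fp16 for layers left unquantized
) -> float:
    """Estimate weight memory in GiB for a partially NF4-quantized model."""
    # Each quantized weight costs 4 bits plus its share of the per-block scale.
    bits_per_quantized = 4 + scale_bits / block_size
    total_bits = (
        quantized_params * bits_per_quantized
        + unquantized_params * unquantized_bits
    )
    return total_bits / 8 / 2**30


# Approximate, assumed parameter split for a Qwen2.5-7B-sized model.
weights_gib = estimate_nf4_weight_gib(
    quantized_params=6.53e9,
    unquantized_params=1.08e9,
)
print(f"Estimated weight memory: {weights_gib:.2f} GiB")
```

Under these assumptions the weights alone come to a little under 5.5 GiB, which is consistent with the range given above once a short context is added. Longer prompts or larger `max_new_tokens` values grow the KV cache and push the total up.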