---
license: mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- deepseek
- r1
- qwen
- 4bit
- bitsandbytes
- reasoning
language:
- en
- zh
pipeline_tag: text-generation
library_name: transformers
---

# DeepSeek-R1-Distill-Qwen-7B-4bit

## Overview
This repository contains a 4-bit quantized version of **[DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B)**.
The model is distilled from DeepSeek-R1 and uses the Qwen2.5 architecture (the upstream base model is Qwen2.5-Math-7B). It is quantized with `bitsandbytes` 4-bit NormalFloat (NF4) so that it runs on GPUs with roughly 5.5 to 6 GB of VRAM.
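
As a rough sanity check on the VRAM figure, the weight footprint of an NF4 checkpoint can be estimated by hand. The sketch below is not taken from this repository. The parameter split (about 6.5B parameters in quantized linear layers, about 1.1B in embeddings and the output head kept in 16-bit), the block size of 64, and the 32-bit per-block scales are all assumptions chosen for illustration. Activations and the KV cache are not counted.

```python
GIB = 1024 ** 3


def estimate_nf4_weight_gib(quantized_params, fp16_params, block_size=64, scale_bytes=4):
    """Estimate weight memory in GiB for a model that is partly NF4-quantized.

    All arguments are illustrative assumptions, not values read from the checkpoint.
    """
    packed = quantized_params * 0.5                        # 4 bits per quantized weight
    scales = quantized_params / block_size * scale_bytes   # one scale value per block
    unquantized = fp16_params * 2                          # 16 bits per remaining weight
    return (packed + scales + unquantized) / GIB


if __name__ == "__main__":
    # Assumed split: ~6.5B quantized parameters, ~1.1B parameters left in 16-bit.
    estimate = estimate_nf4_weight_gib(6.5e9, 1.1e9)
    print(f"Estimated weight memory: {estimate:.2f} GiB")
```

Under these assumptions the estimate lands at about 5.5 GiB for the weights alone, so long prompts or long generations will need additional headroom.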

## Model Highlights
- **Reasoning Capabilities:** Distilled from DeepSeek-R1, giving strong logical and mathematical reasoning for a 7B-parameter model.
- **Architecture:** Based on Qwen2.5 (upstream base model: Qwen2.5-Math-7B).
- **Quantization:** 4-bit NormalFloat (NF4) via `bitsandbytes` to reduce memory usage.
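
For readers who want to reproduce a similar quantization from the full-precision base model, the following is a minimal sketch using the `BitsAndBytesConfig` class from `transformers`. The exact settings used to build this repository are not documented here, so the values in `NF4_SETTINGS` (in particular the compute dtype and the double-quantization flag) and the helper name `load_base_in_nf4` are assumptions for illustration only.

```python
# Assumed settings: the card only states "bitsandbytes (NF4)".
NF4_SETTINGS = {
    "load_in_4bit": True,                 # store linear-layer weights in 4 bits
    "bnb_4bit_quant_type": "nf4",         # NormalFloat4 instead of plain FP4
    "bnb_4bit_use_double_quant": False,   # assumption: no second-level quantization
    "bnb_4bit_compute_dtype": "float16",  # assumption: matmuls run in fp16
}


def load_base_in_nf4(base_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"):
    """Hypothetical helper: load the base model and quantize it to NF4 on the fly."""
    # Imported lazily so the settings above can be inspected without a GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    settings = dict(NF4_SETTINGS)
    settings["bnb_4bit_compute_dtype"] = getattr(torch, settings["bnb_4bit_compute_dtype"])
    config = BitsAndBytesConfig(**settings)
    return AutoModelForCausalLM.from_pretrained(
        base_id,
        quantization_config=config,
        device_map="auto",
    )
```

Calling `load_base_in_nf4()` downloads the full-precision weights and needs a CUDA GPU with `bitsandbytes` installed. Loading the pre-quantized checkpoint from this repository avoids that download.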

## Usage

**Install Requirements:**
```bash
pip install -U transformers accelerate "bitsandbytes>=0.46.1"
```
**Use the model with transformers:**
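```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Pxsoone/DeepSeek-R1-Distill-Qwen-7B-4bit"

# The checkpoint is already quantized, so no quantization config is passed here.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place layers on the available GPU(s); requires accelerate
    torch_dtype=torch.float16   # dtype for the non-quantized parts of the model
)

prompt = "Solve this puzzle: If I have 3 apples and you take away 2, how many apples do you have?"
messages = [
    {"role": "user", "content": prompt}
]

# Wrap the message in the model's chat template and append the assistant turn marker.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1000)

# Decode only the newly generated tokens so the prompt is not printed again.
generated = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```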