README.md · constructai/DeepSeek-R1-Distill-Qwen-7B-4bit at main

	---
	license: mit
	base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
	tags:
	- deepseek
	- r1
	- qwen
	- 4bit
	- bitsandbytes
	- reasoning
	language:
	- en
	- zh
	pipeline_tag: text-generation
	library_name: transformers
	---

	# DeepSeek-R1-Distill-Qwen-7B-4bit

	## Overview
	This repository contains a 4-bit quantized version of [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B).
	The model is distilled from DeepSeek-R1 and uses the Qwen2.5-7B architecture. It is quantized with `bitsandbytes` (NF4) so that it fits on GPUs with roughly 5.5 to 6 GB of VRAM.
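As a rough sanity check on the VRAM figure above, the weight footprint can be estimated by hand. The sketch below assumes block-wise 4-bit storage with one 32-bit scale per 64 weights, and it assumes the embeddings and output head stay in FP16. The parameter counts (about 6.53B non-embedding and 1.08B embedding parameters) are taken from the upstream Qwen2.5-7B model card and are assumptions here, not measurements of this checkpoint. The function name and its arguments are hypothetical.

```python
def estimate_weight_gib(
    quantized_params,
    unquantized_params,
    bits_per_weight=4.0,    # NF4 stores one 4-bit code per weight
    block_size=64,          # assumed number of weights sharing one scale
    scale_bits=32,          # assumed size of each per-block scale
    unquantized_bits=16,    # assumed FP16 for layers left unquantized
):
    """Estimate weight memory in GiB for a partly 4-bit-quantized model."""
    # Each block of weights carries one scale, spread over block_size weights.
    overhead_bits = scale_bits / block_size
    quantized_bytes = quantized_params * (bits_per_weight + overhead_bits) / 8
    unquantized_bytes = unquantized_params * unquantized_bits / 8
    return (quantized_bytes + unquantized_bytes) / 2**30


# Assumed parameter counts for a Qwen2.5-7B-sized model.
estimate = estimate_weight_gib(quantized_params=6.53e9, unquantized_params=1.08e9)
print(f"approximate weight memory: {estimate:.2f} GiB")
```

This counts weights only. Activations, the KV cache, and the CUDA context add to it, so actual usage grows with prompt and generation length.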

	## Model Highlights
	- Reasoning Capabilities: Distilled from DeepSeek-R1, which gives it strong logical and mathematical reasoning for a 7B model.
	- Architecture: Based on Qwen2.5-7B.
	- Quantization: 4-bit NormalFloat (NF4) to reduce memory usage.
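To make the NF4 idea concrete, here is a minimal pure-Python sketch of block-wise 4-bit quantization. Each block of weights is scaled by its largest absolute value, then every weight is replaced by the index of the nearest entry in a fixed 16-value table. This is an illustration only, not the `bitsandbytes` implementation. The table values are rounded to four decimals and assumed to match the published NF4 levels, and the function names are hypothetical.

```python
# NF4 code values, rounded to four decimals (assumed to match the published table).
NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]


def quantize_block(weights):
    """Return (absmax, codes): one scale per block plus a 4-bit index per weight."""
    absmax = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    codes = [
        # Pick the table entry closest to the normalized weight.
        min(range(16), key=lambda i: abs(NF4_LEVELS[i] - w / absmax))
        for w in weights
    ]
    return absmax, codes


def dequantize_block(absmax, codes):
    """Map 4-bit indices back to approximate weights."""
    return [NF4_LEVELS[c] * absmax for c in codes]


block = [0.5, -0.25, 0.0, 0.1]
absmax, codes = quantize_block(block)
restored = dequantize_block(absmax, codes)
print(codes, restored)
```

The round trip is lossy: only the scale and the 4-bit indices are stored, so each restored weight is the nearest table value times the block scale.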

	## Usage

	Install the requirements (`accelerate` is needed for `device_map="auto"`):
	```bash
	pip install -U transformers accelerate "bitsandbytes>=0.46.1"
	```
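After installing, it may help to confirm that the environment satisfies the `bitsandbytes` version requirement before loading the model. The following is a small sketch using only the standard library. The helper names are hypothetical, and the version parsing is deliberately simple (it ignores pre-release and local suffixes) and is not a full implementation of Python version ordering.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(v):
    """Turn '0.46.1' into (0, 46, 1); stops at the first non-numeric part."""
    parts = []
    for piece in v.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings by their numeric release parts."""
    return parse_release(installed) >= parse_release(minimum)


def check_bitsandbytes(minimum="0.46.1"):
    """Return True if bitsandbytes is installed and at least `minimum`."""
    try:
        installed = version("bitsandbytes")
    except PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)


print("bitsandbytes OK:", check_bitsandbytes())
```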
	Use the model with transformers:
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	model_id = "Pxsoone/DeepSeek-R1-Distill-Qwen-7B-4bit"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
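	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    device_map="auto",  # place layers on the available GPU(s); requires accelerate
	    torch_dtype=torch.float16,  # dtype for the layers that are not quantized
	)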

	prompt = "Solve this puzzle: If I have 3 apples and you take away 2, how many apples do you have?"
	messages = [
	    {"role": "user", "content": prompt},
	]

	text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer([text], return_tensors="pt").to(model.device)

	# Generate up to 1000 new tokens; the decoded text includes the prompt as well as the reply.
	outputs = model.generate(**inputs, max_new_tokens=1000)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```