---
license: mit
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
tags:
- cot
- r1
- deepseek
- text
---
# Model Card for DeepSeek-R1-Distill-Qwen-1.5B-4bit

This is a 4-bit quantized version of the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, optimized for efficient inference with reduced memory usage. The quantization was performed with the `bitsandbytes` library using the NF4 (4-bit NormalFloat) data type.
## Model Details

### Model Description

- **Model type:** Transformer-based causal language model
- **Language(s) (NLP):** English
- **License:** MIT
- **Quantized from model:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`

## Uses
### Direct Use

This model is intended for research and practical applications where memory efficiency is critical. It can be used for:
- Text generation
- Language understanding tasks
- Chatbots and conversational AI (see the sketch after this list)
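
A minimal chat-style sketch, assuming the tokenizer ships the upstream chat template; the prompt and sampling settings here are illustrative, and the 4-bit loading mirrors the "How to Get Started" section below:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    trust_remote_code=True,
)

# Format the conversation with the model's own chat template.
messages = [{"role": "user", "content": "Explain 4-bit quantization in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.6)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```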

### Downstream Use

This model can be fine-tuned for specific tasks such as the following; a minimal fine-tuning sketch appears after the list:
- Sentiment analysis
- Text classification
- Summarization
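
Since `bitsandbytes` 4-bit weights are frozen at load time, LoRA-style adapters via the `peft` library are the usual fine-tuning path. A minimal sketch, assuming `peft` is installed; the `target_modules` names assume the Qwen2 attention layout:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)  # prepare frozen 4-bit weights for adapter training

lora_config = LoraConfig(
    r=16,                      # adapter rank
    lora_alpha=32,             # adapter scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Qwen2 attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable

# From here, train with transformers.Trainer or trl's SFTTrainer on your task data.
```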

### Out-of-Scope Use

This model is not suitable for:
- High-precision tasks where full 16-bit or 32-bit weights are required, since 4-bit quantization introduces some accuracy loss
- Latency-critical applications, since 4-bit dequantization adds compute overhead at inference time

## Bias, Risks, and Limitations

The model may inherit biases present in the base model's training data, and 4-bit quantization can slightly degrade output quality relative to the full-precision model. Users should be cautious when deploying it in sensitive applications.

### Recommendations

Users should evaluate the model's performance on their specific tasks and datasets before deployment, and consider fine-tuning it for better alignment with their use case. A quick perplexity check is sketched below.
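
A minimal evaluation sketch: compute the mean cross-entropy loss (and perplexity) on a handful of held-out samples from your own domain. The sample strings here are placeholders:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

# Replace with representative text from your target domain.
samples = [
    "The quarterly report shows steady revenue growth.",
    "The patient was discharged with a follow-up appointment.",
]

losses = []
for text in samples:
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])  # causal-LM cross-entropy
    losses.append(out.loss.item())

mean_loss = sum(losses) / len(losses)
print(f"mean loss: {mean_loss:.3f}, perplexity: {torch.exp(torch.tensor(mean_loss)).item():.1f}")
```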

## How to Get Started with the Model

Use the code below to get started with the model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)

# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
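
A note on the configuration above: `nf4` is the 4-bit NormalFloat data type, which matches the roughly normal distribution of pretrained weights better than plain int4; `bnb_4bit_use_double_quant=True` also quantizes the quantization constants for a further memory saving; and `bnb_4bit_compute_dtype=torch.bfloat16` means weights are dequantized to bfloat16 for the actual matrix multiplications.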