---
license: mit
language:
- en
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
tags:
- cot
- r1
- deepseek
- text
---
# Model Card for DeepSeek-R1-Distill-Qwen-1.5B-4bit

This is a 4-bit quantized version of the `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` model, optimized for efficient inference with reduced memory usage. The quantization was performed with the `bitsandbytes` library using the NF4 (4-bit NormalFloat) data type.
## Model Details

### Model Description

- **Model type:** Transformer-based causal language model
- **Language(s) (NLP):** English
- **License:** MIT
- **Quantized from model:** `deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B`

## Uses
### Direct Use

This model is intended for research and practical applications where memory efficiency is critical. It can be used for:
- Text generation
- Language understanding tasks
- Chatbots and conversational AI (see the sketch after this list)
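
A minimal chat-style sketch, assuming the tokenizer ships the upstream chat template; the prompt and sampling settings here are illustrative, and the 4-bit loading mirrors the "How to Get Started" section below:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    trust_remote_code=True,
)

# Format the conversation with the model's own chat template.
messages = [{"role": "user", "content": "Explain 4-bit quantization in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.6)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```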

### Downstream Use

This model can be fine-tuned for specific tasks such as the following; a minimal fine-tuning sketch appears after the list:
- Sentiment analysis
- Text classification
- Summarization
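
Since `bitsandbytes` 4-bit weights are frozen at load time, LoRA-style adapters via the `peft` library are the usual fine-tuning path. A minimal sketch, assuming `peft` is installed; the `target_modules` names assume the Qwen2 attention layout:

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)  # prepare frozen 4-bit weights for adapter training

lora_config = LoraConfig(
    r=16,                      # adapter rank
    lora_alpha=32,             # adapter scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Qwen2 attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable

# From here, train with transformers.Trainer or trl's SFTTrainer on your task data.
```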

### Out-of-Scope Use

This model is not suitable for:
- High-precision tasks where full 16-bit or 32-bit weights are required, since 4-bit quantization introduces some accuracy loss
- Latency-critical applications, since 4-bit dequantization adds compute overhead at inference time

## Bias, Risks, and Limitations

The model may inherit biases present in the base model's training data, and 4-bit quantization can slightly degrade output quality relative to the full-precision model. Users should be cautious when deploying it in sensitive applications.

### Recommendations

Users should evaluate the model's performance on their specific tasks and datasets before deployment, and consider fine-tuning it for better alignment with their use case. A quick perplexity check is sketched below.
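
A minimal evaluation sketch: compute the mean cross-entropy loss (and perplexity) on a handful of held-out samples from your own domain. The sample strings here are placeholders:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

# Replace with representative text from your target domain.
samples = [
    "The quarterly report shows steady revenue growth.",
    "The patient was discharged with a follow-up appointment.",
]

losses = []
for text in samples:
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])  # causal-LM cross-entropy
    losses.append(out.loss.item())

mean_loss = sum(losses) / len(losses)
print(f"mean loss: {mean_loss:.3f}, perplexity: {torch.exp(torch.tensor(mean_loss)).item():.1f}")
```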

## How to Get Started with the Model

Use the code below to get started with the model:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization configuration
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True
)

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "emredeveloper/DeepSeek-R1-Distill-Qwen-1.5B-4bit",
    quantization_config=quantization_config,
    device_map="auto",
    trust_remote_code=True
)

# Generate text
input_text = "Hello, how are you?"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
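
A note on the configuration above: `nf4` is the 4-bit NormalFloat data type, which matches the roughly normal distribution of pretrained weights better than plain int4; `bnb_4bit_use_double_quant=True` also quantizes the quantization constants for a further memory saving; and `bnb_4bit_compute_dtype=torch.bfloat16` means weights are dequantized to bfloat16 for the actual matrix multiplications.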