---
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
library_name: peft
pipeline_tag: text-generation
language: en
tags:
- deepseek
- text-generation
- conversational
---

# DeepSeek Chatbot

This is a fine-tuned version of DeepSeek-R1-Distill-Qwen-1.5B, optimized for conversational AI applications. The model maintains the base model's capabilities while being tuned for improved dialogue interactions.

## Model Details

### Model Description

- **Developed by:** Trinoid
- **Model type:** Conversational language model
- **Language(s):** English
- **License:** Same as the base model (DeepSeek-R1-Distill-Qwen-1.5B)
- **Finetuned from model:** deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

## Uses

### Direct Use

This model can be used for:
- General conversation
- Text generation
- Question answering
- Chat-based applications

Example usage with the Hugging Face Inference API:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("Trinoid/Deepseek_Chatbot")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, how are you?"}
]

response = client.chat_completion(
    messages,
    max_tokens=512,
    temperature=0.7,
    top_p=0.95
)
print(response.choices[0].message.content)
```

### Out-of-Scope Use

This model should not be used for:
- Generating harmful or malicious content
- Spreading misinformation
- Producing illegal content
- Making critical decisions without human oversight

## Training Details

### Training Procedure

#### Training Hyperparameters

- **Training regime:** fp16 mixed precision
- **Framework:** PEFT (Parameter-Efficient Fine-Tuning)
- **PEFT method:** LoRA
- **PEFT version:** 0.14.0

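The LoRA idea behind this setup is simple to state: each frozen base weight matrix W is augmented with a trainable low-rank product scaled by alpha/r, giving an effective weight W + (alpha/r)·BA. A small numeric sketch of that update; the dimensions and hyperparameter values here are illustrative, not the ones used to train this adapter:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 16  # hidden size, LoRA rank, scaling factor (illustrative)

W = rng.standard_normal((d, d))          # frozen base weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection, small random init
B = np.zeros((d, r))                     # trainable up-projection, zero init

# Effective weight seen by the forward pass:
W_eff = W + (alpha / r) * (B @ A)

# Because B starts at zero, the adapter is a no-op at initialization,
# so fine-tuning begins exactly from the base model's behavior.
assert np.allclose(W_eff, W)
```

Only A and B (2·d·r parameters per adapted matrix, versus d·d for full fine-tuning) are updated during training, which is what keeps the adapter checkpoint small.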
## Technical Specifications

### Model Architecture and Objective

- Base architecture: DeepSeek-R1-Distill-Qwen-1.5B
- Fine-tuning method: PEFT/LoRA
- Primary objective: Conversational AI

### Compute Infrastructure

#### Software

- PEFT 0.14.0
- Transformers
- Python 3.x

## Model Card Contact

For questions or issues about this model, please open an issue in the model repository.