## Model Description

This is a **preference-tuned** version of **DeepSeek-R1-Distill-Llama-8B**, optimized for **telecom-related queries**. The model has been fine-tuned to provide **concise and factual answers**, ensuring that it **role-plays as a customer service agent** while incorporating user-specific preferences.

- **Developed by:** Mohamed Abdulaziz
- **Model type:** Fine-tuned DeepSeek-R1-Distill-Llama-8B
- **Frameworks used:** Unsloth for fine-tuning and Weights & Biases (wandb) for performance monitoring
- **Preference tuning:** The model has been additionally fine-tuned using **Direct Preference Optimization (DPO)** to align with user preferences, enhancing response quality and relevance.

## Benefits of Preference Tuning with DPO

Direct Preference Optimization (DPO) improves model performance by directly optimizing for user-preferred responses without requiring reinforcement learning from human feedback (RLHF). A minimal sketch of the underlying objective is shown after the list below.

### **Advantages of DPO in this Model:**

- **Higher Quality Responses:** The model generates more accurate and contextually relevant answers by aligning with preferred outputs.
- **Better User Experience:** By fine-tuning responses based on preferred interactions, the model enhances satisfaction and usability.
- **Efficient Fine-Tuning:** DPO allows preference alignment with reduced computational overhead compared to RLHF, making it more scalable and adaptable.
- **Improved Robustness:** The model avoids inconsistencies by learning from direct user feedback rather than implicit reward functions.
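
For illustration, here is a minimal PyTorch sketch of the DPO objective itself. It is not the training code used for this model; the function name, its arguments, and the default `beta=0.1` are illustrative, and the log-probabilities are assumed to come from the policy model and a frozen reference model scoring the chosen and rejected responses.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Sketch of the DPO objective: prefer 'chosen' over 'rejected' without an explicit reward model."""
    # Implicit rewards are the beta-scaled log-probability ratios against the frozen reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected rewards (a binary logistic loss on the margin).
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```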

## Uses

This model is designed for **customer support automation in the telecom industry**. It assists in:

- Answering common user queries about **5G, network issues, billing, and services**.
- Providing **concise and factually correct responses**.
- Reducing **workload on human support agents** by handling routine inquiries.
- **Adapting to user-specific preferences** to improve interaction quality.

### **Who can use this model?**

- **Telecom companies**: Automate customer service via chatbots.
- **Developers & researchers**: Fine-tune and adapt for different use cases.
- **Call centers**: Support agents in handling user requests efficiently.

### **Who might be affected?**

- **End-users** interacting with telecom chatbots.
- **Support agents** using AI-assisted tools.
- **Developers & data scientists** fine-tuning and deploying the model.

## How to Get Started with the Model

### **1️⃣ Import necessary libraries**

```python
import torch
from unsloth import FastLanguageModel
from transformers import AutoTokenizer
```

### **2️⃣ Define model path**

```python
model_path = "moo100/DeepSeek-R1-telecom-chatbot-v2"
```

### **3️⃣ Load the model and tokenizer**

```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_path,
    max_seq_length=1024,
    dtype=None,              # auto-detect compute dtype (float16 / bfloat16)
    device_map="auto",
    load_in_4bit=True,       # 4-bit quantization to reduce GPU memory usage
    trust_remote_code=True
)
```

### **4️⃣ Optimize model for fast inference with Unsloth**

```python
model = FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path
```

### **5️⃣ Move model to GPU if available, otherwise use CPU**

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
# Note: with device_map="auto" above, the weights are typically already placed on the GPU.
model.to(device)
```

### **6️⃣ Define system instruction to guide model responses**

```python
system_instruction = """You are an AI assistant. Answer user questions concisely and factually.
Do NOT add extra details unless necessary."""
```

### **7️⃣ Define user input (Replace with any query)**

```python
user_input = "Hello"
```

### **8️⃣ Construct full prompt with instructions and user query**

```python
full_prompt = f"{system_instruction}\n\nUser: {user_input}\nAssistant:"
```

### **9️⃣ Tokenize input prompt**

```python
inputs = tokenizer(full_prompt, return_tensors="pt").to(device)
```

### **🔟 Generate model response with controlled stopping criteria**

```python
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    eos_token_id=tokenizer.eos_token_id,
)
```

### **1️⃣1️⃣ Decode and extract only the newly generated response**

```python
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
```

### **1️⃣2️⃣ Print the AI-generated response**

```python
# Keep only the assistant's reply, dropping any text generated for a follow-up "User:" turn.
print(response.split("User: ")[0])
```

## Training Details

### Training Data

A **preference dataset** was created using **Phi-4**, where the **chosen** column was generated via evaluation prompts. This dataset ensures the model aligns closely with user preferences by fine-tuning responses accordingly.
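
The dataset itself is not reproduced here; the record below is a hypothetical illustration of the usual DPO data format, pairing a prompt with a preferred (`chosen`) and a dispreferred (`rejected`) response. The text of the example is invented, not taken from the actual training data.

```python
# Hypothetical example of a single preference record (illustrative only).
preference_record = {
    "prompt": "My 5G connection keeps dropping indoors. What can I do?",
    "chosen": (
        "Try enabling Wi-Fi calling and check that your device supports the 5G band used in your "
        "area. If the problem persists, contact support to verify coverage at your address."
    ),
    "rejected": (
        "5G uses millimeter waves and massive MIMO antennas and was standardized in 3GPP Release 15, "
        "which introduced the 5G NR air interface..."
    ),
}
```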

### Training Procedure

- **Training Loss:** Decreased steadily from **0.6997** at step 10 to **0.0966** at step 360, indicating stable model convergence.
- **Rewards / Chosen:** Improved consistently, reaching **1.4318**, showcasing enhanced preference alignment.
- **Rewards / Rejected:** Decreased over time, confirming the model's ability to distinguish between optimal and suboptimal responses.
- **Logits / Chosen & Rejected:** Demonstrated increasing separation, ensuring stronger differentiation between desirable and undesirable outputs.
- **Gradient Norm & Learning Rate Schedule:** Maintained stability with linear decay, preventing sudden fluctuations and ensuring smooth convergence.
- **Preference Alignment:** Fine-tuned using **DPO**, incorporating user feedback to optimize responses effectively; an illustrative training configuration is sketched after this list.
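
The sketch below shows how such a DPO run could be set up with TRL's `DPOTrainer` on the Unsloth-loaded model, with metrics reported to wandb. The hyperparameter values and the `preference_dataset` name are assumptions for illustration, not the exact recipe used for this model; `model` and `tokenizer` are assumed to come from the loading code above, and argument names can differ slightly between TRL releases.

```python
# Illustrative DPO training setup; values are assumptions, not the exact training recipe.
from trl import DPOConfig, DPOTrainer

training_args = DPOConfig(
    output_dir="dpo-telecom-chatbot",
    beta=0.1,                        # strength of the preference penalty
    learning_rate=5e-6,
    lr_scheduler_type="linear",      # linear decay, as described above
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    logging_steps=10,
    report_to="wandb",               # logs loss, rewards/chosen, rewards/rejected, etc.
)

trainer = DPOTrainer(
    model=model,                     # the policy model loaded with Unsloth
    ref_model=None,                  # TRL derives a frozen reference model when None
    args=training_args,
    train_dataset=preference_dataset,   # columns: prompt / chosen / rejected
    processing_class=tokenizer,      # called `tokenizer=` in older TRL releases
)
trainer.train()
```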

## Evaluation

### Methodology

The chatbot was evaluated using **Meta-Llama-3.3-70B-Instruct** as an automatic judge, assessing the relevance, correctness, and fluency of its responses.
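
The exact judging prompt is not published here; the sketch below illustrates one way such an LLM-as-judge evaluation could be phrased. The wording and the 1-10 scale are assumptions based on the criteria and scores reported in the results, and `response` is the reply generated in the quick-start example above.

```python
# Hypothetical judging prompt for the LLM-as-judge evaluation (illustrative only).
judge_prompt_template = """You are an impartial evaluator.
Rate the chatbot's reply to the user on a scale of 1-10 for each criterion:
- Relevance: does the reply address the user's message?
- Correctness: is the reply factually and grammatically correct?
- Fluency: does the reply read naturally?

User message: {user_input}
Chatbot reply: {chatbot_reply}

Return the three scores, each with a one-sentence justification."""

judge_input = judge_prompt_template.format(
    user_input="Hello",
    chatbot_reply=response,  # reply generated in the quick-start example
)
```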

### Results

**Meta-Llama-3.3-70B-Instruct** evaluation of a sample greeting exchange:

- **Relevance: 10**
  The chatbot's response is highly relevant to the user's greeting, as it acknowledges the greeting and opens the conversation for further discussion.
- **Correctness: 10**
  The chatbot's response is grammatically correct and uses proper language, making it an appropriate response to the user's greeting.
- **Fluency: 10**
  The chatbot's response is fluent and natural-sounding, with a friendly and approachable tone that invites the user to engage in conversation.

Overall, the chatbot's response is a strong example of a well-crafted greeting that sets the tone for a helpful and engaging conversation.