---
library_name: transformers
tags: []
---
# Model Card
## Model Description
This is a **preference-tuned** version of **DeepSeek-R1-Distill-Llama-8B**, optimized for **telecom-related queries**. The model has been fine-tuned to provide **concise and factual answers** while **role-playing as a customer service agent** and incorporating user-specific preferences.
- **Developed by:** Mohamed Abdulaziz
- **Model type:** Fine-tuned DeepSeek-R1-Distill-Llama-8B
- **Frameworks used:** Unsloth for fine-tuning and Weights & Biases (wandb) for performance monitoring
- **Preference Tuning:** The model has been additionally fine-tuned using **Direct Preference Optimization (DPO)** to align with user preferences, enhancing response quality and relevance.
## Benefits of Preference Tuning with DPO
Direct Preference Optimization (DPO) improves model performance by directly optimizing for user-preferred responses without requiring reinforcement learning from human feedback (RLHF); a sketch of the underlying loss follows the list below.
### **Advantages of DPO in this Model:**
- **Higher Quality Responses:** The model generates more accurate and contextually relevant answers by aligning with preferred outputs.
- **Better User Experience:** By fine-tuning responses based on preferred interactions, the model enhances satisfaction and usability.
- **Efficient Fine-Tuning:** DPO allows preference alignment with reduced computational overhead compared to RLHF, making it more scalable and adaptable.
- **Improved Robustness:** The model avoids inconsistencies by learning directly from preference pairs rather than from a separately trained reward model.
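Conceptually, DPO reduces each *(chosen, rejected)* pair to a classification-style objective over log-probability ratios. Below is a minimal PyTorch sketch of that loss, assuming summed per-response log-probabilities from the policy and a frozen reference model are already computed; it illustrates the technique rather than reproducing this model's exact training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit rewards: scaled log-ratios of the policy vs. the frozen reference
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen response's reward above the rejected one's
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

In practice, libraries such as TRL compute these log-probabilities and this loss internally, so the formula above is what runs under the hood during preference tuning.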
## Uses
This model is designed for **customer support automation in the telecom industry**. It assists in:
- Answering common user queries about **5G, network issues, billing, and services**.
- Providing **concise and factually correct responses**.
- Reducing **workload on human support agents** by handling routine inquiries.
- **Adapting to user-specific preferences** to improve interaction quality.
### **Who can use this model?**
- **Telecom companies**: Automate customer service via chatbots.
- **Developers & researchers**: Fine-tune and adapt for different use cases.
- **Call centers**: Support agents in handling user requests efficiently.
### **Who might be affected?**
- **End-users** interacting with telecom chatbots.
- **Support agents** using AI-assisted tools.
- **Developers & data scientists** fine-tuning and deploying the model.
## How to Get Started with the Model
### **1️⃣ Import necessary libraries**
```python
import torch
from unsloth import FastLanguageModel  # also returns the tokenizer, so AutoTokenizer is not needed
```
### **2️⃣ Define model path**
```python
model_path = "moo100/DeepSeek-R1-telecom-chatbot-v2"
```
### **3️⃣ Load the model and tokenizer**
```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_path,
    max_seq_length=1024,    # maximum context length for inference
    dtype=None,             # let Unsloth auto-detect the best dtype
    device_map="auto",      # place the weights on the available GPU(s)
    load_in_4bit=True,      # 4-bit quantization to reduce VRAM usage
    trust_remote_code=True,
)
```
### **4️⃣ Optimize model for fast inference with Unsloth**
```python
model = FastLanguageModel.for_inference(model)
```
### **5️⃣ Define the target device for input tensors**
```python
# device_map="auto" already placed the model, and .to() is not supported
# for 4-bit quantized models, so only the input tensors are moved later.
device = "cuda" if torch.cuda.is_available() else "cpu"
```
### **6️⃣ Define system instruction to guide model responses**
```python
system_instruction = """You are an AI assistant. Answer user questions concisely and factually.
Do NOT add extra details unless necessary."""
```
### **7️⃣ Define user input (Replace with any query)**
```python
user_input = "Hello"
```
### **8️⃣ Construct full prompt with instructions and user query**
```python
full_prompt = f"{system_instruction}\n\nUser: {user_input}\nAssistant:"
```
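If the checkpoint defines a chat template, `tokenizer.apply_chat_template` is usually more robust than manual string formatting, since it emits the exact special tokens the model was trained on. A sketch, assuming a template is present (the quick-start above uses the manual prompt):

```python
# Optional alternative: let the tokenizer format the conversation
messages = [
    {"role": "system", "content": system_instruction},
    {"role": "user", "content": user_input},
]
full_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```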
### **9️⃣ Tokenize input prompt**
```python
inputs = tokenizer(full_prompt, return_tensors="pt").to(device)
```
### **1️⃣0️⃣ Generate model response with controlled stopping criteria**
```python
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=50,                    # cap the reply length
    do_sample=True,                       # sample for a natural tone
    temperature=0.5,                      # moderate randomness
    top_k=50,
    eos_token_id=tokenizer.eos_token_id,  # stop at end-of-sequence
    pad_token_id=tokenizer.eos_token_id,  # avoid the missing-pad-token warning
)
```
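For interactive use, the response can also be streamed token by token instead of collected at the end. A small sketch using transformers' `TextStreamer` (an optional addition, not part of the original quick-start):

```python
from transformers import TextStreamer

# Prints tokens to stdout as they are generated; skip_prompt hides the input
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=50,
    streamer=streamer,
)
```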
### **1️⃣1️⃣ Decode and extract only the newly generated response**
```python
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
```
### **1️⃣2️⃣ Print the AI-generated response**
```python
# Cut off any hallucinated follow-up turns before printing
print(response.split("User: ")[0])
```
## Training Details
### Training Data
A **preference dataset** was created using **Phi-4**, with the **chosen** column generated via evaluation prompts. Fine-tuning on these preferred responses aligns the model closely with user preferences; a hypothetical sketch of the dataset layout follows.
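The rows below are hypothetical and only illustrate the three-column *prompt / chosen / rejected* layout that DPO training expects; the actual dataset contents were generated with Phi-4 and are not reproduced here.

```python
from datasets import Dataset

preference_data = Dataset.from_dict({
    "prompt":   ["Why is my 5G connection slow indoors?"],
    "chosen":   ["Indoor walls attenuate high-frequency 5G signals; moving "
                 "closer to a window or enabling Wi-Fi calling can help."],
    "rejected": ["5G is always fast everywhere, so the problem must be "
                 "your phone."],
})
```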
### Training Procedure
- **Training Loss:** Decreased steadily from **0.6997** at step 10 to **0.0966** at step 360, indicating stable model convergence.
- **Rewards / Chosen:** Improved consistently, reaching **1.4318**, showcasing enhanced preference alignment.
- **Rewards / Rejected:** Decreased over time, confirming the model's ability to distinguish between optimal and suboptimal responses.
- **Logits / Chosen & Rejected:** Demonstrated increasing separation, ensuring stronger differentiation between desirable and undesirable outputs.
- **Gradient Norm & Learning Rate Schedule:** Maintained stability with linear decay, preventing sudden fluctuations and ensuring smooth convergence.
- **Preference Alignment:** Fine-tuned using **DPO**, incorporating preference data to optimize responses effectively (a minimal training sketch follows the list).
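The sketch below shows how such a DPO run is typically wired up with TRL's `DPOTrainer` on top of the Unsloth-loaded model. The hyperparameters are placeholders (only `max_steps=360` and the linear schedule come from the notes above), and argument names vary across TRL releases, so treat this as an outline rather than the exact training script.

```python
from trl import DPOConfig, DPOTrainer

training_args = DPOConfig(
    output_dir="dpo-telecom-chatbot",  # hypothetical output path
    per_device_train_batch_size=2,     # placeholder value
    gradient_accumulation_steps=4,     # placeholder value
    learning_rate=5e-6,                # placeholder value
    lr_scheduler_type="linear",        # linear decay, as noted above
    max_steps=360,                     # the card reports metrics up to step 360
    beta=0.1,                          # strength of the preference penalty
)

trainer = DPOTrainer(
    model=model,                 # the policy model loaded earlier
    ref_model=None,              # TRL maintains a frozen reference internally
    args=training_args,
    train_dataset=preference_data,
    processing_class=tokenizer,  # named `tokenizer=` in older TRL releases
)
trainer.train()
```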
# Evaluation
## Methodology
The chatbot was evaluated with **Meta-Llama-3.3-70B-Instruct** as an LLM judge, which scored the relevance, correctness, and fluency of its responses; a sketch of such a judging prompt is shown below.
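The exact evaluation prompt is not published in this card; the helper below is a hypothetical sketch of how such an LLM-as-judge query could be constructed before being sent to the judge model.

```python
# Hypothetical judging helper for illustration only
def build_judge_prompt(user_query: str, bot_response: str) -> str:
    return (
        "You are an impartial judge. Rate the chatbot response below on a "
        "1-10 scale for relevance, correctness, and fluency, and justify "
        "each score in one sentence.\n\n"
        f"User query: {user_query}\n"
        f"Chatbot response: {bot_response}\n"
    )

judge_prompt = build_judge_prompt("Hello", response)
# `judge_prompt` would then be submitted to Meta-Llama-3.3-70B-Instruct
```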
## Results
Meta-Llama-3.3-70B-Instruct Evaluation:
- **Relevance: 10**
The chatbot's response is highly relevant to the user's greeting, as it acknowledges the greeting and opens the conversation for further discussion.
- **Correctness: 10**
The chatbot's response is grammatically correct and uses proper language, making it a correct and appropriate response to the user's greeting.
- **Fluency: 10**
The chatbot's response is fluent and natural-sounding, with a friendly and approachable tone that invites the user to engage in conversation.
Overall, the chatbot's response is a great example of a well-crafted greeting response that effectively sets the tone for a helpful and engaging conversation.