|
|
--- |
|
|
library_name: transformers |
|
|
tags: [] |
|
|
--- |
|
|
|
|
|
# Model Card for DeepSeek-R1-telecom-chatbot-v2
|
|
|
|
|
|
|
|
## Model Description |
|
|
|
|
|
This is a **preference-tuned** version of **DeepSeek-R1-Distill-Llama-8B**, optimized for **telecom-related queries**. The model has been fine-tuned to provide **concise and factual answers** while **role-playing as a customer service agent** and incorporating user-specific preferences.
|
|
|
|
|
- **Developed by:** Mohamed Abdulaziz |
|
|
|
|
|
- **Model type:** Fine-tuned DeepSeek-R1-Distill-Llama-8B
|
|
|
|
|
- **Frameworks used:** Unsloth for fine-tuning and Weights & Biases (wandb) for performance monitoring
|
|
|
|
|
- **Preference Tuning:** The model has been additionally fine-tuned using **Direct Preference Optimization (DPO)** to align with user preferences, enhancing response quality and relevance. |
|
|
|
|
|
## Benefits of Preference Tuning with DPO |
|
|
|
|
|
Direct Preference Optimization (DPO) improves model performance by optimizing directly on pairs of preferred and rejected responses, without the separate reward model and reinforcement learning loop required by reinforcement learning from human feedback (RLHF). A minimal sketch of the objective follows the list below.
|
|
|
|
|
### **Advantages of DPO in this Model:** |
|
|
|
|
|
- **Higher Quality Responses:** The model generates more accurate and contextually relevant answers by aligning with preferred outputs. |
|
|
- **Better User Experience:** By fine-tuning responses based on preferred interactions, the model enhances satisfaction and usability. |
|
|
- **Efficient Fine-Tuning:** DPO allows preference alignment with reduced computational overhead compared to RLHF, making it more scalable and adaptable. |
|
|
- **Improved Robustness:** The model avoids inconsistencies by learning directly from preference data rather than from an implicitly learned reward function.
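
For reference, this is a minimal sketch of the DPO objective (Rafailov et al., 2023) as it is commonly implemented; the function name and `beta` value are illustrative assumptions, not taken from this model's actual training code.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Each argument is a tensor of summed log-probabilities of the
    chosen/rejected responses under the trainable policy or the frozen
    reference model."""
    # Implicit rewards: beta-scaled log-ratios between policy and reference
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Minimize the negative log-sigmoid of the chosen-vs-rejected margin
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
    return loss, chosen_rewards.detach(), rejected_rewards.detach()
```

The "Rewards / Chosen" and "Rewards / Rejected" metrics reported under Training Details below correspond to these beta-scaled log-ratios.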
|
|
|
|
|
## Uses |
|
|
|
|
|
This model is designed for **customer support automation in the telecom industry**. It assists in: |
|
|
|
|
|
- Answering common user queries about **5G, network issues, billing, and services**. |
|
|
- Providing **concise and factually correct responses**. |
|
|
- Reducing **workload on human support agents** by handling routine inquiries. |
|
|
- **Adapting to user-specific preferences** to improve interaction quality. |
|
|
|
|
|
### **Who can use this model?** |
|
|
|
|
|
- **Telecom companies**: Automate customer service via chatbots. |
|
|
- **Developers & researchers**: Fine-tune and adapt for different use cases. |
|
|
- **Call centers**: Support agents in handling user requests efficiently. |
|
|
|
|
|
### **Who might be affected?** |
|
|
|
|
|
- **End-users** interacting with telecom chatbots. |
|
|
- **Support agents** using AI-assisted tools. |
|
|
- **Developers & data scientists** fine-tuning and deploying the model. |
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
|
|
### **1️⃣ Import necessary libraries** |
|
|
|
|
|
```python |
|
|
import torch |
|
|
from unsloth import FastLanguageModel |
|
|
from transformers import AutoTokenizer |
|
|
``` |
|
|
|
|
|
### **2️⃣ Define model path** |
|
|
|
|
|
```python |
|
|
model_path = "moo100/DeepSeek-R1-telecom-chatbot-v2" |
|
|
``` |
|
|
|
|
|
### **3️⃣ Load the model and tokenizer** |
|
|
|
|
|
```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_path,
    max_seq_length=1024,    # maximum context length for this session
    dtype=None,             # let Unsloth auto-select the best dtype
    device_map="auto",      # place the model on available devices
    load_in_4bit=True,      # 4-bit quantization to reduce VRAM usage
    trust_remote_code=True,
)
```
|
|
|
|
|
### **4️⃣ Optimize model for fast inference with Unsloth** |
|
|
|
|
|
```python |
|
|
model = FastLanguageModel.for_inference(model) |
|
|
``` |
|
|
|
|
|
### **5️⃣ Move model to GPU if available, otherwise use CPU** |
|
|
|
|
|
```python
# Note: with device_map="auto" above, the model is already placed on the
# GPU when one is available, so this step mainly covers CPU-only setups.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
```
|
|
|
|
|
### **6️⃣ Define system instruction to guide model responses** |
|
|
|
|
|
```python |
|
|
system_instruction = """You are an AI assistant. Answer user questions concisely and factually. |
|
|
Do NOT add extra details unless necessary.""" |
|
|
``` |
|
|
|
|
|
### **7️⃣ Define user input (Replace with any query)** |
|
|
|
|
|
```python |
|
|
user_input = "Hello" |
|
|
``` |
|
|
|
|
|
### **8️⃣ Construct full prompt with instructions and user query** |
|
|
|
|
|
```python |
|
|
full_prompt = f"{system_instruction}\n\nUser: {user_input}\nAssistant:" |
|
|
``` |
|
|
|
|
|
### **9️⃣ Tokenize input prompt** |
|
|
|
|
|
```python |
|
|
inputs = tokenizer(full_prompt, return_tensors="pt").to(device) |
|
|
``` |
|
|
|
|
|
### **🔟 Generate model response with controlled stopping criteria**
|
|
|
|
|
```python
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=50,                    # keep answers short and concise
    do_sample=True,
    temperature=0.5,                      # moderate randomness
    top_k=50,                             # sample only from the 50 most likely tokens
    eos_token_id=tokenizer.eos_token_id,  # stop cleanly at end-of-sequence
)
```
|
|
|
|
|
### **1️⃣1️⃣ Decode and extract only the newly generated response**
|
|
|
|
|
```python
# Slice off the prompt tokens so only newly generated text is decoded
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
```
|
|
|
|
|
### **1️⃣2️⃣ Print the AI-generated response**
|
|
|
|
|
```python
# Drop anything after a follow-up "User: " turn the model may have invented
print(response.split("User: ")[0])
```
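
For convenience, steps 6️⃣ through 1️⃣2️⃣ can be wrapped into a single helper. This is a minimal sketch reusing the `model`, `tokenizer`, `device`, and `system_instruction` defined above; the `ask` name and sample query are illustrative, not part of the original card.

```python
def ask(user_input: str, max_new_tokens: int = 50) -> str:
    """Generate a concise answer for a single telecom query."""
    full_prompt = f"{system_instruction}\n\nUser: {user_input}\nAssistant:"
    inputs = tokenizer(full_prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        input_ids=inputs.input_ids,
        attention_mask=inputs.attention_mask,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=50,
        eos_token_id=tokenizer.eos_token_id,
    )
    response = tokenizer.decode(
        outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
    )
    return response.split("User: ")[0].strip()

print(ask("Why is my 5G connection slower than 4G in some areas?"))
```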
|
|
|
|
|
|
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
A **preference dataset** was created using **Phi-4**, with the **chosen** column generated via evaluation prompts. Fine-tuning on this dataset aligns the model's responses closely with user preferences. A sketch of the expected record format is shown below.
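
DPO trainers typically expect prompt/chosen/rejected triples. The record below is purely illustrative, since the actual dataset is not published with this card.

```python
# Hypothetical preference record; field values are invented for illustration.
preference_example = {
    "prompt":   "Why is my 5G connection slower than 4G in some areas?",
    "chosen":   "5G coverage is still expanding, so speeds can drop where "
                "the signal is weak; your phone may also fall back to 4G indoors.",
    "rejected": "5G is always faster than 4G, so the problem must be your phone.",
}
```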
|
|
|
|
|
### Training Procedure |
|
|
|
|
|
- **Training Loss:** Decreased steadily from **0.6997** at step 10 to **0.0966** at step 360, indicating stable model convergence. |
|
|
- **Rewards / Chosen:** Improved consistently, reaching **1.4318**, showcasing enhanced preference alignment. |
|
|
- **Rewards / Rejected:** Decreased over time, confirming the model's ability to distinguish between optimal and suboptimal responses. |
|
|
- **Logits / Chosen & Rejected:** Demonstrated increasing separation, ensuring stronger differentiation between desirable and undesirable outputs. |
|
|
- **Gradient Norm & Learning Rate Schedule:** Maintained stability with linear decay, preventing sudden fluctuations and ensuring smooth convergence. |
|
|
- **Preference Alignment:** Fine-tuned using **DPO**, incorporating user feedback to optimize responses effectively; a configuration sketch follows this list.
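
The exact training script is not published, but a run matching the details above could be configured roughly like this with TRL's `DPOTrainer` on the Unsloth-loaded model. Apart from `max_steps` and the linear schedule reported above, the hyperparameter values are assumptions.

```python
from trl import DPOConfig, DPOTrainer

config = DPOConfig(
    output_dir="outputs",
    beta=0.1,                     # preference-penalty strength (assumed)
    learning_rate=5e-6,           # illustrative starting LR
    lr_scheduler_type="linear",   # linear decay, as reported above
    max_steps=360,                # matches the final step reported above
    report_to="wandb",            # training was monitored with wandb
)

trainer = DPOTrainer(
    model=model,                  # Unsloth-loaded model (typically with LoRA adapters)
    ref_model=None,               # with adapters, TRL derives the frozen reference
    args=config,
    train_dataset=preference_dataset,   # prompt / chosen / rejected columns
    processing_class=tokenizer,   # named `tokenizer=` in older TRL versions
)
trainer.train()
```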
|
|
|
|
|
|
|
|
|
|
|
# Evaluation |
|
|
|
|
|
## Methodology |
|
|
|
|
|
The chatbot was evaluated using **Meta-Llama-3.3-70B-Instruct** as an LLM judge, which scored the relevance, correctness, and fluency of its responses.
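
The exact judging prompt is not included in this card; a prompt along these lines would produce the per-criterion scores reported below (the wording is an assumption).

```python
# Hypothetical LLM-as-judge prompt; the actual evaluation prompt is not published.
judge_prompt = f"""You are grading a telecom support chatbot.

User query: {user_input}
Chatbot response: {response}

Score the response from 1 to 10 on each criterion and justify briefly:
- Relevance
- Correctness
- Fluency"""
```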
|
|
|
|
|
## Results |
|
|
|
|
|
Meta-Llama-3.3-70B-Instruct scores for the sample greeting exchange ("Hello") shown above:
|
|
|
|
|
- **Relevance: 10** |
|
|
The chatbot's response is highly relevant to the user's greeting, as it acknowledges the greeting and opens the conversation for further discussion. |
|
|
|
|
|
- **Correctness: 10** |
|
|
The chatbot's response is grammatically correct and uses proper language, making it a correct and appropriate response to the user's greeting. |
|
|
|
|
|
- **Fluency: 10** |
|
|
The chatbot's response is fluent and natural-sounding, with a friendly and approachable tone that invites the user to engage in conversation. |
|
|
|
|
|
Overall, the chatbot's response is a great example of a well-crafted greeting response that effectively sets the tone for a helpful and engaging conversation. |
|
|
|
|
|
|